
The pygame.time.Clock object has many methods; let's say we want to document and enforce that we only use the `tick` method. If we keep the signature the same as the implementation's, there is no need to cast, and type checking will make sure we use no other methods.

```
from typing import Protocol

import pygame


class SimClock(Protocol):
    def tick(self, framerate: float) -> int:
        """
        Wait for a tick to pass as defined by framerate. This is to keep the simulation running at a consistent rate.
        """
        ...


# Other code ...
clock: SimClock = pygame.time.Clock()
while True:
    # Processing code...
    clock.tick(fps)  # this is good
    # clock.get_time() <- This will fail type checking
```

If the method signature is not the same, then we must cast. The example below changes `tick` to accept an integer instead of a float and ignores the return value, making it more restrictive. I feel casting is a typing code smell, so I stick with the signatures in the implementation. But this cannot always be avoided; as always, there are no hard and fast rules.

```
from typing import Protocol, cast

import pygame


class SimClock(Protocol):
    def tick(self, framerate: int) -> None:
        """
        Wait for a tick to pass as defined by framerate. This is to keep the simulation running at a consistent rate.
        """
        ...


# Other code ...
clock: SimClock = cast(SimClock, pygame.time.Clock())  # CAST!
while True:
    # Processing code...
    clock.tick(fps)  # same as before
```

Why go through the trouble? We are trying to increase readability and help ourselves focus on what the code is actually implementing. Reducing the surface area of an external object helps with all of that. It means that someone new to the code doesn’t have to read through all of the methods in the library to know at a glance exactly which ones are used. You can even add comments if none are available in the library to further reduce cognitive load.
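As a self-contained sketch of the same idea, here is a hypothetical `DieRoller` protocol that narrows the standard library's `random.Random` down to the single method we use (no pygame required; the protocol name and method choice are mine, not from the post):

```
from typing import Protocol
import random


class DieRoller(Protocol):
    # The only method from random.Random that we document and allow
    def randint(self, a: int, b: int) -> int:
        """Return a random integer N such that a <= N <= b."""
        ...


# Structural typing: no cast needed because the signature matches exactly
rng: DieRoller = random.Random()
roll = rng.randint(1, 6)
print(roll)  # an integer between 1 and 6
# rng.random() <- would fail type checking against DieRoller
```

Through the `rng` name, the type checker now flags any `random.Random` method other than `randint`.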

One of the features that I love about PySpark is that the data frame abstraction is agnostic about data sources and destinations. This allows unit testing data transformations without connecting to a database or file system. As a bonus, data frames are immutable and each transformation returns a new one. This means we can test transformations on the data in bite-sized chunks.

All we need to do is create an in-memory data frame, perform a set of transformations, then check the data by executing an action. As an added bonus, the transformations can be tested without data. Given an input schema, the resulting output schema can be confirmed alone. No data needed. This allows us to make sure that our PySpark script fulfills its contract. The unit tests can verify preconditions and postconditions by comparing schemas.

Let’s start with a simple example. Our dataset is a simple list of musicians with their bands and roles. It’s just enough to keep explanations clear.

```
from pyspark.sql import SparkSession, DataFrame, Column
import pyspark.sql.types as sqlt
import pyspark.sql.functions as sqlf
```

```
spark = SparkSession.builder.getOrCreate()
```

```
data = [
    ("Rob", "Halford", "Judas Priest", "Singer"),
    ("Alice", "Cooper", "Hollywood Vampires", "Singer"),
    ("Steve", "Harris", "Iron Maiden", "Bassist"),
    ("James", "Hetfield", "Metallica", "Singer"),
    ("Bernie", "Worrell", "Parliament", "Keyboardist"),
]
schema = sqlt.StructType(
    [
        sqlt.StructField("first_name", sqlt.StringType(), True),
        sqlt.StructField("last_name", sqlt.StringType(), True),
        sqlt.StructField("band", sqlt.StringType(), True),
        sqlt.StructField("role", sqlt.StringType(), True),
    ]
)
df = spark.createDataFrame(data=data, schema=schema)
df.printSchema()
df.show(truncate=False)
```

```
root
|-- first_name: string (nullable = true)
|-- last_name: string (nullable = true)
|-- band: string (nullable = true)
|-- role: string (nullable = true)
+----------+---------+------------------+-----------+
|first_name|last_name|band |role |
+----------+---------+------------------+-----------+
|Rob |Halford |Judas Priest |Singer |
|Alice |Cooper |Hollywood Vampires|Singer |
|Steve |Harris |Iron Maiden |Bassist |
|James |Hetfield |Metallica |Singer |
|Bernie |Worrell |Parliament |Keyboardist|
+----------+---------+------------------+-----------+
```

Let’s create some transformation functions that we can use to test our data.

```
def drop_unnecessary(df: DataFrame) -> DataFrame:
    """
    Remove the role column
    """
    return df.drop("role")


def only_singers(df: DataFrame) -> DataFrame:
    """
    Keep only the singers
    """
    return df.where(sqlf.col("role") == "Singer")


def combine_names(first: Column, last: Column) -> Column:
    """
    Take the first name and last name columns and create a single name structure
    """
    return sqlf.struct(first.alias("first"), last.alias("last"))


def fix_name(df: DataFrame) -> DataFrame:
    """
    Fix names from two columns to one
    """
    return df.select(
        "band",
        combine_names(sqlf.col("first_name"), sqlf.col("last_name")).alias("name"),
    )
```

Let’s run the transformations that we just implemented. The “transform” method on DataFrame allows us to call the transformations in sequence without having to create temporary variables. It also makes the code easier to read with less noise.

```
"""
Since data frames are lazy, we can filter out the singers after dropping the role column. Notice that
the code below runs and returns the correct answer.
"""
new_df = df.transform(drop_unnecessary).transform(only_singers).transform(fix_name)
new_df.show()
```

```
+------------------+-----------------+
| band| name|
+------------------+-----------------+
| Judas Priest| {Rob, Halford}|
|Hollywood Vampires| {Alice, Cooper}|
| Metallica|{James, Hetfield}|
+------------------+-----------------+
```

Now, let’s write some tests that verify the output schema for each transformation and that each transformation changes the input data as expected.

```
import pytest
import ipytest

ipytest.autoconfig()
ipytest.clean()


@pytest.fixture
def spark() -> SparkSession:
    return SparkSession.builder.getOrCreate()


@pytest.fixture
def schema() -> sqlt.StructType:
    """
    Create the input schema to be used for tests
    """
    return sqlt.StructType(
        [
            sqlt.StructField("first_name", sqlt.StringType(), True),
            sqlt.StructField("last_name", sqlt.StringType(), True),
            sqlt.StructField("band", sqlt.StringType(), True),
            sqlt.StructField("role", sqlt.StringType(), True),
        ]
    )


@pytest.fixture
def data(spark: SparkSession, schema: sqlt.StructType) -> DataFrame:
    """
    Sample input data to use in tests
    """
    data = [
        ("Rob", "Halford", "Judas Priest", "Singer"),
        ("Alice", "Cooper", "Hollywood Vampires", "Singer"),
        ("Steve", "Harris", "Iron Maiden", "Bassist"),
        ("James", "Hetfield", "Metallica", "Singer"),
        ("Bernie", "Worrell", "Parliament", "Keyboardist"),
    ]
    return spark.createDataFrame(data=data, schema=schema)


# Schema tests
def test_drop_unnecessary_schema(spark: SparkSession, schema: sqlt.StructType):
    """
    This test verifies only the column names
    """
    empty = spark.createDataFrame(data=[], schema=schema)
    result = empty.transform(drop_unnecessary)
    assert ["first_name", "last_name", "band"] == result.columns


def test_only_singers_schema(spark: SparkSession, schema: sqlt.StructType):
    """
    You can also verify the contract by comparing schema instances.
    In this case, the schemas should be the same.
    """
    empty = spark.createDataFrame(data=[], schema=schema)
    result = empty.transform(only_singers)
    assert result.schema == schema


def test_fix_name_schema(spark: SparkSession, schema: sqlt.StructType):
    """
    Schema can also be verified with a simple string
    """
    empty = spark.createDataFrame(data=[], schema=schema)
    result = empty.transform(fix_name)
    assert ["band", "name"] == result.columns
    assert (
        result.schema["name"].simpleString() == "name:struct<first:string,last:string>"
    )


# Data test
def test_end_to_end(spark: SparkSession, data: DataFrame):
    result = (
        data.transform(drop_unnecessary).transform(only_singers).transform(fix_name)
    )
    # call an action
    output = result.collect()
    assert len(output) == 3
    assert output[0]["band"] == "Judas Priest"
    assert output[0]["name"]["first"] == "Rob"
    assert output[0]["name"]["last"] == "Halford"
    assert output[1]["band"] == "Hollywood Vampires"
    assert output[2]["band"] == "Metallica"


ipytest.run();
```

```
.... [100%]
4 passed in 0.20s
```

I hope this gave you some ideas on how to test the transformations in your own pipelines. It’s helped me simplify my tests and verify edge cases more easily. It’s caught many errors sooner without doing a full run of the script or using all of the data. It also makes debugging quicker since the tests are smaller.

I’ve loved the Towers of Hanoi ever since I was introduced to it by chance as a kid. I thought it would be the perfect subject for a solver written in Julia as a way to learn the language. This is my first project using Julia and I found it enjoyable, though I’m still learning the ropes, so to speak. For future projects, I would like to stick to a stricter functional style of programming for the solvers and adopt the convention of a “!” suffix to denote stateful functions.

Julia initially piqued my interest with its take on types, multiple dispatch, and no objects. It steers you away from generic types, which can make for more readable code. The type system gets you thinking about the problem differently and about how you can use types to better document intent without relying on problematic “if” statements. The packaging system, REPL, and Jupyter integration make developing and trying out packages easy. There is a learning curve, and some frustrations with how code is loaded get in the way of development initially. But nothing a Google search can’t fix. All part of learning something new and a different take on programming.

The lack of encapsulation is worrisome for large projects. I have the same issue with Python, but documentation and a few team guidelines help. It’s always good to have strict enforcement, though. Minimizing mutable structures and documenting the functions that mutate them goes a long way. All in all, a minor quibble.

Overall, I enjoyed coding in Julia and plan to implement more projects with it. The type system is the intriguing piece, and I did some experiments in the code to push my understanding. Look at the implementation of moves in the domain for an example. This was a quick weekend project and comments are minimal. There are several emergent design patterns in Julia that I would love to explore as well, like the Holy Trait pattern, and to work more through the excellent “Hands-on Design Patterns and Best Practices with Julia” book.

If you would like to follow along, I have provided the source for the domain along with the tests. I implemented two solvers and provided the source for the naive version and the A* version with a heuristic.

```
import Pkg
Pkg.activate(".")
```

```
Activating project at `~/Documents/julia/TowerOfHanoi`
```

A sample run with 4 discs is shown below. Keep scrolling for the A* version.

`include("src/naive_solver.jl")`

```
Moves to solve: 79
(79-step trace: each step prints the tower states, the heuristic value, and the
move, e.g. "heurstic value: 16 Move from 1 to 3", ending with
"Move from 1 to 2 final: *** ***** ******* *********"; the multi-line tower
drawings did not survive extraction)
```

I implemented the A* algorithm and used the following heuristic function:

```
function heuristic(initial::Tower, state::Tower)
    disc_cache = Dict()
    for (r, rod) in enumerate(initial.rods)
        for (d, disc) in enumerate(rod.discs)
            disc_cache[disc] = (r, d)
        end
    end
    num_discs = length(disc_cache)
    result = 0
    for (r, rod) in enumerate(state.rods)
        for (d, disc) in enumerate(rod.discs)
            (ir, id) = disc_cache[disc]
            rod_diff = abs(ir - r) > 0 ? 2 : 1
            disc_diff = num_discs - abs(id - d)
            result += rod_diff * disc_diff
        end
    end
    result
end
```

It sums, for each disc, how far it is from its place in the goal state, given the initial state. Thus, a higher score is closer to the goal, and the A* algorithm maximizes it during the search.

A sample run with 4 discs is shown below.

The naive solver took 79 moves, while the A* solver took only 21: a huge improvement in finding the solution. This was a great project to learn the basics of Julia.

`include("src/heuristic_solver.jl")`

```
Number of moves: 21
*** ***** ******* *********  heurstic value: 16  Move from 1 to 2
***** ******* ********* ***  heurstic value: 14  Move from 2 to 3
***** ******* ********* ***  heurstic value: 14  Move from 1 to 2
******* ********* ***** ***  heurstic value: 14  Move from 3 to 2
******* *** ********* *****  heurstic value: 16  Move from 1 to 3
*** ********* ***** *******  heurstic value: 18  Move from 2 to 3
*** ********* ***** *******  heurstic value: 18  Move from 3 to 1
*** ********* ***** *******  heurstic value: 16  Move from 2 to 3
*** ***** ********* *******  heurstic value: 18  Move from 1 to 3
*** ***** ********* *******  heurstic value: 22  Move from 1 to 2
*** ***** ********* *******  heurstic value: 26  Move from 3 to 2
*** ***** ********* *******  heurstic value: 24  Move from 3 to 1
*** ***** ********* *******  heurstic value: 20  Move from 2 to 3
*** ***** ********* *******  heurstic value: 20  Move from 3 to 1
*** ***** ********* *******  heurstic value: 18  Move from 3 to 2
*** ******* ***** *********  heurstic value: 20  Move from 1 to 2
*** ******* ***** *********  heurstic value: 24  Move from 1 to 3
*** ******* ********* *****  heurstic value: 26  Move from 2 to 3
******* *** ********* *****  heurstic value: 24  Move from 3 to 1
******* *** ********* *****  heurstic value: 21  Move from 3 to 2
***** ******* *** *********  heurstic value: 25  Move from 1 to 2
*** ***** ******* *********
```

I love number puzzles, and on a recent holiday I bought one inspired by Einstein. The idea is simple. There are 16 numbered tiles that can be placed in a 4x4 grid. The goal is for each row, column, and diagonal to sum to 264. The tiles can be flipped, so if a tile has “66” imprinted on it, it can also represent “99”.

Here’s an example layout of tiles.

```
---------------------
| 18 | 89 | 98 | 61 |
-----+----+----+-----
| 68 | 91 | 88 | 16 |
-----+----+----+-----
| 81 | 19 | 66 | 99 |
-----+----+----+-----
| 96 | 69 | 11 | 86 |
---------------------
```
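Before writing the solver, it's worth checking the arithmetic by hand. The sketch below is my own helper code, not part of the solver: it shows the upside-down digit mapping behind flipping a tile, then verifies one of the solved grids that appears in the tests later in the post:

```
# Upside-down digit mapping: reverse the digits and map each one
FLIP = {"1": "1", "6": "9", "8": "8", "9": "6"}

def flip(tile: int) -> int:
    return int("".join(FLIP[d] for d in str(tile)[::-1]))

print(flip(66), flip(89))  # 99 68

# A solved layout: every row, column, and diagonal sums to 264
solved = [91, 89, 68, 16,
          18, 66, 81, 99,
          86, 98, 19, 61,
          69, 11, 96, 88]
rows = [sum(solved[i:i + 4]) for i in range(0, 16, 4)]
cols = [sum(solved[i::4]) for i in range(4)]
diags = [sum(solved[i * 5] for i in range(4)),      # indexes 0, 5, 10, 15
         sum(solved[3 + i * 3] for i in range(4))]  # indexes 3, 6, 9, 12
print(rows, cols, diags)  # every sum is 264
```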

Things quickly escalated when I got the idea to write a solver using genetic programming concepts. The algorithm is simple. Start with a population of scrambled states, then pick the best layouts out of the population. Exchange a few tiles in each of the samples from the best to create a new population, along with a few new scrambled states. Repeat until we get a layout of tiles that satisfies the goal.
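The loop described above can be sketched generically before tying it to the puzzle. This is a hedged outline of the approach, not the solver itself; `fitness`, `mutate`, and `fresh` are stand-ins for the domain functions developed below:

```
import random

def genetic_search(pop, fitness, mutate, fresh, keep=0.1, max_gen=1000):
    """Keep the best slice, refill with mutated samples of it plus fresh states."""
    for gen in range(max_gen):
        ranked = sorted(pop, key=fitness)               # lower score is better
        best = ranked[: max(1, int(len(pop) * keep))]
        if fitness(best[0]) == 0:                       # goal reached
            return best[0], gen
        pop = [mutate(random.choice(best)) for _ in range(len(pop) - len(best))]
        pop += [fresh() for _ in range(len(best))]      # new scrambled states
    return None, max_gen

# Toy usage: evolve an integer toward 0
random.seed(0)
found, gens = genetic_search(
    pop=[random.randint(-50, 50) for _ in range(20)],
    fitness=abs,
    mutate=lambda x: x + random.choice([-1, 1]),
    fresh=lambda: random.randint(-50, 50),
)
print(found, gens)
```

The puzzle solver later in the post follows this same shape, with tile layouts as the states and the distance-from-264 score as the fitness.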

First things first. A domain to model the puzzle was needed.

```
import itertools
import random
from typing import Iterable, Sequence

# Type to represent the layout of tiles
State = list[int]

# Map the values of a number when flipped
FLIP_MAP: dict[int, int] = {
    1: 1,
    6: 9,
    8: 8,
    9: 6,
}

# All the tiles represented
ALL_TILES: State = [
    66, 66, 19, 19, 86, 86, 89, 89,
    16, 16, 18, 18, 88, 96, 69, 11,
]

# The goal that each row, column, and diagonal should sum up to
GOAL = 264


def flip(tile: int, flip_map: dict[int, int] = FLIP_MAP) -> int:
    """
    Take a tile and flip it. So, 89 will become 68.
    Calling it twice should return the original tile. 89->68->89
    """
    assert tile > 9 and tile < 100
    ones = tile % 10
    tens = tile // 10
    return flip_map[ones] * 10 + flip_map[tens]


def add_rows(tiles: State) -> list[int]:
    """
    Return the sum of all rows
    """
    return [sum(tiles[0:4]), sum(tiles[4:8]), sum(tiles[8:12]), sum(tiles[12:16])]


def add_columns(tiles: State) -> list[int]:
    """
    Return the sum of all columns
    """
    return [
        sum(each for i, each in enumerate(tiles) if i % 4 == 0),
        sum(each for i, each in enumerate(tiles) if i % 4 == 1),
        sum(each for i, each in enumerate(tiles) if i % 4 == 2),
        sum(each for i, each in enumerate(tiles) if i % 4 == 3),
    ]


def add_diagonals(tiles: State) -> list[int]:
    """
    Return the sum of the two diagonals
    """
    return [
        tiles[0] + tiles[5] + tiles[10] + tiles[15],
        tiles[12] + tiles[9] + tiles[6] + tiles[3],
    ]


def score_iter(tiles: State) -> Iterable[int]:
    """
    Return a score for each row, column, and diagonal.
    The score is the distance away from the goal.
    """
    return (
        abs(each - GOAL)
        for each in itertools.chain(
            add_rows(tiles), add_columns(tiles), add_diagonals(tiles)
        )
    )


def score(tiles: State) -> float:
    "Return a sum of how far away from goal for each row, column, and diagonal. This score is divided by the goal."
    return sum(each / float(GOAL) for each in score_iter(tiles))


def score_breakdown(tiles: State) -> list[float]:
    "Return a list of all the scores"
    return list(score_iter(tiles))


def chance(probability: float) -> bool:
    "Return True with the given probability between 0 and 1"
    return random.random() < probability


def exchange(
    tiles: State,
    probability_to_flip: float = 0.5,
    probability_to_exchange: float = 0.95,
) -> State:
    "Return a new state of tiles after exchanging two and possibly flipping either"
    choices = random.sample(range(len(tiles)), k=2)
    new_tiles = tiles.copy()
    for index in choices:
        if chance(probability_to_flip):
            new_tiles[index] = flip(new_tiles[index])
    if chance(probability_to_exchange):
        new_tiles[choices[0]], new_tiles[choices[1]] = (
            new_tiles[choices[1]],
            new_tiles[choices[0]],
        )
    return new_tiles


def scramble(tiles: State, probability_to_flip: float = 0.5) -> State:
    "Scramble all of the tiles"
    result = random.sample(tiles, k=len(tiles))
    return [flip(each) if chance(probability_to_flip) else each for each in result]
```

To make sure the domain is correct, some tests were needed.

```
import ipytest

ipytest.autoconfig()
ipytest.clean()

import math


def test_flip():
    def assert_flip(input: int, expected: int):
        assert flip(input) == expected
        assert flip(expected) == input

    assert_flip(66, 99)
    assert_flip(19, 61)
    assert_flip(86, 98)
    assert_flip(89, 68)
    assert_flip(16, 91)
    assert_flip(18, 81)
    assert_flip(88, 88)
    assert_flip(96, 96)
    assert_flip(69, 69)
    assert_flip(11, 11)


def test_add_rows():
    tiles = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
    actual = add_rows(tiles)
    assert actual == [10, 26, 42, 58]


def test_add_columns():
    tiles = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
    actual = add_columns(tiles)
    assert actual == [28, 32, 36, 40]


def test_add_diagonals():
    tiles = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
    actual = add_diagonals(tiles)
    assert actual == [1 + 6 + 11 + 16, 4 + 7 + 10 + 13]


def test_score():
    actual = score(ALL_TILES)
    assert math.isclose(actual, 2.708, abs_tol=0.001)


def test_solution():
    actual = score([91, 89, 68, 16, 18, 66, 81, 99, 86, 98, 19, 61, 69, 11, 96, 88])
    assert math.isclose(actual, 0.0, abs_tol=0.001)


def test_solution_another():
    actual = score([18, 86, 69, 91, 99, 61, 88, 16, 81, 19, 96, 68, 66, 98, 11, 89])
    assert math.isclose(actual, 0.0, abs_tol=0.001)


def test_score_breakdown():
    actual = score_breakdown(ALL_TILES)
    assert actual == [94, 86, 196, 0, 8, 0, 69, 127, 83, 52]


def test_sum():
    "Example to show sum of four tiles"
    assert 264 == sum([88, 96, 69, 11])


ipytest.run();
```

```
......... [100%]
9 passed in 0.01s
```

The implementation creates a new generation of tile layouts on each iteration until a layout is found that sums to 264 across every row, column, and diagonal. There are a few hyperparameters that were tweaked by hand while experimenting. The most important are what percentage of the top of the population to keep for mutations and how many new layouts to add. Finally, the probabilities for flipping a tile during a mutation and for exchanging the tiles can be changed. Keeping the top 10% of tile layouts seemed to work best, then generating a new population at 90% of the original size by sampling the kept layouts with replacement and mutating each, and finally adding new scrambled layouts for the remaining 10%. Repeat until a solution is found.

It’s not perfect, and it can get stuck in local minima. But it is able to find a solution in most cases in under 100,000 generations. Further research is needed for improvement.

```
import random
import math
from dataclasses import dataclass
from typing import Callable, Optional


def initial_pop(tiles: State = ALL_TILES, size: int = 1000) -> list[State]:
    "Create a population of size of scrambled tiles"
    return [scramble(tiles) for _ in range(size)]


def sample_pop_best(pop: list[State], best: list[State]) -> list[State]:
    """
    Sample with replacement to create a new population of the best and add a population of new fully scrambled tiles.
    """
    return [
        exchange(each) for each in random.choices(best, k=len(pop) - len(best))
    ] + initial_pop(size=len(best))


@dataclass(frozen=True)
class Progress:
    """
    Simple holder for progress
    """

    generation: int
    mean_score: float
    best_score: float
    best_thus_far: State


def run(
    keep_percent: float = 0.1,
    progress_update: Callable[[Progress], None] = lambda progress: None,
    max_generations: int = 100_000,
) -> Optional[State]:
    """
    Simple solution using genetic programming concepts.
    """
    pop = initial_pop()
    generation = 1
    to_keep = int(len(pop) * keep_percent)
    while generation <= max_generations:
        scores = list(map(score, pop))
        top_best = sorted(zip(scores, pop), key=lambda each: each[0])[:to_keep]
        best_score = top_best[0][0]
        best = top_best[0][1]
        mean_score = sum(each[0] for each in top_best) / len(top_best)
        progress = Progress(generation, mean_score, best_score, best)
        progress_update(progress)
        if math.isclose(best_score, 0.0):
            solution_index = scores.index(best_score)
            return pop[solution_index]
        pop = sample_pop_best(pop, [each[1] for each in top_best])
        generation += 1
    return None


@dataclass
class UpdatePrinter:
    """
    Simple object to be used for printing periodic progress
    """

    generation: int = 0
    how_often: int = 100

    def __call__(self, progress: Progress) -> None:
        self.generation = progress.generation
        if progress.generation % self.how_often == 0:
            print(
                "gen:",
                progress.generation,
                "mean:",
                round(progress.mean_score, 4),
                "best:",
                round(progress.best_score, 4),
                progress.best_thus_far,
            )
```

```
%%time
seed = 1690419104
update = UpdatePrinter()
random.seed(seed)
solution = run(progress_update=update)
print("generations: ", update.generation)
print("solution: ", solution)
```

```
gen: 100 mean: 0.1423 best: 0.1423 [18, 89, 98, 61, 68, 91, 88, 16, 81, 19, 66, 99, 96, 69, 11, 86]
gen: 200 mean: 0.1404 best: 0.1404 [18, 91, 86, 69, 61, 88, 96, 18, 98, 11, 66, 89, 89, 66, 19, 91]
gen: 300 mean: 0.1322 best: 0.1322 [18, 89, 96, 61, 66, 88, 91, 19, 91, 18, 66, 86, 89, 69, 11, 98]
gen: 400 mean: 0.1441 best: 0.1441 [19, 89, 91, 66, 66, 88, 98, 11, 91, 18, 61, 89, 86, 69, 18, 96]
generations: 423
solution: [18, 99, 86, 61, 66, 81, 98, 19, 91, 16, 69, 88, 89, 68, 11, 96]
CPU times: user 3.03 s, sys: 11.1 ms, total: 3.04 s
Wall time: 3.04 s
```

```
ipytest.clean()

import pytest
import math
import random

SOLUTIONS = (
    (1669425992, [16, 98, 61, 89, 69, 81, 18, 96, 88, 66, 99, 11, 91, 19, 86, 68]),
    (1669426298, [68, 16, 81, 99, 89, 91, 66, 18, 96, 88, 19, 61, 11, 69, 98, 86]),
    (1690419104, [18, 99, 86, 61, 66, 81, 98, 19, 91, 16, 69, 88, 89, 68, 11, 96]),
)


@pytest.mark.parametrize("seed, expected", SOLUTIONS)
def test_solution(seed: int, expected: State):
    random.seed(seed)
    actual = run()
    assert actual == expected
    assert math.isclose(score(actual), 0.0, abs_tol=0.001)


ipytest.run();
```

```
... [100%]
3 passed in 109.71s (0:01:49)
```

If you are looking for my older posts, please visit here.
