Back in 2000 I went back to university. A friend of mine who was just about to start his PhD at Sussex Uni told me about a master’s degree called Evolutionary and Adaptive Systems. The courses sounded like all the bits of my degree that I’d enjoyed the most (evolution, simulated robotics, neural networks) and none of the bits I hadn’t (symbolic AI, planning, expert systems).

During a lecture on simulated evolution, we were introduced to a paper by W Daniel Hillis, excitingly entitled “Coevolving Parasites improve Simulated Evolution as an Optimization Procedure”.

### Sorting networks

The crux of this paper was to use evolution (specifically co-evolution) to optimise designs of sorting networks. A sorting network is an ordered list of comparison/swaps that will take an unordered, fixed-length list of values and sort it for you (a more complete description can be found here). For instance, a sorting network for a list of three integers could be:

- Compare the second and third, swapping if the second is bigger than the third.
- Compare the first and third, swapping if the first is bigger than the third.
- Compare the first and second, swapping if the first is bigger than the second.

You could express this as (2,3) (1,3) (1,2)
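That three-comparator network can be sketched in a few lines of Python (a hypothetical helper, using 1-indexed positions to match the notation above):

```python
def apply_network(network, values):
    """Apply a sorting network (a list of 1-indexed comparator pairs) to a list."""
    values = list(values)
    for i, j in network:
        # Compare positions i and j, swapping if the first is bigger than the second.
        if values[i - 1] > values[j - 1]:
            values[i - 1], values[j - 1] = values[j - 1], values[i - 1]
    return values

network = [(2, 3), (1, 3), (1, 2)]
print(apply_network(network, [3, 1, 2]))  # -> [1, 2, 3]
```

The same fixed sequence of comparisons sorts every possible ordering of three values, which is what makes sorting networks attractive: no branching on the data, just a fixed wiring of compare/swap elements.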

### Big numbers

Whilst the sorting network to sort a list of length n can easily be worked out for very small values of n, the number of options increases exponentially as n increases. This happens to such an extent that only the solutions for lists up to length 10 are known to be optimal (9 and 10 were only proven optimal in 2014). To order a list of length 11, the best known network contains 35 comparison/swaps. There are also some clever methods to prove that it can’t be smaller than 33, but to prove that no 33-length network could order such a list, you would have to search exhaustively through all possible networks of length 33. This is quite a few networks. Specifically (ignoring comparisons of a value against itself, e.g. (2,2); flipped comparisons, (1,2) and (2,1) having the same effect; and two identical comparisons one after the other), you’d have to search through this many permutations:

( 11 * 10 / 2 ) * ( ( ( 11 * 10 / 2 ) - 1 ) ^ 32 )

This works out to about 1.5 * 10^57 or roughly 1,500,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 different networks.

This is somewhat impractical.
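The arithmetic can be checked directly: there are 11 × 10 / 2 = 55 distinct comparators, and in a 33-comparator network the first is free while each of the remaining 32 only has to differ from its immediate predecessor:

```python
n = 11
pairs = n * (n - 1) // 2            # 55 distinct comparators (flipped pairs counted once)
count = pairs * (pairs - 1) ** 32   # 33 comparators, each differing from its predecessor
print(f"{count:.2e}")               # roughly 1.5e+57
```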

Even to test if a network could order a list of length n is computationally expensive. Fortunately, thanks to the zero-one principle, you only have to check against 2^n rather than n! unsorted lists to show whether a network is successful or not. For our example of sorting lists of length 11, this is the difference between testing against 2048 unsorted lists and 39,916,800.
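The zero-one principle makes the validity check short enough to sketch here (a minimal version, reusing the 1-indexed comparator convention from earlier):

```python
from itertools import product

def sorts_all(network, n):
    """Zero-one principle: a network sorts every list of length n
    if and only if it sorts all 2**n lists of 0s and 1s."""
    for bits in product((0, 1), repeat=n):
        values = list(bits)
        for i, j in network:
            if values[i - 1] > values[j - 1]:
                values[i - 1], values[j - 1] = values[j - 1], values[i - 1]
        if values != sorted(values):
            return False
    return True

print(sorts_all([(2, 3), (1, 3), (1, 2)], 3))  # -> True, after only 2**3 = 8 tests
```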

### Evolution lends a hand

W Daniel Hillis’ first grand idea was to evolve the sorting networks. Start with a population of randomised networks. See which networks did the best job of sorting the unordered test lists. Breed from these champions. Rinse and repeat.
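That loop might look something like the following. This is a hypothetical, much-simplified sketch (tiny lists, a fixed network length, and mutate-only breeding), not Hillis’ actual setup, which ran as a spatial population on a Connection Machine:

```python
import random

N = 6          # list length (kept tiny for illustration)
NET_LEN = 15   # fixed number of comparators per network
POP = 50

def random_comparator():
    i, j = sorted(random.sample(range(1, N + 1), 2))
    return (i, j)

def fitness(network, tests):
    """Fraction of the test lists the network sorts correctly."""
    sorted_count = 0
    for t in tests:
        v = list(t)
        for i, j in network:
            if v[i - 1] > v[j - 1]:
                v[i - 1], v[j - 1] = v[j - 1], v[i - 1]
        sorted_count += (v == sorted(v))
    return sorted_count / len(tests)

def mutate(network):
    """Copy a parent, replacing one comparator at random."""
    child = list(network)
    child[random.randrange(len(child))] = random_comparator()
    return child

def evolve(generations=100):
    # By the zero-one principle, test lists need only contain 0s and 1s.
    tests = [tuple(random.randint(0, 1) for _ in range(N)) for _ in range(32)]
    pop = [[random_comparator() for _ in range(NET_LEN)] for _ in range(POP)]
    for _ in range(generations):
        pop.sort(key=lambda net: fitness(net, tests), reverse=True)
        champions = pop[:POP // 5]  # breed from the top 20%
        pop = [mutate(random.choice(champions)) for _ in range(POP)]
    return max(pop, key=lambda net: fitness(net, tests))
```

Note that the test lists here are fixed and random, which is exactly the weakness the second grand idea addresses.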

Hillis’ second grand idea was to co-evolve these sorting networks against a population of unsorters. Instead of testing the sorters against all 2^n possible unsorted lists, they would be tested against the best unsorted lists the unsorter population could provide. The sorting networks were judged on how well they sorted those lists; the unsorting population was judged on how often its lists bamboozled the sorters.

There were two reasons for this. The first was that the sorters could be tested on far fewer than 2^n unsorted lists, which would speed evolution along significantly. The second was to avoid local optima.
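The two-population scoring might be sketched like this (a hypothetical, self-contained toy; Hillis’ actual experiment paired sorters and parasites on a spatial grid):

```python
def apply_network(network, values):
    """Apply a 1-indexed sorting network to a list."""
    v = list(values)
    for i, j in network:
        if v[i - 1] > v[j - 1]:
            v[i - 1], v[j - 1] = v[j - 1], v[i - 1]
    return v

def coevolve_scores(sorters, unsorters):
    """Score each sorter only against the unsorter population's lists,
    and each unsorter by how many sorters its list bamboozles."""
    sorter_scores = [0] * len(sorters)
    unsorter_scores = [0] * len(unsorters)
    for si, net in enumerate(sorters):
        for ui, test in enumerate(unsorters):
            if apply_network(net, test) == sorted(test):
                sorter_scores[si] += 1
            else:
                unsorter_scores[ui] += 1
    return sorter_scores, unsorter_scores

sorters = [[(2, 3), (1, 3), (1, 2)],  # a correct 3-element network
           [(1, 2)]]                  # an incomplete one
unsorters = [[3, 1, 2], [2, 3, 1], [3, 2, 1]]
print(coevolve_scores(sorters, unsorters))  # -> ([3, 0], [1, 1, 1])
```

Each unsorter scores a point whenever a sorter fails on its list, so the unsorter population is constantly rewarded for finding the sorters’ blind spots.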

### The problem of local optima

Local optima are points where your evolving population hits a peak and then point-blank refuses to get better. The individuals have improved to a level where all nearby mutations only lead to reduced fitness, so they’ve nowhere else to go. This is fine if you happen to have reached the best possible solution in the search space, or if the solution you’ve hit is good enough for your needs, but otherwise you need a way to shake things up and find either a better local optimum or the global optimum.

One common way of doing this is through your method for selecting the parents to breed from. If you only choose the best individual in your population and breed from them (called elitism), then it’s very easy to get stuck in a local optimum. If you choose your parents via some stochastic method (a semi-random method that follows a probability distribution but cannot be predicted precisely), then this can help push you out of local optima. In genetic algorithms, such selection methods include roulette wheel selection and tournament selection.
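Tournament selection, for example, fits in a few lines (a toy sketch where fitness is just the individual’s value):

```python
import random

def tournament_select(population, fitness, k=3):
    """Pick k individuals uniformly at random and return the fittest of them.
    Weaker individuals can still win by drawing weak opponents, which keeps
    diversity in the breeding pool and helps the population escape local optima."""
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

# Toy example: individuals are integers and fitness is the value itself.
population = list(range(20))
winners = [tournament_select(population, fitness=lambda x: x) for _ in range(10)]
print(winners)  # mostly high values, but not always 19
```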

### Co-evolution to the rescue

By using co-evolution, Hillis prevented local optima from getting his sorting population stuck. As the sorters weren’t tested against all possible unsorted lists but only against the ones the unsorting population produced, individuals that were less fit when measured against all possible lists could end up being more fit against the unsorting population’s lists. This added a degree of noise to the fitness evaluation and let the sorters wander out of local optima. It could also mean that the population would wander away from a better solution if the unsorting networks weren’t fit enough to pressure the sorting networks adequately. Testing against all 2^n unsorted lists could drive the sorting population into a local optimum; testing against a weak population of unsorters could mean the sorting population never reached any optimum (and hence never produced fully functional minimal sorting networks) in the first place.

In the end, Hillis produced novel sorting networks that were at or around the best known at the time. More importantly, he showed how much co-evolution sped up the search for candidate solutions and produced better results than single-population evolution, thanks to avoiding local optima.

After attending this lecture, my friend Andy and I spent the rest of the day programming our own versions of this experiment and leaving them running for days on the university mainframe (mostly ending in the processes being killed by the system administrators after three or four days, even after we changed the names of the processes to PleaseDoNotKill.exe). I remember getting close to some of the best networks at the time but never reaching or exceeding them. Admittedly, Danny Hillis did have a Connection Machine with 65,536 processors to play with, even if it was ten years before we had a go.

…and all this is why I chose Minimal Sorting Networks as the first problem domain to write my framework around.