As I reported in a post last year (2014), even though the fifth “Scorpion Cipher” (i.e. ‘S5’) sent to John Walsh is arranged using a 12-column layout, it has a very strong internal 16-column structure. What this means is that every single shape repeat spans a distance that is a multiple of 16: which in turn suggests that the encipherer formed the S5 ciphertext by rigidly cycling through a set of 16 simple substitution cipher alphabets.
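To make the 16-column claim concrete, here is a minimal Python sketch of the test involved. It collects every pair of positions that hold the same shape and checks whether each spacing is a multiple of 16. S5's shapes have no standard transcription, so this assumes each shape has been transcribed as an arbitrary label (e.g. a string). The function names are my own and purely illustrative.

```python
from itertools import combinations

def repeat_distances(symbols):
    """Distances between every pair of positions that hold the same shape."""
    positions = {}
    for i, shape in enumerate(symbols):
        positions.setdefault(shape, []).append(i)
    return [b - a
            for idxs in positions.values()
            for a, b in combinations(idxs, 2)]

def all_repeats_multiple_of(symbols, period=16):
    """True if every repeat spacing is a multiple of `period`
    (vacuously True if no shape repeats at all)."""
    return all(d % period == 0 for d in repeat_distances(symbols))
```

A toy sequence built by cycling through 16 labels passes the test. A single repeat at any other spacing is enough to make it fail.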
If you therefore rearrange S5’s shapes into a 16-column layout and colourize their repeats, you get something like the following:
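Rearranging into 16 columns is just slicing. The sketch below is minimal and again assumes the shapes have been transcribed as labels. The helper names are hypothetical. The rows give the 16-wide layout. Taking every 16th symbol gives the characters that, under the 16-alphabet reading, would all have been enciphered with the same alphabet.

```python
def to_rows(symbols, width=16):
    """Lay the ciphertext out in rows of `width` shapes (the last row may be short)."""
    return [symbols[i:i + width] for i in range(0, len(symbols), width)]

def to_columns(symbols, width=16):
    """Column c holds every shape at positions c, c + width, c + 2*width, ..."""
    return [symbols[c::width] for c in range(width)]
```

A 180-character text like S5 gives eleven full rows plus a final row of four. Columns 1 to 4 therefore hold twelve shapes each, and columns 5 to 16 hold eleven.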
Now, S5’s 180 characters use 155 distinct shapes, giving it a ‘multiplicity’ (155/180) of 86%, which is way too high to be cracked using a conventional homophonic cipher solver. For comparison, the three Beale Ciphers have multiplicities of 57%, 24%, and 43% respectively, while the (solved) Zodiac Z408’s multiplicity is a paltry 13%. In fact, the upper limit on solvability for homophonic ciphertexts seems to be a multiplicity of around 20%-25% if you’re lucky (or 10%-15% if you’re not), so S5 would at first sight seem to be waaaaaay out of anybody’s practical range.
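Multiplicity here is simply the number of distinct symbols divided by the length of the ciphertext. Here is a minimal sketch that uses placeholder labels rather than any real S5 transcription:

```python
def multiplicity(symbols):
    """Number of distinct symbols divided by the total number of symbols."""
    if not symbols:
        raise ValueError("empty ciphertext")
    return len(set(symbols)) / len(symbols)

# S5: 155 distinct shapes across 180 characters
print(f"{155 / 180:.0%}")  # prints 86%
```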
But I’m not so sure.
In the letters accompanying the ciphertexts (or at least in what has been released of them), the encipherer starts by saying:-
This code took a lot of time and effort to develop, in hopes that it will defeat FBI and CIA codebreakers.
Which is ‘kind of reasonable’, though the whole enciphering activity would seem to be somewhat pointless unless the person’s overall aim was to somehow emulate the original Zodiac Killer’s ciphers. In a later letter, the encipherer’s position gets finessed somewhat:
I now realise with many hundeds of hours of […] mindracking experimentation with my complex ciphers that my first one that I sent you [S1] was comparatively simple to my second [S2], third [S3], fourth [S4], and now temporarily final cryptograph system [S5]. I have been encoding useful information for your use and have done it fairly, since all of my ciphers can be decoded simply, once the limited patterns and systems are discovered.
What we learn from this, I think, is that what we are looking at here is not the product of a psychopathic academic cryptographer, but is rather a homebrewed cipher system, based around “limited patterns and systems”. So, a bright kid; probably good at maths; and has perhaps read enough popular cryptography (through and beyond the newspaper accounts of the Zodiac Killer’s ciphers) to avoid clunkingly obvious mistakes.
But the mentions of “patterns” make me suspect that there’s also a little bit of the vanity of the pure mathematician there, an intellectual pride that all it would take to “defeat FBI and CIA codebreakers” was “limited patterns and systems”. Hence I think we are likely to be looking at something that is innately very ordered, something that we’ll all kick ourselves for not seeing when it is shown to us in the fullness of time. “What a clever person the Scorpion Cipher maker was”, we’re all supposed to say (according to that fantasy script), “much better at making ciphers than the Zodiac Killer ever was”.
In the case of S5, though, I suspect we now know just about enough to break it, even with its dauntingly high multiplicity.
My first observation is that even though it uses a large number of different shapes, these are drawn from a very much smaller set of shape families: and there may well be some kind of cryptographic relationship between the members of each family to help us:-
My second observation is that, with the exception of columns 10 and 11 (which may well be random, or possibly ‘S’ vs ‘T’ in the plaintext), the most frequent shape in each column always belongs to a different family from the most frequent shape in every other column. It’s not the strongest of observations, sure, but it’s what leads me to my (grandly titled) S5 Construction Hypothesis.
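Once each shape has been assigned to a family, the column-mode observation can be checked mechanically. In this sketch the shape-to-family mapping (`family_of`) is a hypothetical input that would have to come from eyeballing the shapes. Columns are numbered 1 to 16 so that columns 10 and 11 can be skipped, as in the text. Ties for the most frequent shape are broken by first occurrence, because that is how `collections.Counter.most_common` orders equal counts.

```python
from collections import Counter

def column_modes(symbols, width=16):
    """Most frequent shape in each of the `width` columns (ties: first seen wins)."""
    if len(symbols) < width:
        raise ValueError("need at least one full row")
    return [Counter(symbols[c::width]).most_common(1)[0][0] for c in range(width)]

def modes_in_distinct_families(symbols, family_of, width=16, skip=(10, 11)):
    """True if no two (non-skipped) column modes share a family.

    family_of: hypothetical mapping from shape label to family label.
    skip: 1-based column numbers to leave out of the comparison.
    """
    modes = column_modes(symbols, width)
    families = [family_of[m] for col, m in enumerate(modes, start=1) if col not in skip]
    return len(families) == len(set(families))
```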
My S5 Construction Hypothesis
I believe that the encipherer very probably constructed 16 cipher alphabets on gridded paper, within a 26 x 16 or perhaps a 16 x 26 grid. But this is a boring activity, and the encipherer’s text suggests a kind of proto-mathematical desire for elegance, like a smart 12-year-old who has just ‘got’ the whole idea of mathematics. So I hypothesize that the encipherer filled this rectangular grid with families of shapes along downward diagonals, from top-left to bottom-right.
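Here is a minimal sketch of what that hypothesised grid might look like in the 26 x 16 orientation, with one row per plaintext letter and one column per alphabet. Purely for illustration, it assumes that each diagonal wraps round from Z back to A, so the grid holds exactly 26 families of 16 shapes each. Without wrap-around, a 26 x 16 grid would instead have 41 diagonals of varying length. Family numbers stand in for the actual shapes.

```python
from string import ascii_uppercase as LETTERS

def diagonal_grid(n_alphabets=16):
    """Return grid[alphabet][letter] -> family number (alphabets indexed from 0).

    Moving one letter down and one alphabet right stays on the same diagonal,
    so the family number is (letter index - alphabet index) mod 26.
    """
    return [
        {letter: (r - a) % 26 for r, letter in enumerate(LETTERS)}
        for a in range(n_alphabets)
    ]
```

Under this assumption, each alphabet contains exactly one member of every family. That property is what would let the families act as constraints across columns.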
Hence, across the sixteen component alphabets, any genuine (as opposed to accidental) family of shapes would step through the alphabets. For example, a family that had a member enciphering A in alphabet #1 would also have a member enciphering B in alphabet #2, and maybe a member enciphering C in alphabet #3, etc.
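Under that hypothesis, knowing what one family member enciphers in one alphabet pins down what its siblings encipher in every other alphabet. This is a minimal sketch that numbers the alphabets #1 to #16 as in the text and again assumes the diagonals wrap from Z to A. The function name is hypothetical.

```python
from string import ascii_uppercase as LETTERS

def predicted_letter(letter, from_alphabet, to_alphabet):
    """Letter that a sibling shape in `to_alphabet` would encipher, given that a
    member of the same family enciphers `letter` in `from_alphabet`."""
    shift = to_alphabet - from_alphabet
    return LETTERS[(LETTERS.index(letter) + shift) % 26]

print(predicted_letter("A", 1, 2), predicted_letter("A", 1, 3))  # prints B C
```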
This suggests a quite different kind of cryptologic solving logic from normal, one that offers us not only mathematical means to reduce the multiplicity (because we can posit connections between letters in different columns, giving us fewer degrees of freedom to steer our way through), but also spatial means to do the same thing.
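Here is a rough illustration of how the degrees of freedom shrink. Suppose a family has n distinct shapes visible in the ciphertext. Treating those shapes independently allows 26^n letter assignments, whereas the diagonal hypothesis leaves only 26 (one shared offset per family). The sketch below simply totals that up over a list of hypothetical family sizes. It ignores every other constraint a real solver would use.

```python
def search_space(family_sizes, alphabet_size=26):
    """Return (independent, diagonal) letter-assignment counts.

    family_sizes: number of distinct shapes seen for each family.
    Independent: every shape gets its own letter -> 26 ** total shapes.
    Diagonal: one offset per non-empty family -> 26 ** number of families.
    """
    total_shapes = sum(family_sizes)
    families = sum(1 for n in family_sizes if n > 0)
    return alphabet_size ** total_shapes, alphabet_size ** families
```

Take three families with 3, 2 and 4 visible members. That comes to 26^9 (about 5.4 trillion) independent assignments versus 26^3 = 17,576 under the hypothesis.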
What I mean by ‘spatial’ here is that if we look at, say, the family of shapes formed of squares with dots in, I think we might be able to assume that not only are these all part of the same family, but also all the missing shapes on columns without a similar family member can be excluded from the search.
That is, if alphabet #1 uses a square with dots in to encipher ‘A’ and alphabet #3 uses a different square with dots in to encipher ‘C’, then we can very probably infer that alphabet #2 uses a square with dots in to encipher ‘B’, even though we cannot actually see it in the ciphertext. Hence this kind of ‘holistic exclusion’ offers a spatial way to help us reduce the search space.
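The inference in that example can be sketched as a small gap-filling step. This minimal sketch assumes, as an illustration, that the diagonals wrap round from Z to A. `infer_missing` is a hypothetical helper that returns None when the visible members of a family contradict the hypothesis, which might itself help weed out accidental families. It only fills gaps between the lowest and highest alphabets in which a family member is actually visible.

```python
from string import ascii_uppercase as LETTERS

def infer_missing(observed):
    """observed: {alphabet_number: letter} for the visible members of one family.

    Returns {alphabet_number: letter} for every alphabet from the lowest to the
    highest observed one, or None if the observations need more than one offset.
    """
    if not observed:
        return {}
    # Under the diagonal hypothesis, (letter index - alphabet number) mod 26 is constant.
    offsets = {(LETTERS.index(letter) - a) % 26 for a, letter in observed.items()}
    if len(offsets) != 1:
        return None
    offset = offsets.pop()
    lo, hi = min(observed), max(observed)
    return {a: LETTERS[(a + offset) % 26] for a in range(lo, hi + 1)}
```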
Of course, turning this visuo-spatial hypothesis into an effective computer algorithm will doubtless prove quite tricky. But perhaps it offers a way of making S5’s cryptologic challenge more tractable than it would be if it were a pure homophonic cipher with such a scarily high multiplicity.