As I reported in a post last year (2014), even though the fifth “Scorpion Cipher” (i.e. ‘S5’) sent to John Walsh is arranged using a 12-column layout, it has a very strong internal 16-column structure. What this means is that every single shape repeat spans a distance that is a multiple of 16: which in turn suggests that the encipherer formed the S5 ciphertext by rigidly cycling through a set of 16 simple substitution cipher alphabets.
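The 16-column claim is easy to test mechanically once you have a transcription. Here is a minimal Python sketch, assuming the ciphertext has been transcribed as a sequence of symbol ids; the names `repeat_gaps` and `fits_period` and the `toy` data are purely illustrative, not a real S5 transcription.

```python
def repeat_gaps(symbols):
    """Gaps between consecutive occurrences of the same symbol."""
    last_seen = {}
    gaps = []
    for pos, sym in enumerate(symbols):
        if sym in last_seen:
            gaps.append(pos - last_seen[sym])
        last_seen[sym] = pos
    return gaps


def fits_period(symbols, period):
    """True if every repeat gap is a multiple of `period`."""
    return all(gap % period == 0 for gap in repeat_gaps(symbols))


# Toy stand-in for a transcription: 40 distinct ids with two planted repeats.
toy = list(range(40))
toy[16] = toy[0]   # repeat at distance 16
toy[37] = toy[5]   # repeat at distance 32

print(repeat_gaps(toy))       # [16, 32]
print(fits_period(toy, 16))   # True
print(fits_period(toy, 12))   # False
```

Checking consecutive occurrences is enough: if every consecutive gap is a multiple of 16, then so is the distance between any two occurrences of the same shape.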

If you therefore rearrange S5’s shapes into a 16-column layout and colourize their repeats, you get something like the following (*click on it to see a higher resolution version*):

Now, 155 out of S5’s 180 characters are unique, giving it a ‘multiplicity’ (155/180) of 86%, which is way too high to be cracked using a conventional homophonic cipher solver. For comparison, the three Beale Ciphers have multiplicities of 57%, 24%, and 43% respectively, while the (solved) Zodiac Z408’s multiplicity is a paltry 13%. In fact, the upper limit on solvability for homophonic ciphertexts seems to be multiplicities of around 20%-25% if you’re lucky (or 10%-15% if you’re not), so S5 would at first sight seem to be waaaaaay out of anybody’s practical range.
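For anyone who wants to compute the same figure for their own transcriptions, multiplicity is just distinct symbols divided by total length. A minimal sketch (the function name is mine):

```python
def multiplicity(symbols):
    """Distinct symbols divided by ciphertext length."""
    return len(set(symbols)) / len(symbols)


# S5's figure, using the counts quoted above (155 distinct shapes, 180 in total).
s5_multiplicity = 155 / 180
print(f"{s5_multiplicity:.0%}")   # 86%
```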

But I’m not so sure.

Going through what has been released of the encipherer’s letters that the ciphertexts accompany, he/she starts by saying:-

This code took a lot of time and effort to develop, in hopes that it will defeat FBI and CIA codebreakers.

Which is ‘kind of reasonable’, though the whole enciphering activity would seem to be somewhat pointless unless the person’s overall aim was to somehow emulate the original Zodiac Killer’s ciphers. In a later letter, the encipherer’s position gets finessed somewhat:

I now realise with many hundeds of hours of […] mindracking experimentation with my complex ciphers that my first one that I sent you [S1] was comparatively simple to my second [S2], third [S3], fourth [S4], and now temporarily final cryptograph system [S5]. I have been encoding useful information for your use and have done it fairly, since all of my ciphers can be decoded simply, once the limited patterns and systems are discovered.

What we learn from this, I think, is that what we are looking at here is not the product of a psychopathic academic cryptographer, but is rather a homebrewed cipher system, based around “limited patterns and systems”. So, a bright kid; probably good at maths; and has perhaps read enough popular cryptography (through and beyond the newspaper accounts of the Zodiac Killer’s ciphers) to avoid clunkingly obvious mistakes.

But the mention of “patterns” makes me suspect that there’s also a little bit of the vanity of the pure mathematician there, intellectual pride that all it would take to “defeat FBI and CIA codebreakers” was “limited patterns and systems”. Hence I think we are likely to be looking at something that is innately very ordered, something that we’ll all kick ourselves for not seeing when it is shown to us in the fullness of time. “*What a clever person the Scorpion Cipher maker was*”, we’re all supposed to say (according to that fantasy script), “*much better at making ciphers than the Zodiac Killer ever was*”.

In the case of S5, though, I suspect we now know just about enough to break it, even with its dauntingly high multiplicity.

My first observation is that even though it uses a large number of different shapes, these are drawn from a very much smaller set of shape families: and there may well be some kind of cryptographic relationship between the members of each family to help us:-

My second observation is that, with the exception of columns 10 and 11 (which may well be random, or possibly ‘S’ vs ‘T’ in the plaintext), the most frequent symbol in any column is always from a different family from the most frequent shape in any other column. It’s not the strongest of observations, sure, but it’s what leads me to my (grandly titled) S5 Construction Hypothesis.

## My S5 Construction Hypothesis

I believe that the encipherer very probably constructed 16 cipher alphabets on gridded paper, within a 26 x 16 or perhaps a 16 x 26 grid. But this is a boring activity, and the encipherer’s text suggests a kind of proto-mathematical desire for elegance, like a smart 12-year-old who has just ‘got’ the whole idea of mathematics. So I hypothesize that the encipherer filled this rectangular grid with **families of shapes along downward diagonals**, from top-left to bottom-right.

Hence for the sixteen component alphabets, any genuine (as opposed to accidental) family of shapes would step through the alphabets. Here, a family that had a member enciphering A in alphabet #1 would also have a member enciphering B in alphabet #2, and maybe a member enciphering C in alphabet #3 etc.
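To make the geometry concrete, here is a small Python sketch of that grid, assuming 16 alphabets by 26 letters and assuming that diagonals do not wrap around at the edges (both are my guesses for illustration; `family_of` and `family_cells` are hypothetical names).

```python
import string

LETTERS = string.ascii_uppercase   # 26 plaintext letters
N_ALPHABETS = 16                   # one cipher alphabet per column of the 16-column layout


def family_of(alphabet, letter):
    """Family id of a grid cell; constant along top-left to bottom-right diagonals.

    `alphabet` is 0-based (alphabet #1 is 0) and `letter` is 0-based (A is 0).
    No wrap-around at the grid edges is assumed.
    """
    return letter - alphabet


def family_cells(family_id):
    """All (alphabet, letter) cells that one diagonal family occupies."""
    return [(a, LETTERS[a + family_id])
            for a in range(N_ALPHABETS)
            if 0 <= a + family_id < len(LETTERS)]


print(family_cells(0)[:3])   # [(0, 'A'), (1, 'B'), (2, 'C')]
print(family_cells(25))      # [(0, 'Z')]
```

One consequence of this layout is that a family contributes at most one shape to any single alphabet, and the diagonals that clip the corners of the grid are much shorter than 16 cells.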

This suggests a quite different kind of cryptologic solving logic from normal, one that not only offers us **mathematical** means to reduce the multiplicity (*because we can posit connections between letters in different columns, giving us fewer degrees of freedom to steer our way through*), but also **spatial** means to do the same thing.

What I mean by ‘spatial’ here is that if we look at, say, the family of shapes formed of squares with dots in, I think we might be able to assume that not only are these all part of the same family, but also all the missing shapes on columns without a similar family member can be excluded from the search.

That is, if alphabet #1 uses a square with dots in to encipher ‘A’ and alphabet #3 uses a different square with dots in to encipher ‘C’, then we can very probably infer that alphabet #2 uses a square with dots in to encipher ‘B’, even though we cannot actually see it in the ciphertext. Hence this kind of ‘holistic exclusion’ offers a spatial way to help us reduce the search space.
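As a rough sketch of how that inference might be automated under the diagonal hypothesis (no wrap-around assumed; `infer_diagonal` is a hypothetical helper name, and the letter assignments fed into it are guesses to be tested, not known values):

```python
import string

LETTERS = string.ascii_uppercase


def infer_diagonal(sightings, n_alphabets=16):
    """Extend one family's guessed cells along its diagonal.

    `sightings` maps a 1-based alphabet number to the plaintext letter the
    family is guessed to encipher there, e.g. {1: "A", 3: "C"}.
    Returns a dict covering every alphabet the diagonal passes through.
    """
    offsets = {LETTERS.index(letter) - (alpha - 1)
               for alpha, letter in sightings.items()}
    if len(offsets) != 1:
        raise ValueError("sightings do not lie on a single diagonal")
    offset = offsets.pop()
    return {alpha: LETTERS[alpha - 1 + offset]
            for alpha in range(1, n_alphabets + 1)
            if 0 <= alpha - 1 + offset < len(LETTERS)}


inferred = infer_diagonal({1: "A", 3: "C"})
print(inferred[2])    # B
print(inferred[16])   # P
```

A solver could use the `ValueError` branch as a pruning test: any trial assignment that puts two members of one family on different diagonals gets rejected straight away.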

Of course, turning this visuo-spatial hypothesis into an effective computer algorithm will doubtless prove quite tricky. But perhaps it offers a way of making S5’s cryptologic challenge more tractable than it would be if it were a pure homophonic cipher with such a scarily high multiplicity.

A ‘smart twelve-year-old’ who’s been following in Dad’s footsteps? Good for him! Keep on keeping on, Nick!

bdid1dr (who is willing to bet that the twelve year old figured out my signature long ago.

:-^

Not to mention the consistency of not closing my parentheses!

bd

If you’re looking for a compendium of alphabets, how about Edmund Fry’s Pantographia?

Title in full:

Pantographia; containing accurate copies of all the known alphabets in the world; together with an English explanation of the peculiar force or power of each letter (1799).

(I don’t really think it will help with the Scorpion cipher – but it is a lovely bit of work, and one of the few on this subject before 1900.)

Jibes well with his willingness to scratch out an entire line in S5 over a single misplaced dot, when neither D nor dotted-D had appeared previously. It would have been so much simpler to just switch their meaning on his key, if it was a simple list of homophones written beneath the alphabet. But if they were part of a logical progression such as you are proposing, then switching them messes up the ‘final draft’ of the cipher masterpiece he has been working on.

Always thought it was strange that he was enough of a perfectionist to redraw an entire line over one misplaced dot, yet not enough of one to redraw the entire thing.

I have posted before that I think the scratched out line is because the 8th symbol is drawn wrong, and it matches the 10th symbol. When the author was drawing the 10th symbol it became obvious that the symbol was a repeat and thus the error was apparent. This fits with symbols only reappearing every 16 places.

Nick, I think you might be wrong about the families being placed diagonally along the translation table. It was my guess too, it’s a simple way of making a decent cipher out of a bunch of symbol families.

The reason is if you look at your first image, the rearranged 16-column cipher, multiple symbols from the same family exist in the same column.

In column 1, the flag and music notes appear.

In column 4, a triangle with prominent vertices appears in both blank and corner line form.

In column 11, an E and an M that looks like a sideways E appear.

In column 15, two squares with black diagonal corners appear.

In column 16, two squares with different combinations of filled quadrants appear.

Of course I’m probably wrong about a few of these, but it makes me think that the pattern is something slightly different. I think you’re right that this cipher can be cracked by reverse engineering.

There is another line of attack, which is tracking down ciphers 2-4. What was the original source for 1 and 5? Is there a reason 2-4 are unavailable?

cf

madmen of that sort are not focussed on the object itself, but on the way it creates a power dynamic (as they imagine) between themselves and others. The aim in crossing out that line – I would put a dollar on it – is to ensure that he is heard clearly, plainly and correctly. Not to create an elegant product for its own sake, as we might do.

ponky: don’t forget that this is probably a homebrew cipher, so the encipherer is probably just plonking down symbols in diagonal lines to amuse and entertain himself/herself – which would leave plenty of space at the bottom-left and top-right corners for filler characters and short runs.

My understanding is that the FBI has the other ciphers and letters: but my best guess is that it is unlikely to release them any time soon. Which would be a shame, but there you go.

cf: obviously he/she is some kind of impatient perfectionist. 😉

Nick: If they put families in diagonal lines, we wouldn’t see families in the same columns of the 16-column rearrangement. But we do, so it’s not diagonal lines. It’s probably something similar though.

ponky: I think the “dice” variant shape is a pretty good indication that there are strong families of shapes in play here. I also don’t think it’s the whole story, sure, but it might be a good enough probabilistic starting point to reduce S5’s multiplicity dramatically. 🙂

Oh, I agree there are families, and it seems like most (all?) of the families have 16 members, as there are plenty of similar symbols with 4-bit properties. It’s also clear that the symbols can only repeat in cycles of 16. I don’t know what the relation between symbol families and repeat cycles is, but it seems like too much of a coincidence for them both to have a count of 16.

It’s maybe worth noting again that if you extend the families to include all logical symbols, you get collisions. E.g. it looks like around four families would have a symbol which is just a completely filled circle.

Could there be only 26 families? Maybe you could re-group them? Could it be a Vigenère cipher with key length 16 (16 sets of homophones)? You are saying that it could be solved IF there is a constant distance between the families, or some type of pattern of family arrangements on the key?

GeoffLaT: there certainly seem to be many symmetries between groups of shapes, which probably indicate that some groups only appear in certain alphabets. But… as to which groups are groups and which groups go into which alphabets, that’s another pair of questions we haven’t definitively answered yet. 🙁

Nick: I will think about it and maybe mess around with it soon. Thanks a lot for finding the period 16n unigram repeats.

GeoffLaT: it was commenter Teddy who noted it on the OPORD forum in 2007, I merely passed it on in 2014. 🙂

I tried a simple experiment to find out if I could replicate the observations with the proposed cipher. I used a vigenere table 26 x 16, each cell with a unique symbol. Then cycled through the 16 rows as I encoded 100 messages with plaintext samples of the English language taken from 100 different sources and 180 characters long. Then drafted the message into 16 columns and colored the cells if there were repeats in the columns and counted them.

The lowest count of colored cells was 66, the mean 86 and highest count 108. The standard deviation was 8.8.

In the picture above I counted 63 colored symbols. That is more than two standard deviations away from the experiment mean, so the hypothesis seems unlikely but I guess not impossible. I tried to use a keyword length of 8, and of course there were a few repeats by row instead of by column. I wonder if the vigenere table is 36 x 16, allowing for 26 letters and digits 0-9, and with numbers in the plaintext, or if maybe the cipher is some other variation of vigenere with keyword length 16.
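For anyone wanting to rerun the first experiment, here is a minimal Python sketch of it as I read the description: a 26 x 16 table with a unique symbol in every cell, rows cycled with period 16, and a cell counted as colored whenever its symbol occurs more than once. Representing each symbol as a (row, letter) pair and the names `encipher` and `colored_cells` are my assumptions; you would need to supply your own English plaintext samples.

```python
from collections import Counter

PERIOD = 16


def encipher(plaintext, period=PERIOD, length=180):
    """Cycle through `period` alphabets; each (row, letter) pair is a unique symbol."""
    letters = [c for c in plaintext.upper() if "A" <= c <= "Z"][:length]
    return [(i % period, c) for i, c in enumerate(letters)]


def colored_cells(ciphertext):
    """Number of cells whose symbol appears more than once in the ciphertext."""
    counts = Counter(ciphertext)
    return sum(1 for sym in ciphertext if counts[sym] > 1)


print(colored_cells(encipher("A" * 32)))                      # 32
print(colored_cells(encipher("ABCDEFGHIJKLMNOPQRSTUVWXYZ")))  # 0
```

Because every cell of the table holds a different symbol, a repeat can only ever happen within one column of the 16-column layout, so counting repeats over the whole ciphertext gives the same answer as counting them column by column.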

I decided to try another experiment. Use the same 26 x 16 table of unique symbols, cycle through the rows, and pick a symbol from the columns at random. 100 trials.

The lowest count of colored cells was 46, the mean 60 and the highest count 79. The standard deviation was 6.8.

The scorpion 5 colored symbol stat of 63 is much more in line with random symbol selection. Maybe it is not English, or it is something else, but I suspect a hoax. Inspired by Vigenere, but instead of using an actual message, just select the symbols from the rows in a Vigenere at random.
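A sketch of the random-selection experiment in Python, together with the z-scores implied by the figures quoted above. The function names, the fixed seed, and the uniform choice among 26 columns are my assumptions, so the simulated statistics will not necessarily match the numbers I reported.

```python
import random
from collections import Counter
from statistics import mean, stdev


def random_ciphertext(rng, length=180, period=16, n_letters=26):
    """Cycle through the rows, picking a column uniformly at random each time."""
    return [(i % period, rng.randrange(n_letters)) for i in range(length)]


def colored_cells(ciphertext):
    """Number of cells whose symbol appears more than once in the ciphertext."""
    counts = Counter(ciphertext)
    return sum(1 for sym in ciphertext if counts[sym] > 1)


def run_trials(n_trials=100, seed=0):
    """Return (min, mean, max, standard deviation) of colored-cell counts."""
    rng = random.Random(seed)
    counts = [colored_cells(random_ciphertext(rng)) for _ in range(n_trials)]
    return min(counts), mean(counts), max(counts), stdev(counts)


def z_score(observed, centre, spread):
    """Distance of an observation from a mean, in standard deviations."""
    return (observed - centre) / spread


# Scorpion 5's count of 63 against the two sets of figures quoted above.
z_english = z_score(63, 86, 8.8)   # about -2.6
z_random = z_score(63, 60, 6.8)    # about +0.4

print(run_trials())
```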

I have another rough idea for an experiment though, and will think about it. It has to do with the letter E and unigram repeats at periods less than 16. In scorpion 5 there are quite a few rectangles formed by unigram repeats at period 16n in the same two rows but different columns.

Please see my work of 09FEB16. I believe it is a very good fit for what the message of the Scorpion was. I hope that the FBI will release some more of the Scorpion’s Ciphers.

I can’t find it. If I were to look in the archives, what is the month and year of the original article?

GeoffLaT,

You will find it under “other ciphers” category. SCORPION CIPHERS, “14 thoughts on ‘Scorpion Ciphers’”, February 9, 2016. I believe that my deciphering of the Scorpion Ciphers is the best attempt that I have seen yet. Thanks for your interest.

GeoffLaT: it’s all in the Wayback Machine:

http://web.archive.org/web/20090123041318/http://opordanalytical.com:80/phpBB3/viewtopic.php?f=79&t=200

GeoffLaT and Others,

Look at my work as I listed it on 03JUL17. You will see that it is quite good. I would like to get some more of the contents of the other letters that Mr. Walsh and/or the FBI have so that I can work on them also. Thanks.