Back in 2007, John Walsh (the host of “America’s Most Wanted”) announced that he had, since 1991, received a string of disturbing-sounding letters from an individual calling himself / herself “The Scorpion”: many of them had sections or pages that were apparently in cipher. Two of these ciphers were released to the public: these became known as “S1” and “S5”.

In the same year, Christopher Farmer (“President of OPORD Analytical”) announced that he had cracked S1 (which was apparently built on a 10×7 grid):-

[Image: scorpion3 (the S1 cryptogram)]

Farmer’s claimed solution reads like this:-

baelprovid
edthemwith
newstories
butwhatifi
askjwdoiwa
xrtwbonesa
gezjefxkon

Unfortunately, all the diagrams illustrating Farmer’s ingenious reasoning have withered on the Internetty vine in the years since then (they’re not even in the Wayback Machine, nor anywhere else as far as I can see), which is a bit of a shame.

Even so, this turns out to be an entirely surmountable problem: Farmer’s claimed solution is clearly incorrect, for the simple reason that letters in the ciphertext aren’t consistent in the plaintext. For example, the cipher “K” maps to both ‘a’ and ‘g’, the “backwards-L” maps to ‘w’, ‘w’, and ‘x’, the “backwards-F” maps to both ‘u’ and ‘v’, and so on. At the same time, his claimed plaintext doesn’t really make a lot of sense (“BAEL”… really? I’m not so sure).

It seems likely to me that Farmer guessed that “PROVID” was steganographically hidden in plain sight at the end of the topmost line (and if you squint a bit, you can see why that would be), and then built the rest of his decryption attempt around this hopeful starting point. Moreover, he seems to have guessed that “O” maps to ‘o’, and “backwards-E” maps to ‘e’, which are both pretty peachy assignments. But I don’t buy any of this for a minute: there are way too many degrees of freedom in this S1 cryptogram (roughly half of the individual cipher shapes occur exactly once), and quite a few extra ones in his claimed solution too.

It’s a brave attempt, for sure: but it’s still wrong, whichever way you turn it round.

Other people have tried their hand with S1, though both AlanBenjy in 2009 and Glurk on Dave Oranchak’s site in 2010 pessimistically pointed out that 53 of S1’s 70 symbols are unique, yielding a ‘multiplicity’ a fair way beyond the range within which homophonic cryptograms can practically be solved. Hence I would tend to agree with their assessment that there’s no obvious way that we will solve S1 with what we currently have to hand: in fact, there seems to be no way to tell whether S1 is a real cipher or a hoax – the only repeating cipher pair is “S A” (i.e. “S Λ”), which could well have happened by pure chance.

The only other Scorpion ciphertext released to the public to date is the 180-character cryptogram known as “S5”:-

[Image: scorpion4 (the S5 cryptogram)]

Once again, 155 of these 180 symbols are unique, which at first glance would seem to make S5 even less likely to be solved than S1.
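To put rough numbers on that comparison, here is a minimal sketch. It assumes the usual working definition of multiplicity (distinct symbols divided by ciphertext length) and that “unique” in the counts above means “distinct shapes”; the function name is purely illustrative.

```python
def multiplicity(distinct_symbols: int, length: int) -> float:
    """Assumed definition: number of distinct cipher shapes / ciphertext length."""
    return distinct_symbols / length

# Counts as quoted in the text above.
s1 = multiplicity(53, 70)
s5 = multiplicity(155, 180)

print(f"S1 multiplicity: {s1:.3f}")
print(f"S5 multiplicity: {s5:.3f}")
```

The closer this ratio gets to 1, the fewer repeats a solver has to work with, which is why S5 looks (on this measure alone) even less tractable than S1.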

But wait! In May 2007, user “Teddy” on the OPORD Analytical forum pointed out that if you transpose S5 from a 12-column arrangement to a 16-column layout, shape repeats only ever occur within a single vertical column. In fact, every single 16-way column except one (column #5) includes one or more repeated shapes.
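If you have your own transcription of S5 to hand, Teddy’s observation is easy to re-check. The sketch below is only an illustration: the helper names are mine, and the `toy` list is a made-up stand-in (laid out so that its repeats line up at width 4), not a transcription of S5.

```python
from collections import defaultdict

def repeat_columns(symbols, width):
    """For each symbol that occurs more than once, return the set of
    columns it lands in when the text is wrapped at the given width."""
    cols = defaultdict(set)
    counts = defaultdict(int)
    for pos, sym in enumerate(symbols):
        cols[sym].add(pos % width)
        counts[sym] += 1
    return {sym: c for sym, c in cols.items() if counts[sym] > 1}

def repeats_confined_to_columns(symbols, width):
    """True if every repeated symbol stays within a single column."""
    return all(len(c) == 1 for c in repeat_columns(symbols, width).values())

# Toy stand-in (NOT the real S5): "a", "e" and "d" each repeat,
# always at a distance that is a multiple of 4.
toy = ["a", "b", "c", "d", "a", "e", "f", "g", "h", "e", "i", "d"]

for width in (3, 4, 6):
    print(width, repeats_confined_to_columns(toy, width))
```

With a real transcription you would pass the 180 symbols in reading order and try width 12, width 16, and (for completeness) width 8.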

Radically, this suggests to me that S5 was constructed in a completely different way from conventional homophonic ciphers: specifically, I think that each 16-way column of S5 may well have its own unique cipher alphabet. This would mean that S5 would need to be solved in a completely different manner to the way, say, zkdecrypto works. (I don’t believe S5 was constructed with eight columns, but I thought I ought to mention that that’s a possibility as well, however borderline). Maybe that small insight will be enough to help someone make some headway with S5, who can tell?

The huge shame here is that the other Scorpion ciphers (which to this day have not been released) might well give us additional clues about the inner workings of both S1 and S5. Specifically, if one of the other ciphers happened to have used precisely the same 16-alphabet system as S5, it might well give us enough raw data to crack them both.

Has anyone apart from John Walsh ever seen S2, S3, S4, and S6? Just askin’, just askin’…


Update: Looking again at S1 (while bearing in mind the way S5 seems to have been constructed), I find it hard not to notice that the distances between instance repetitions seem strongly clustered around multiples of 5 (with the only instance not fitting the pattern being the “backwards-L” on row #5):-

+60, +20, +50, +36, +24, +20, +40, +20, +40, +25, +35, +10, +25, +6, +45, +9, +6.

I suspect that this means that the encipherer probably enciphered S1 by cycling through five independent cipher alphabets (largely speaking). This wasn’t a mechanically precise encipherment (whether by accident or by design), but something close enough to one such that almost all the time he/she was no more than a single alphabet ‘off’, one way or the other.
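A quick way to see the clustering is to measure how far each of the distances listed above sits from its nearest multiple of 5. This is just a sketch of the arithmetic (the function name is mine), using the seventeen distances exactly as given:

```python
distances = [60, 20, 50, 36, 24, 20, 40, 20, 40, 25, 35, 10, 25, 6, 45, 9, 6]

def offset_from_multiple(d, period=5):
    """Signed distance from d to the nearest multiple of period."""
    r = d % period
    return r if r <= period // 2 else r - period

offsets = [offset_from_multiple(d) for d in distances]
exact = sum(1 for o in offsets if o == 0)          # exactly a multiple of 5
off_by_one = sum(1 for o in offsets if abs(o) == 1)  # one alphabet 'off'

print(offsets)
print(f"{exact} exact, {off_by_one} off by one, out of {len(distances)}")
```

Every listed distance is either an exact multiple of 5 or a single step away from one, which is what you would expect from someone cycling through five alphabets and occasionally slipping by one.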

This offers a quite different kind of constraint from normal homophonic cipher searches, and possibly even enough to crack the S1 cipher. After all, we have a fair amount of the Scorpion’s meandering plaintext to use as a statistical model to aim for… 🙂

One of the nice things about the unsolved Z340 Zodiac Killer cipher is that we have a previous solved cipher by the same encipherer (i.e. the Z408 cipher), which appears to exhibit many of the same properties as the Z340. Hence, if we could forensically reconstruct how Z408 was constructed (i.e. its cryptographic methodology), we might also gain valuable insights into how the later Z340 was constructed.

One interesting feature of the (solved) Z408 is that even though it is a homophonic substitution cipher (which is to say that several different shapes are used for various plaintext letters), the shape selection is often far from random. In fact, in quite a few instances Z408 shapes appear in a strict cycle, which has led to some recent attempts to crack Z340 by trying (unsuccessfully) to infer homophone cycles.

Curiously, one of the shapes (filled triangle) appears to encipher both A and S: and if you extract all these out, a homophone-cycle-like ASASASAS sequence appears. This intrigued me, so I decided to look at it a little closer: might this somehow be a second layer of cycling?

The answer (I’m now pretty sure) turns out to be no, though it’s still interesting in its own right. Basically, the Zodiac seems to have got confused between dotted triangle (for S) and filled triangle (for A), which caused his cycles to break down. He also miscopied an F-shape as an E-shape: perhaps his working draft wasn’t quite as neat as his final copy, and/or written in felt tip, causing letter shapes to soak into the paper and become slightly less distinct.

If we correct these mistakes and reconstruct what he seems to have intended, we see that he was following a fairly strict cycle most of the time, though getting less ordered towards the end (perhaps from enciphering nausea?):-

A: length-4 homophone cycle = (1) F – (2) dotted square – (3) K – (4) dotted triangle
--> 12341234123413234124211
----> 16 decisions out of 22 follow the cycle pattern

S: length-4 homophone cycle = (1) 6 – (2) S – (3) reversed L – (4) filled triangle
--> 1241234123412341231412
----> 18 decisions out of 21 follow the cycle pattern
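The “decisions” counts for these two cycles can be reproduced mechanically: a decision follows the cycle if the next homophone number is the current one plus one, wrapping from 4 back to 1. A minimal sketch (the function name is mine; the digit strings are the two reconstructed sequences given above):

```python
def cycle_hits(seq, period=4):
    """Return (hits, decisions): how many consecutive transitions in a
    digit string step to the next homophone in a 1..period cycle."""
    digits = [int(ch) for ch in seq]
    hits = sum(1 for a, b in zip(digits, digits[1:]) if b == a % period + 1)
    return hits, len(digits) - 1

a_hits, a_total = cycle_hits("12341234123413234124211")
s_hits, s_total = cycle_hits("1241234123412341231412")

print(f"A: {a_hits} of {a_total}")  # A: 16 of 22
print(f"S: {s_hits} of {s_total}")  # S: 18 of 21
```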

L is interesting because though that seems to start out as a length-2 homophone cycle [diagonal square – B], the diagonal square then seems to morph into a filled square and then back again to a diagonal square. Hence there’s no obvious sign of an actual length-3 homophone cycle as such, only a miscopied length-2 cycle (which then breaks down halfway through, with four diagonal squares in a row).

Yet even though the Zodiac loves words containing LL (kill, thrill, will, all, etc), he only actually seems to be using a length-2 homophone cycle for L (if slightly miscopied). That is, he is probably using a generalized model of English letter frequency distribution rather than a particular model of his own English letter frequency distribution.

The odd thing is that if you go through Dave Oranchak’s list of Z408 homophone sequences, you’ll see that it doesn’t quite match the traditional “ETAOINSHRDLU” frequency ordering (I count L as length-2):
* Length-7: E
* Length-4: TAOINS
* Length-3: R
* Length-2: LHFD

Was there an American amateur cryptography book of the 1950s or 1960s that espoused this frequency distribution?

As I mentioned here and indeed here a few days ago, my usually-Early-Renaissance-focused thoughts have of late been turning slowly to the Zodiac Killer Ciphers, in particular to the unsolved 340-character cipher known as “Z340”. Unusually as cipher mysteries go, we also have an earlier cipher called “Z408” (no prizes for guessing its length) by the same person, one that was quickly cracked (using the crib “KILL”). Z408 turned out to be a homophonic simple substitution cipher (but with spelling mistakes, copying mistakes, and a few subtly odd features); and there are plenty of good reasons to think that Z340 will share many of these same basic aspects (but made somewhat harder to crack).

Even though it was originally a crib which helped to crack it, Z408 has other weaknesses, most notably the way it sequentially cycles through homophones (“multiple ciphertext shapes for the same plaintext character”). For example, plaintext ‘t’ maps to the four ciphertext homophones HI5L, and appears in the text as the sequence HI5LHI5ILHI5LHI5LHI5LHI5LI5LHL5IIHI. If you treat each letter-to-letter transition that follows the modulo-4 sequence [HI5L] as a success with probability 0.25 (there are 26 of these) and each non-match as a failure with probability 0.75 (there are 8), I believe you get a raw probability of less than 1 in a billion (i.e. of at least 26 successes from 34 events). Please check my maths, though – I used this online binomial calculator with N = 35-1, k = 26, p = 0.25, q = 0.75. For more on these homophone sequences, Zodiac ciphermeister Dave Oranchak kindly pointed me at a full list of Z408 homophone sequences.
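For anyone who would rather check the maths without the online calculator, here is a small sketch of the same calculation. It makes the same simplifying assumption as the paragraph above (each transition is an independent event that follows the cycle with probability 0.25), and the function names are mine:

```python
from math import comb

def cycle_matches(seq, cycle):
    """Count transitions in seq that step to the next symbol of the cycle."""
    nxt = {cycle[i]: cycle[(i + 1) % len(cycle)] for i in range(len(cycle))}
    return sum(1 for a, b in zip(seq, seq[1:]) if nxt[a] == b)

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

t_seq = "HI5LHI5ILHI5LHI5LHI5LHI5LI5LHL5IIHI"
n = len(t_seq) - 1                  # 34 transitions
k = cycle_matches(t_seq, "HI5L")    # 26 of them follow the cycle
tail = binomial_tail(n, k, 0.25)

print(n, k, tail)
```

The tail probability does indeed come out below one in a billion under that assumption, though (as discussed below) the raw figure flatters the result because the cycle order was chosen after looking at the data.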

Incidentally, the top few match counts are:-
e -> ZpW+6NE – N = 54-1, k = 38
t -> HI5L – N = 35-1, k = 26
s -> F@K7 – N = 20-1, k = 15
o -> X!Td – N = 27-1, k = 13
n -> O^D( – N = 23-1, k = 20
i -> 9PUk – N = 44-1, k = 35
a -> GSl8 – N = 26-1, k = 10
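The same raw binomial tail can be run over every row of that list. This is a sketch only: it assumes (my assumption, not something the list itself states) that a random choice among h homophones would follow the cycle with probability 1/h, and it ignores the preselection of the best ordering, so the numbers are indicative at best.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# (plaintext letter, homophones, occurrences, matching transitions),
# copied from the list above; N in the list is occurrences - 1.
rows = [
    ("e", "ZpW+6NE", 54, 38),
    ("t", "HI5L", 35, 26),
    ("s", "F@K7", 20, 15),
    ("o", "X!Td", 27, 13),
    ("n", "O^D(", 23, 20),
    ("i", "9PUk", 44, 35),
    ("a", "GSl8", 26, 10),
]

tails = {}
for letter, homophones, occurrences, k in rows:
    n = occurrences - 1
    p = 1 / len(homophones)   # assumed chance of following the cycle at random
    tails[letter] = binomial_tail(n, k, p)
    print(f"{letter}: n={n}, k={k}, p={p:.3f}, raw tail={tails[letter]:.3g}")
```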

It would be great to tell you how statistically significant these sequences are, but I know enough stats to know that it’s not quite as easy as it looks (for a start, we’re preselecting the best order of letters to use) – any passing statisticians, please feel free to leave a comment. I’m also quite surprised that nobody has apparently tried to use this weakness as a direct way to find the Z340 cipher’s homophones (in fact, John Graham-Cumming also blogged about this in June this year), but – as I’ll show shortly – I suspect trying just that on its own wouldn’t be enough.

Taking a brief step sideways, I’m always intrigued by mistakes in ciphers, because these often point to how the cipher was constructed. One interesting feature (though one I’m still trying to understand to my own satisfaction) is the solid triangle cipher shape in Z408, and how it appears to encipher different letters at different times. The view often put forward elsewhere is that this variation is due to copying errors, perhaps arising because the Zodiac Killer’s pen was too thick, causing him to misread his draft version. As for me, I’m not so sure, because the solid triangle decrypts to a curious sequence:-
* “A” in “bec-A-use”
* “S” in “mo-S-t dangerous”
* “A” in “an-A-mal”
* “S” in “mo-S-t thrilling”
* “A” in “with -A- girl”
* “S” in “if it i-S-”
* “E” in “my slav-E-s”
* “A” in “my -A-fterlife”

Of these, only the “A” in “an-A-mal” is possibly a copying error (“I” is enciphered by an empty triangle shape) as opposed to just a spelling mistake (the Zodiac Killer has plenty of those). But even that seems a little unlikely once you notice the whole ASASAS[E]A pattern that emerges, which is so very similar to the homophonic sequences discussed above. I haven’t yet figured out what this implies, but it’s pretty interesting, right?

Moving on to the uncracked Z340 cipher, I have to say that what strikes me most is the difference between its top half (lines 1-10) and its bottom half (lines 11-20). It turns out that back in 2009, FBI codebreaker Dan Olson pointed out to Tom at zodiackiller.com that lines 1-3 and 11-13 contained very few repeats: other people have wondered whether this points to some kind of block-level transposition going on. Me, I suspect there’s a far stronger inference to be made: that even though they share nearly all the same character shapes, the top and bottom halves of Z340 use completely different cipher letter assignments, and hence may well need to be cracked independently. Further, I suspect that the Zodiac may well have intended to send them out separately (Z408 was sent as three independent sections), but (for some reason) ended up sending them both as a single cipher.

[Incidentally, I also don’t believe that the last few letters of the bottom half of Z340 are genuinely part of the ciphertext to be cracked: they seem to spell “ZODAIK”, which is just a touch too coincidental for me. 🙂 ]

Right now, I think that a constructive first big step would be to search for statistically significant homophone sequences in the top and bottom halves of Z340, because we can be reasonably sure that the most frequent letters will probably have four or more homophones, just as with the Z408 cipher: trying this out may well yield some surprisingly revealing results. Any takers at the FBI? 😉
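As a very rough idea of what such a search might look like, here is a sketch of the simplest possible case: hunting for pairs of shapes that strictly alternate (i.e. candidate length-2 homophone cycles). Everything here is illustrative: the function names are mine, the `toy` string is invented (it is not Z340), and a real search would need to handle longer cycles, miscopied shapes, and proper significance testing.

```python
from itertools import combinations

def alternation_score(text, a, b):
    """Return (hits, transitions) for the subsequence of text made up
    only of symbols a and b; a hit is a switch from one to the other."""
    sub = [ch for ch in text if ch in (a, b)]
    hits = sum(1 for x, y in zip(sub, sub[1:]) if x != y)
    return hits, max(len(sub) - 1, 0)

def candidate_pairs(text, min_transitions=4):
    """List symbol pairs that alternate perfectly over enough transitions."""
    found = []
    for a, b in combinations(sorted(set(text)), 2):
        hits, n = alternation_score(text, a, b)
        if n >= min_transitions and hits == n:
            found.append((a, b, n))
    return found

# Invented toy ciphertext in which "A" and "B" alternate strictly.
toy = "AxxByAzBxAyyB"
print(candidate_pairs(toy))
```

To test the two-halves idea, the same function could be run separately on the first ten and last ten lines of a Z340 transcription and the resulting candidate lists compared.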