I’ve had the Zodiac Killer Z340 cipher on my mind for the last few days. Though I’m still finding it hard not to draw the conclusion that its top and bottom halves are two different ciphertexts (joined together for reason(s) we can only hazily guess at), what has drawn so much of my attention is a quite different class of statistical observation: letter skips.
Letter Skips
The most (in)famous example of letter skips was the Bible Code, popularised by Michael Drosnin’s 1997 book The Bible Code. However, this was merely one in a long line of works claiming that the Bible is not only the literal and exact Word of God, but also an implicit encipherment of all manner of unexpected occult statements and prophecies. To get to these secret messages, all you have to do is read every nth letter, modulo length(Bible): and then, if you hunt through the vast swathes of near-random junk that emerges from that, you’ll eventually discover words, phrases, and proper names that couldn’t possibly have been known millennia ago when the Bible was first written down.
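As a rough illustration of the "read every nth letter, modulo length" procedure, here is a minimal Python sketch. The function names `skip_read` and `find_skips` are made up for this example, and stripping the text down to letters only is an assumption about how such searches are usually set up, not a description of Drosnin's actual tooling.

```python
def skip_read(text, step, start=0, count=None):
    """Read every `step`-th letter of `text`, wrapping around at the end."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    if not letters:
        return ""
    if count is None:
        count = len(letters)
    n = len(letters)
    return "".join(letters[(start + i * step) % n] for i in range(count))


def find_skips(text, word, max_step):
    """Return every (start, step) at which `word` appears as a letter skip."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    word = word.upper()
    n = len(letters)
    hits = []
    for step in range(1, max_step + 1):
        for start in range(n):
            if all(letters[(start + i * step) % n] == w
                   for i, w in enumerate(word)):
                hits.append((start, step))
    return hits


print(skip_read("I like killing", 2, count=6))
print(find_skips("I like killing", "KKL", 2))
```

The more starting points and step sizes `find_skips` is allowed to try, the more 'hits' it will return for any short word, which is the critics' point in a nutshell.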
There have been plenty of mathematical and statistical dismissals of the Bible Code, almost all of which reduce to the simple argument that if you search enough random letter sequences for long enough, you’ll find something that sort of looks like text. And so when Drosnin huffed that “When my critics find a message about the assassination of a prime minister encrypted in Moby Dick, I’ll believe them”, his critics took it literally as a challenge. As a result, we now have lists of numerous Drosnin-style letter-skip ‘predictions’ in Moby Dick, along with a ‘prediction’ of Princess Diana’s death [thanks to Brendan McKay].
From which the moral unavoidably seems to be: be careful what you wish for.
Generated Coincidences
At the heart of the Bible Code lies a simple sampling fallacy: if you perform a long enough series of arbitrary statistical analyses on the text of any given document, you will (eventually) uncover things in it which superficially appear extraordinarily improbable.
This is directly relevant to a lot of the Zodiac Killer code-breaking discourse because, broadly speaking, it is exactly what has happened there: diligent statistical enquiry has yielded not only millions of strike-out tests, but also a large number of (superficially) unlikely-looking patterns. And so the question is: if you perform a hundred different statistical tests and one of them happens to yield a pattern that only appears in one in two hundred randomised versions of the same document, have you (a) found something fundamental and causal that could possibly explain everything, or (b) just generated a coincidence that means nothing?
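To put a number on the hundred-tests example: if the tests were fully independent (a simplifying assumption, since real tests on the same ciphertext overlap heavily), the chance that at least one of them throws up a 1-in-200 result by luck alone can be sketched as follows. The function name `chance_of_fluke` is purely illustrative.

```python
def chance_of_fluke(num_tests, p_single):
    """Probability that at least one of `num_tests` independent tests
    yields a result as rare as `p_single` purely by chance."""
    return 1.0 - (1.0 - p_single) ** num_tests


# 100 tests, each with a 1-in-200 chance of a "striking" result
print(round(chance_of_fluke(100, 1 / 200), 3))  # roughly 0.39
```

So under those assumptions, a 1-in-200 pattern turning up somewhere among a hundred tests is closer to a two-in-five event than a one-in-two-hundred one.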
Sadly, there is no obvious way of telling the difference: all one can do is nod sagely and say, in the words of a great 1970s philosopher…
…“COULD BE!”
Transposition or “Tasoiin rnpsto”?
As should be plain as day from the above, I too view Bible Code letter skips as complete nonsense, and reserve my inalienable human right to cast a similarly cool eye over the impressive panoply of Zodiac Killer cipher observations, each of which may or may not be a generated coincidence.
Even so, utter disbelief of the specifics of the Bible Code shouldn’t mask the fact that the kind of statistical tests that are used for letter skips share a significant overlap with the kind of statistical tests that help reveal periodic ciphers and transposition ciphers.
Hence evidence of a letter-skip period in the Zodiac Killer Cipher should not be automatically put to one side merely because the test is associated with hallucinatory Bible Code letter-skips: a periodic effect could instead be pointing towards one of many other phenomena.
And there is indeed strong evidence of a period in play in the Z340, as first discussed by Daikon and Jarlve in 2015. Daikon examined the number of Z340 bigram repeats at different periods, and found a significant spike at period 19 (this really is noticeably larger than the other periods).
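For readers who want to try this kind of count themselves, here is a minimal Python sketch of a periodic bigram-repeat scan. The counting convention (each bigram scores its number of occurrences minus one) and the function names are assumptions made for this example: Daikon's exact convention isn't spelled out here, so absolute numbers may differ from his.

```python
from collections import Counter


def periodic_bigram_repeats(symbols, period):
    """Count repeated bigrams, where a bigram pairs each symbol with
    the symbol `period` positions later in the sequence."""
    bigrams = Counter(
        (symbols[i], symbols[i + period])
        for i in range(len(symbols) - period)
    )
    # Assumed convention: a bigram seen n times contributes n - 1 repeats.
    return sum(count - 1 for count in bigrams.values())


def scan_periods(symbols, max_period):
    """Repeat counts for every period from 1 up to `max_period`."""
    return {p: periodic_bigram_repeats(symbols, p)
            for p in range(1, max_period + 1)}


print(scan_periods("ABCABC", 3))
```

Run against a transcription of the Z340 (not reproduced here), a spike in the output of `scan_periods` at one particular period is the kind of effect being described.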
Here’s what these period-19 bigram repeats look like (was this diagram made by David Oranchak?):
Having then performed 1,000,000 random shuffles, David Oranchak concluded that this period-19 result had a “1 in 216” chance of happening. Which is good, but just a smidgeon short of great.
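A shuffle test of this general kind can be sketched in a few lines of Python. This is not Oranchak's code: the repeat-counting convention, the function names, and the "at least as many repeats as observed" criterion are all assumptions made for illustration. A "1 in 216" result would correspond to a returned fraction of roughly 0.0046.

```python
import random
from collections import Counter


def periodic_bigram_repeats(symbols, period):
    """Repeated bigrams formed from symbols `period` positions apart."""
    bigrams = Counter(
        (symbols[i], symbols[i + period])
        for i in range(len(symbols) - period)
    )
    return sum(count - 1 for count in bigrams.values())


def shuffle_test(symbols, period, trials=10_000, seed=0):
    """Fraction of random shuffles with at least as many period-`period`
    bigram repeats as the unshuffled sequence."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    observed = periodic_bigram_repeats(symbols, period)
    pool = list(symbols)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if periodic_bigram_repeats(pool, period) >= observed:
            hits += 1
    return hits / trials
```

Shuffling keeps the symbol frequencies fixed while destroying their order, so the fraction returned estimates how often the observed repeat count would arise from symbol frequencies alone.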
Incidentally, it’s easier to see these bigram matches if you rewrite Z340 in 19-wide columns (this diagram also probably made by David Oranchak):
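The re-wrapping itself is trivial to reproduce. In the sketch below, the helper name `wrap` is made up for this example and a list of position numbers stands in for the actual Z340 symbols (which are not reproduced here), just to show that symbols 19 apart end up vertically adjacent.

```python
def wrap(symbols, width=19):
    """Split a sequence into consecutive rows of `width` symbols."""
    return [symbols[i:i + width] for i in range(0, len(symbols), width)]


# Placeholder for the 340 cipher symbols: just their positions 0..339.
rows = wrap(list(range(340)))
print(len(rows), len(rows[-1]))    # number of rows, length of the last row
print(rows[0][0], rows[1][0])      # vertically adjacent cells are 19 apart
```

Since 340 = 17 × 19 + 17, the 19-wide layout has seventeen full rows plus a final short row of seventeen symbols.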
More tests revealed all manner of similar periodic results that may or may not mean something: but I’m interested here specifically in the period-19 result.
Period-19? So what?
By the time he constructed the Z340, the Zodiac Killer had already seen his Z408 cipher not only printed on the front page of newspapers (which surely pleased him), but also very publicly cracked (which surely displeased him). And yet his Z340 cipher closely resembles the Z408 in so many ways that it seems a fairly safe bet to me that his later cipher system was nothing more than a modification (a ‘delta’) of the earlier cipher system rather than something wildly different.
Hence I’ve long suspected that if we could somehow work out what the Zodiac Killer thought was technically wrong with the Z408 cipher system, then we could make a guess at what his delta to the Z340 system might be.
Even though the Z408 presented all manner of homophone cycles, it wasn’t these that gave the game away to Donald Gene Harden and Bettye June Harden of Salinas. Rather, they made a number of shrewd psychological guesses (that the most likely first word a psychopath would write was “I”, and that the plaintext would include the word “KILL” multiple times), and used repetitions of “LL” as cribbed ways in to the message.
(As an aside, I struggle to believe that Bettye Harden genuinely guessed from scratch that the first three words of Z408 would be “I LIKE KILLING”, as has been reported. Instead, it seems far more likely to me that she had already worked for several hours on the cipher before making such an inspired guess.)
And so it seems most likely to me that the Zodiac Killer conceived his delta specifically as a way of disrupting the weakness of doubled letters (specifically doubled L), but without really affecting the rest of his code-making approach. And as always in cryptography, there are numerous ways this could be achieved:
* removing the second letter of all doubled letter pairs
* adding in new tokens for specific doubled letters (e.g. use ‘$’ to encipher ‘LL’)
* disrupting the order of the letters (i.e. transposing them) so that ILIKEKILLING becomes IIEILN LKKLIG, etc.
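Each of these three candidate deltas can be sketched as a plaintext pre-processing step in Python. The function names are hypothetical, and the third function implements only the simplest reading of the transposition idea (taking every second letter, which reproduces the IIEILN LKKLIG example); none of this is known to be what the Zodiac Killer actually did.

```python
import re


def drop_second_of_doubles(plaintext):
    """Delta 1: remove the second letter of each doubled-letter pair."""
    return re.sub(r"(.)\1", r"\1", plaintext)


def tokenise_double(plaintext, letter="L", token="$"):
    """Delta 2: replace one specific doubled letter with a single token."""
    return plaintext.replace(letter * 2, token)


def decimate(plaintext, period=2):
    """Delta 3: transpose by reading every `period`-th letter in turn."""
    return " ".join(plaintext[offset::period] for offset in range(period))


message = "ILIKEKILLING"
print(drop_second_of_doubles(message))  # ILIKEKILING
print(tokenise_double(message))         # ILIKEKI$ING
print(decimate(message))                # IIEILN LKKLIG
```

In all three outputs the tell-tale adjacent "LL" has disappeared before any homophonic substitution is applied, which is the whole point of the delta.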
I’m therefore wondering if his cipher system delta was some kind of period-19 transposition. But – of course – people have already checked for the presence of straightforward period-19 transposition, and have basically drawn a blank. So if there is a period-19 ‘signature’ arising from some kind of transposition, it’s a little more complicated.
But if so, then what would it look like?
A three-way line dance?
My final piece of observational jigsaw in today’s reasoning chain is that the Z340 ciphertext is apparently arranged in groups of three lines. FBI cryptanalyst Dan Olson famously commented that…
Lines 1-3 and 11-13 contain a distinct higher level of randomness than lines 4-6 and 14-16. This appears to be intentional and indicates that lines 1-3 and 11-13 contain valid ciphertext whereas lines 4-6 and 14-16 may be fake.
…though note that this mixes up observation (the first sentence) with his best-guess inference (the second sentence). What I take from Olson’s observation instead is the more general implication that lines are somehow grouped together in sets of three BUT with a spare line added in between the top and bottom half.
So, the overall line grouping sequence of the Z340 appears to be:
* top half: 1-1-1 2-2-2 3-3-3 X [a spare line with “cut marks” at either end of a fake line]
* bottom half: 4-4-4 5-5-5 6-6-6 X [a spare line with ‘ZODAIK’-like fake signature at the end]
Hence – putting it all together – I’m now wondering whether there is a period-19 transposition in play here BUT arranged in groups of three lines at a time. In which case, the symbol sequence for each set of three lines (3 x 17 = 51) might well look like this (where 01 is the first symbol of the plaintext, 02 is the second symbol, etc):
* 01 04 07 10 13 16 19 22 25 28 31 34 37 40 43 46 49
* 47 50 02 05 08 11 14 17 20 23 26 29 32 35 38 41 44
* 42 45 48 51 03 06 09 12 15 18 21 24 27 30 33 36 39
This transposition arrangement would yield both the period-19 effect and the groups-of-three-lines effect: and might also go some of the way towards explaining why lines 10 and 20 function differently to the other lines.
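The period-19 claim for this arrangement can be checked mechanically. The Python sketch below hard-codes the three rows given above, rebuilds them from a simple rule (plaintext symbols dealt out to the three rows in turn, with each row rotated two columns further right than the one before, which is my reading of the grid rather than a stated construction), and then tallies how far apart consecutive plaintext symbols land when the block is read row by row. The function names are made up for this example.

```python
from collections import Counter

# The three 17-symbol rows exactly as listed in the post.
POST_GRID = [
    [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49],
    [47, 50, 2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44],
    [42, 45, 48, 51, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39],
]


def build_grid(rows=3, width=17, shift=2):
    """Deal plaintext symbols 1..rows*width across the rows in turn,
    rotating each row `shift` columns further than the previous one."""
    grid = [[0] * width for _ in range(rows)]
    for m in range(rows * width):      # m is the 0-based plaintext index
        row, k = m % rows, m // rows
        grid[row][(k + shift * row) % width] = m + 1
    return grid


def gap_histogram(grid):
    """Tally the ciphertext distance between consecutive plaintext symbols."""
    flat = [value for row in grid for value in row]
    position = {value: i for i, value in enumerate(flat)}
    return Counter(position[n + 1] - position[n]
                   for n in range(1, len(flat)))


print(build_grid() == POST_GRID)
print(gap_histogram(POST_GRID))
```

Within a single 51-symbol block, 30 of the 50 consecutive plaintext pairs end up exactly 19 positions apart (the rest fall at gaps of 2, -20, or -37), so adjacent plaintext letters would indeed tend to show up as period-19 bigrams in the ciphertext.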
As I mentioned at the top of the post, I also strongly suspect that the top half of the Z340 and the bottom half of the Z340 are separate ciphertext systems, and so any solving should be attempted on the two halves individually, however inconvenient that may be. 🙂
I haven’t tested out this new transposition hypothesis yet: but it’s definitely worth a look, wouldn’t you think, hmmm?