I’ve had the Zodiac Killer Z340 cipher on my mind for the last few days. Though I’m still finding it hard not to draw the conclusion that its top and bottom halves are two different ciphertexts (joined together for reason(s) we can only hazily guess at), what has drawn so much of my attention is a quite different class of statistical observation: letter skips.
Letter Skips
The most (in)famous example of letter skips is the Bible Code, popularised by Michael Drosnin’s (1997) book The Bible Code. However, this was merely one in a long line of works claiming that the Bible is not only the literal and exact Word of God, but is also an implicit encipherment of all manner of unexpected occult statements and prophecies. To get to these secret messages, all you have to do is read every nth letter, modulo length(Bible): and then, if you hunt through the vast swathes of near-random junk that emerge from that, you’ll eventually discover words, phrases, and proper names that couldn’t possibly have been known millennia ago when the Bible was first written down.
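To make the mechanics concrete, here is a minimal Python sketch of letter-skip reading. The function names (letter_skip, find_skip_hits) and the decision to strip everything except letters are assumptions made for illustration, not anything taken from Drosnin or his critics.

```python
def letters_only(text: str) -> str:
    """Keep only alphabetic characters, upper-cased."""
    return "".join(ch for ch in text.upper() if ch.isalpha())


def letter_skip(text: str, start: int, skip: int, count: int) -> str:
    """Read `count` letters, starting at `start` and stepping `skip` each time.

    Indices wrap around modulo the number of letters in the text.
    """
    letters = letters_only(text)
    if not letters:
        return ""
    return "".join(letters[(start + i * skip) % len(letters)] for i in range(count))


def find_skip_hits(text: str, word: str, max_skip: int) -> list[tuple[int, int]]:
    """Return every (start, skip) pair at which `word` appears as a letter skip."""
    letters = letters_only(text)
    target = letters_only(word)
    if not letters or not target:
        return []
    hits = []
    for skip in range(2, max_skip + 1):
        for start in range(len(letters)):
            if letter_skip(letters, start, skip, len(target)) == target:
                hits.append((start, skip))
    return hits
```

For instance, find_skip_hits("Call me Ishmael", "AMS", 5) includes the pair (1, 3). The number of (start, skip) combinations tried grows with both the length of the text and the largest skip allowed, which is why a long enough text will cough up almost any short word.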
There have been plenty of mathematical and statistical dismissals of the Bible Code, almost all of which reduce to the simple argument that if you search enough random letter sequences for long enough, you’ll find something that sort of looks like text. And so when Drosnin huffed that “When my critics find a message about the assassination of a prime minister encrypted in Moby Dick, I’ll believe them”, his critics took it literally as a challenge. As a result, we now have lists of numerous Drosnin-style letter-skip ‘predictions’ in Moby Dick, along with a ‘prediction’ of Princess Diana’s death [thanks to Brendan McKay].
From which the moral unavoidably seems to be: be careful what you wish for.
Generated Coincidences
At the heart of the Bible Code lies a simple sampling fallacy: which is that if you perform a long enough series of arbitrary statistical analyses on the text of any given document, you will (eventually) uncover things in it which superficially appear extraordinarily improbable.
This is directly relevant to a lot of the Zodiac Killer code-breaking discourse because, broadly speaking, it is exactly what has happened there: diligent statistical enquiry has yielded not only millions of strike-out tests, but also a large number of (superficially) unlikely-looking patterns. And so the question is: if you perform a hundred different statistical tests and one of them happens to yield a pattern that only appears in one in two hundred randomised versions of the same document, have you (a) found something fundamental and causal that could possibly explain everything, or (b) just generated a coincidence that means nothing?
Sadly, there is no obvious way of telling the difference: all one can do is nod sagely and say, in the words of a great 1970s philosopher…
…“COULD BE!”
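For a rough sense of scale in the hundred-tests example above, the arithmetic can be sketched as follows. This assumes (unrealistically) that the hundred tests are independent of one another, and the function name is made up for illustration.

```python
def chance_of_at_least_one_hit(tests: int, p_single: float) -> float:
    """Probability that at least one of `tests` independent tests hits by chance."""
    return 1.0 - (1.0 - p_single) ** tests


# A hundred tests, each with a 1-in-200 chance of a fluke result.
p = chance_of_at_least_one_hit(tests=100, p_single=1 / 200)
print(f"{p:.2f}")  # prints 0.39
```

In other words, under that independence assumption there is roughly a two-in-five chance of stumbling on at least one "1 in 200" pattern purely by looking a hundred times.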
Transposition or “Tasoiin rnpsto”?
As should be plain as day from the above, I too view Bible Code letter skips as complete nonsense, and reserve my inalienable human right to cast a similarly cool eye over the impressive panoply of Zodiac Killer cipher observations, each of which may or may not be a generated coincidence.
Even so, utter disbelief of the specifics of the Bible Code shouldn’t mask the fact that the kind of statistical tests that are used for letter skips share a significant overlap with the kind of statistical tests that help reveal periodic ciphers and transposition ciphers.
Hence evidence of a letter-skip period in the Zodiac Killer Cipher should not be automatically put to one side because of the test’s association with hallucinatory Bible Code letter-skips, because evidence of a periodic effect could instead be pointing towards one of many other phenomena.
And there is indeed strong evidence of a period in play in the Z340, as first discussed by Daikon and Jarlve in 2015. Daikon examined the number of Z340 bigram repeats at different periods, and found a significant spike at period 19 (this really is noticeably larger than the other periods).
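As a sketch of the kind of count involved (not Daikon's or Jarlve's actual code): treat a "bigram at period p" as the pair of symbols p positions apart, and count a bigram seen n times as n - 1 repeats. Both the counting convention and the function names are assumptions here.

```python
from collections import Counter


def periodic_bigram_repeats(symbols, period: int) -> int:
    """Count repeated bigrams, where a bigram pairs symbols `period` apart."""
    pairs = Counter(
        (symbols[i], symbols[i + period]) for i in range(len(symbols) - period)
    )
    # A bigram that occurs n times contributes n - 1 repeats.
    return sum(n - 1 for n in pairs.values() if n > 1)


def repeats_by_period(symbols, max_period: int) -> dict:
    """Tabulate the repeat count for every period from 1 to `max_period`."""
    return {p: periodic_bigram_repeats(symbols, p) for p in range(1, max_period + 1)}
```

Given a transcription of the cipher as a list of symbols (not reproduced here), calling repeats_by_period on it and looking for the largest value is the sort of scan that would show up a spike at one particular period.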
Here’s what these period-19 bigram repeats look like (was this diagram made by David Oranchak?):
Having then performed 1,000,000 random shuffles, David Oranchak concluded that this period-19 result had a “1 in 216” chance of happening. Which is good, but just a smidgeon short of great.
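A shuffle test of this general kind can be sketched in a few lines. This is an assumed reconstruction of the idea rather than David Oranchak's procedure: it shuffles the symbols, recounts the period-p bigram repeats each time, and reports the fraction of shuffles that do at least as well as the real ordering.

```python
import random
from collections import Counter


def bigram_repeats(symbols, period: int) -> int:
    """Repeated bigrams at the given period (n occurrences count as n - 1)."""
    pairs = Counter(
        (symbols[i], symbols[i + period]) for i in range(len(symbols) - period)
    )
    return sum(n - 1 for n in pairs.values() if n > 1)


def shuffle_p_value(symbols, period: int, trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of random shuffles with at least as many repeats as observed."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    observed = bigram_repeats(symbols, period)
    pool = list(symbols)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if bigram_repeats(pool, period) >= observed:
            hits += 1
    return hits / trials
```

A result of about 0.0046 from such a test would correspond to the "1 in 216" figure quoted above, though the exact statistic that was compared in the original test is not something this sketch can confirm.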
Incidentally, it’s easier to see these bigram matches if you rewrite Z340 in 19-wide columns (this diagram also probably made by David Oranchak):
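The rewrapping itself is trivial to reproduce. In the sketch below (function name assumed), symbols that were 19 positions apart in the original end up directly above and below one another, so period-19 bigrams become vertical pairs.

```python
def rewrap(symbols, width: int) -> list:
    """Split a flat sequence of symbols into rows of `width` symbols."""
    return [symbols[i:i + width] for i in range(0, len(symbols), width)]


rows = rewrap(list(range(340)), 19)  # stand-in for the 340 cipher symbols
# rows[r][c] and rows[r + 1][c] are exactly 19 positions apart in the original.
```

A 340-symbol cipher rewrapped this way gives 17 full rows plus a final short row of 17 symbols.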
More tests revealed all manner of similar periodic results that may or may not mean something: but I’m interested here specifically in the period-19 result.
Period-19? So what?
When he constructed the Z340, the Zodiac Killer had previously seen his Z408 cipher not only printed on the front page of newspapers (which surely pleased him), but also very publicly cracked (which surely displeased him). And yet his Z340 cipher closely resembles the Z408 in so many ways that it seems a fairly safe bet to me that his later cipher system was nothing more than a modification (a ‘delta’) of the earlier cipher system rather than something wildly different.
Hence I’ve long suspected that if we could somehow work out what the Zodiac Killer thought was technically wrong with the Z408 cipher system, then we could make a guess at what his delta to the Z340 system might be.
Even though the Z408 presented all manner of homophone cycles, it wasn’t these that gave the game away to Donald Gene Harden and Bettye June Harden of Salinas. Rather, they made a number of shrewd psychological guesses (that the most likely first word a psychopath would write was “I”, and that the plaintext would include the word “KILL” multiple times), and used repetitions of “LL” as cribbed ways in to the message.
(As an aside, I struggle to believe that Bettye Harden genuinely guessed from scratch that the first three words of Z408 would be “I LIKE KILLING”, as has been reported. Instead, it seems far more likely to me that she had already worked for several hours on the cipher before making such an inspired guess.)
And so it seems most likely to me that the Zodiac Killer conceived his delta specifically as a way of disrupting the weakness of doubled letters (specifically doubled L), but without really affecting the rest of his code-making approach. And as always in cryptography, there are numerous ways this could be achieved:
* removing the second letter of all doubled letter pairs
* adding in new tokens for specific doubled letters (e.g. use ‘$’ to encipher ‘LL’)
* disrupting the order of the letters (i.e. transposing them) so that ILIKEKILLING becomes IIEILN LKKLIG etc
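The ILIKEKILLING example in the last option is what you get from reading every second letter. A minimal sketch (function names assumed for illustration):

```python
def split_by_period(text: str, period: int) -> list:
    """Read every `period`-th letter, once for each starting offset."""
    return [text[i::period] for i in range(period)]


def doubled_letters(text: str) -> list:
    """List each adjacent pair of identical letters, in order of appearance."""
    return [a + b for a, b in zip(text, text[1:]) if a == b]


plain = "ILIKEKILLING"
parts = split_by_period(plain, 2)          # ['IIEILN', 'LKKLIG']
print(doubled_letters(plain))              # ['LL']
print(doubled_letters("".join(parts)))     # ['II', 'KK']
```

Note what happens to the doubles: the genuine LL is split apart, while two new doubles (II and KK) appear that correspond to nothing in the plaintext, so anyone cribbing on doubled letters would be actively misled.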
I’m therefore wondering if his cipher system delta was some kind of period-19 transposition. But – of course – people have already checked for the presence of straightforward period-19 transposition, and have basically drawn a blank. So if there is a period-19 ‘signature’ arising from some kind of transposition, it’s a little more complicated.
But if so, then what would it look like?
A three-way line dance?
My final piece of observational jigsaw in today’s reasoning chain is that the Z340 ciphertext is apparently arranged in groups of three lines. FBI cryptanalyst Dan Olson famously commented that…
Lines 1-3 and 11-13 contain a distinct higher level of randomness than lines 4-6 and 14-16. This appears to be intentional and indicates that lines 1-3 and 11-13 contain valid ciphertext whereas lines 4-6 and 14-16 may be fake.
…though note that this mixes up observation (the first sentence) with his best-guess inference (the second sentence). What I’m instead taking is that Olson’s observation more generally implies that lines are somehow grouped together in sets of three BUT with a spare line added in between the top and bottom half.
So, the overall line grouping sequence of the Z340 appears to be:
* top half: 1-1-1 2-2-2 3-3-3 X [a spare line with “cut marks” at either end of a fake line]
* bottom half: 4-4-4 5-5-5 6-6-6 X [a spare line with ‘ZODAIK’-like fake signature at the end]
Hence – putting it all together – I’m now wondering whether there is a period-19 transposition in play here BUT arranged in groups of three lines at a time. In which case, the symbol sequence for each set of three lines (3 x 17 = 51) might well look like this (where 01 is the first symbol of the plaintext, 02 is the second symbol, etc):
* 01 04 07 10 13 16 19 22 25 28 31 34 37 40 43 46 49
* 47 50 02 05 08 11 14 17 20 23 26 29 32 35 38 41 44
* 42 45 48 51 03 06 09 12 15 18 21 24 27 30 33 36 39
This transposition arrangement would yield both the period-19 effect and the groups-of-three-lines effect: and might also go some of the way towards explaining why lines 10 and 20 function differently to the other lines.
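The grid above can be generated by a simple rule, sketched here. The rule is reverse-engineered from the three rows as listed (plaintext symbols are dealt to rows 1, 2, 3 in turn, with each successive row shifted two columns further right and wrapping at the line end); the constant and function names are hypothetical.

```python
ROWS, COLS, SHIFT = 3, 17, 2  # three lines of 17 symbols, two-column shift per row


def cell_for(k: int) -> tuple:
    """Grid cell (row, col) holding the plaintext symbol with 0-based index k."""
    row = k % ROWS
    col = (k // ROWS + SHIFT * row) % COLS
    return row, col


def three_line_layout() -> list:
    """Build the 3 x 17 grid of 1-based plaintext indices."""
    grid = [[0] * COLS for _ in range(ROWS)]
    for k in range(ROWS * COLS):
        row, col = cell_for(k)
        grid[row][col] = k + 1
    return grid


def untranspose(block) -> list:
    """Put one 51-symbol block (read line by line) back into plaintext order."""
    if len(block) != ROWS * COLS:
        raise ValueError("expected a block of exactly 51 symbols")
    return [block[row * COLS + col] for row, col in map(cell_for, range(ROWS * COLS))]


for line in three_line_layout():
    print(" ".join(f"{n:02d}" for n in line))
```

Because moving down one row and across two columns is a step of 17 + 2 = 19 positions in reading order, most consecutive plaintext symbols end up exactly 19 apart in the transposed block, with the exceptions falling where a row wraps around or where the deal returns from row 3 to row 1. To test the hypothesis, untranspose would be applied to each three-line block in turn.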
As I mentioned at the top of the post, I also strongly suspect that the top half of the Z340 and the bottom half of the Z340 are separate ciphertext systems, and so any solving should be attempted on the two halves individually, however inconvenient that may be. 🙂
I haven’t tested out this new transposition hypothesis yet: but it’s definitely worth a look, wouldn’t you think, hmmm?
Very interesting idea, Nick. I vote “solid yes” to the question of whether this is worth a look. Would be interesting to find out if any other desired features “pop out” when untransposing with those schemes, such as improved homophone cycles, more repeating fragments, etc.
I still wonder very much about the “+” symbols. They are far too numerous, behave as random symbols (i.e., average of all their positions works out to dead center of cipher), and don’t play nicely with other symbols when it comes to homophone cycles. Do they have some relationship to whatever scheme is producing the period 19 phenomenon?
I also confirm that I probably made those diagrams. 🙂
Dave: my suspicion is that the + shape works differently in the two halves, and I’m really not sure what’s going on inside its head, so to speak. I used to think that + would turn out to be the weakest link in the Z340’s cryptographic chain, but I’m now wondering whether the period-19 and three-line blocks might be that instead.
David: thinking about it a little more, it seems likely that (in my enthusiasm) I overshot the target in this post, insofar as the period-19 effect would need to be validated in a different way post-transposition.
More generally, though, the idea that we should detect ‘good’ transpositions by both repeated bigrams AND homophone cycles is a very good one indeed: there’s no obvious reason to think that the Zodiac Killer would have significantly changed his ways to work around a computer technique that only became apparent decades later.
What would be a good test for the statistical significance of the presence of homophone cycles in a candidate transposed cipher (or portion of a cipher)? When I blogged about this before (discussing Raddum and Sys’ 2010 paper), I thought you could calculate this e.g. by multiplying together the improbabilities of the top 6 detected homophone cycles to form a composite “homophone cycle-ness” measure.
I’ve been working on a technique based on significance testing. For each detected cycle, compare its “run length” (i.e., AB AB AB BA has a run length of 3) to the mean run length of shuffles. A strongly homophonic cipher will tend to have a lot of cycles with runs that deviate significantly from the mean of shuffles. This is true of Z408. One of my calculations suggests Z340 is only about a third as homophonic as Z408, but still more homophonic than random shuffles. This makes me wonder if Zodiac’s mysterious encipherment steps were enough to significantly perturb homophonic encoding but not enough to eliminate it completely.
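One possible reading of that run-length comparison, sketched in Python. The exact definition is an assumption: here the run length of a symbol pair is the longest perfectly alternating stretch of the two symbols, counted in full cycles, so that AB AB AB BA scores 3. The function names are made up for illustration.

```python
import random


def cycle_run_length(symbols, a, b) -> int:
    """Longest alternating stretch of symbols a and b, counted in full cycles."""
    seq = [s for s in symbols if s == a or s == b]
    if not seq:
        return 0
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur != prev else 1  # a repeated symbol breaks the run
        best = max(best, run)
    return best // 2


def run_length_vs_shuffles(symbols, a, b, trials: int = 1000, seed: int = 0) -> tuple:
    """Return (observed run length, mean run length over random shuffles)."""
    rng = random.Random(seed)
    observed = cycle_run_length(symbols, a, b)
    pool = list(symbols)
    total = 0
    for _ in range(trials):
        rng.shuffle(pool)
        total += cycle_run_length(pool, a, b)
    return observed, total / trials
```

A pair of symbols whose observed run length sits well above the shuffle mean is behaving like a homophone cycle; repeating this over all detected pairs would give the kind of overall comparison between Z408 and Z340 described above.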
BTW I think I posted two more comments before now – did they get lost in the ether?
Dave: sadly both disappeared. 🙁
If you want to include links, you have to remember to replace the first : and the last . in each link with spaces, and I’ll reassemble them here by hand. It’s sad, but Cipher Mysteries receives a simply unbelievable amount of spammed comments, and I have to use something fairly brutal to stand any chance of managing it. 🙁 🙁
Ah, ok – I keep forgetting about that.
Anyway, let me try again — Jarl made an interesting observation about unigram distances:
http://zodiackillersite.com/viewtopic.php?p=53902#p53902
To reproduce the effect, he created a cipher just as you described: One key for the top half, another for the bottom half. Perhaps that is a point in favor of your dual key idea!
Dave: the behaviour of + is also something of an indication that there are two behavioral halves. I also think we have to be more decisive in our distrust of lines 10 and 20. 🙂
And what of the centered dots? Curious minimalist little fellows, those: 4 to 2 (lower half of 340 vs. upper half). Very odd, simplistic dots in the context of the Zodiac’s more elaborate designs . . . like placeholders perhaps, which subsequently he may have converted into other symbols – some plusses, even – pre-mailing. Now are there any (statistical) theories that take this possibility into consideration? 😉
Hey Nick,
Your period 19 per 3 lines idea is referred to as multiple inscription rectangles in classical cryptography and is something that we have looked into at the zodiackillersite forum. Its opposite is called polyliteral transposition, which transposes more than one character at a time. With AZdecrypt I can statistically detect and solve these kinds of ciphers if the inscription rectangles are of regular size and are stacked in one dimension, either horizontally or vertically.
Within the period 19 hypothesis, if the 340 was transposed after or during the sequential homophonic substitution process then there should be a bigram and homophone sequence peak at period 19. Since there is only a bigram peak at period 19, transposition after or during sequential homophonic substitution is therefore simply ruled out.
What if I had some pretty good ideas on what the first few lines say?
Steve: that would depend on how close to the real thing your ideas are. 🙂
My question would be – was the methodology of the Hardens in the public domain at the time of Z340? Would the killer have known how they cracked Z408?
Roger Peck: the Hardens didn’t have a methodology, they were just amateur codebreakers using rational guesses (e.g. the first letter would be ‘I’, “KILL” would appear numerous times, etc) to find a way in, much like solving a crossword.
Steve… Let’s hear what you think
I struggle to believe that Bettye Harden genuinely guessed from scratch that the first three words of Z408 would be “I LIKE KILLING”
Your confusion seems to stem from an oversimplification:
– Given the writer’s egomaniacal tendencies, she guessed, quite correctly, that he would most likely begin his letter with the first-person pronoun, ‘I’.
– Then, from frequency tables of English doublets, she deduced that the double half-filled square corresponds to the double L; after all, the most frequent letters, and the most frequent double letters, are not necessarily the same; the homophonic substitution cipher employed by the Zodiac enabled him to evade the former, but not the latter.
– Lastly, since the letter in question was penned by a (serial) killer, the presence of I’s next to double L’s helped steer the inductive process in the somewhat obvious direction.