When talking about the Zodiac Killer Z340 cipher, FBI cryptanalyst Dan Olson once pointed out that:

Statistical tests indicate a higher level of randomness by row, than by column. This indicates that the cipher is written horizontally and rules out any transposition patterns that are not strictly horizontal.

Here, while I’d agree with his observation part (the first sentence), I’m really not so sure about the conclusion part (the second sentence). And a little further on, Olson continues:

Row randomness of 408 is .22, 340 is .19. Column randomness of 408 is .48, 340 is .68. By way of comparison, row and column randomness should be near identical if the 340 does not contain any message, or if there is a message that is evenly scrambled.

This second time round, I’m comfortable with the observations here (the first two sentences), and mostly comfortable with Olson’s conclusion (the last sentence). However, I’d add that you have to be careful with his conclusion, because there is an implicit (but incorrect) follow-on conclusion lurking just beyond its limits for many readers: that if the cipher is not sequenced along columns, it must surely be primarily sequenced along rows of the text.

On the positive side, I would agree that we can conclude from this that we are not looking at a ‘pure’ periodic transposition cipher (i.e. one that rakes over the whole ciphertext, or even over the top or bottom halves). But what would it mean to assert that the Z340 is a bit more horizontal than vertical, though not as horizontal as the Z408?

A New Axis to Grind?

My (admittedly as-yet-hypothetical) explanation for all of the above is that what lurks behind is perhaps a short transposition cycle (i.e. no more than two or three elements long), where the elements are arranged across two or three consecutive lines, and where the end of each cycle steps back to the letter position immediately after the beginning of the cycle.

According to this, each ciphertext line would contain every second or third letter in the plaintext: for even though this would weaken the horizontal (row) adjacency patterning, it would not eliminate it. And statistically, this is essentially what we see: weakened horizontal patterning but no obvious vertical patterning. Because of the apparent groups of three lines (also noted by Olson), I suspect that these are arranged over three lines: and so this forms my primary hypothesis going forward.

A Quick JavaScript Test

I’ve posted up a quick JavaScript gist of what I’m talking about here: https://gist.github.com/anonymous/c53f88caf1dc6bd18a6bf6af45895b2c

The preliminary results of running this code fragment yield a different internal structure for each of the two halves (various intriguing results in bold):

Top half, first nine lines:
0: off2 = 3, off3 = 3, metric = 8
1: off2 = 2, off3 = 6, metric = 8
2: off2 = 2, off3 = 3, metric = 8
3: off2 = 0, off3 = 3, metric = 7

4: off2 = 3, off3 = 14, metric = 6
5: off2 = 1, off3 = 7, metric = 6
6: off2 = 0, off3 = 7, metric = 6
7: off2 = 3, off3 = 2, metric = 5
8: off2 = 2, off3 = 7, metric = 5
9: off2 = 2, off3 = 5, metric = 5

Bottom half, first nine lines:
0: off2 = 1, off3 = 0, metric = 10
1: off2 = 3, off3 = 11, metric = 9
2: off2 = 3, off3 = 10, metric = 9
3: off2 = 0, off3 = 4, metric = 9
4: off2 = 3, off3 = 15, metric = 8
5: off2 = 0, off3 = 8, metric = 8
6: off2 = 4, off3 = 8, metric = 7
7: off2 = 4, off3 = 4, metric = 7
8: off2 = 2, off3 = 15, metric = 7
9: off2 = 0, off3 = 10, metric = 7

Note that the period-19 (i.e. 17+2) effect is still slightly visible in the top half, but it’s much less apparent in the bottom half.

However, the most striking new pattern here is the (off2 = 1, off3 = 0) pattern in the bottom half, that yields ten pair matches in the untransposed text. This is the kind of zigzag transposition pattern one might expect of what Filippo Sinagra calls “peasant ciphers” – improvised amateur cryptographic tricks, that aim for security through obscurity.

Of course, I still have no idea whether or not I’m merely generating coincidences from the 17 x 17 x 2 = 578 permutations being examined here. But nonetheless it’s all quite interesting, right?

I’ve had the Zodiac Killer Z340 cipher on my mind for the last few days. Though I’m still finding it hard not to draw the conclusion that its top and bottom halves are two different ciphertexts (joined together for reason(s) we can only hazily guess at), what has drawn so much of my attention is a quite different class of statistical observation: letter skips.

Letter Skips

The most (in)famous example of letter skips was the Bible Code, popularised by Michael Drosnin’s (1997) book The Bible Code. However, this was merely one in a long line of books claiming that the Bible is not only the literal and exact Word of God, but is also an implicit encipherment of all manner of unexpected occult statements and prophecies. To get to these secret messages, all you have to do is read every nth letter, modulo length(Bible): and then, if you hunt through the vast swathes of near-random junk that emerges from that, you’ll eventually discover words, phrases, and proper names that couldn’t possibly have been known millennia ago when the Bible was first written down.

There have been plenty of mathematical and statistical dismissals of the Bible Code, almost all of which reduce to the simple argument that if you search enough random letter sequences for long enough, you’ll find something that sort of looks like text. And so when Drosnin huffed that “When my critics find a message about the assassination of a prime minister encrypted in Moby Dick, I’ll believe them”, his critics took it literally as a challenge. As a result, we now have lists of numerous Drosnin-style letter-skip ‘predictions’ in Moby Dick, along with a ‘prediction’ of Princess Diana’s death [thanks to Brendan McKay].

From which the moral unavoidably seems to be: be careful what you wish for.

Generated Coincidences

At the heart of the Bible Code lies a simple sampling fallacy: which is that if you perform a long enough series of arbitrary statistical analyses on the text of any given document, you will (eventually) uncover things in it which superficially appear extraordinarily improbable.

This is directly relevant to a lot of the Zodiac Killer code-breaking discourse because, broadly speaking, it is exactly what has happened there: diligent statistical enquiry has yielded not only millions of strike-out tests, but also a large number of (superficially) unlikely-looking patterns. And so the question is: if you perform a hundred different statistical tests and one of them happens to yield a pattern that only appears in one in two hundred randomised versions of the same document, have you (a) found something fundamental and causal that could possibly explain everything, or (b) just generated a coincidence that means nothing?
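To put a rough number on that question, here is a minimal sketch of the arithmetic. It assumes (and this is my simplification, not something the Zodiac statistics guarantee) that the hundred tests are independent of each other; the function name is purely illustrative.

```python
def chance_of_at_least_one_hit(n_tests: int, p_single: float) -> float:
    """Probability that at least one of n independent tests 'fires',
    when each has probability p_single of firing on a randomised text."""
    return 1.0 - (1.0 - p_single) ** n_tests


# A hundred tests, each looking for a one-in-two-hundred pattern.
p = chance_of_at_least_one_hit(100, 1 / 200)
print(f"{p:.1%}")
```

Under that independence assumption the answer comes out at roughly 39%, which is to say that a single "1 in 200" pattern emerging from a hundred tests is not far off a coin toss.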

Sadly, there is no obvious way of telling the difference: all one can do is nod sagely and say, in the words of a great 1970s philosopher…

…“COULD BE!”

Transposition or “Tasoiin rnpsto”?

As should be plain as day from the above, I too view Bible Code letter skips as complete nonsense, and reserve my inalienable human right to cast a similarly cool eye over the impressive panoply of Zodiac Killer cipher observations, each of which may or may not be a generated coincidence.

Even so, utter disbelief of the specifics of the Bible Code shouldn’t mask the fact that the kind of statistical tests that are used for letter skips share a significant overlap with the kind of statistical tests that help reveal periodic ciphers and transposition ciphers.

Hence evidence of a letter-skip period in the Zodiac Killer Cipher should not be automatically put to one side because of the test’s association with hallucinatory Bible Code letter-skips, because evidence of a periodic effect could instead be pointing towards one of many other phenomena.

And there is indeed strong evidence of a period in play in the Z340, as first discussed by Daikon and Jarlve in 2015. Daikon examined the number of Z340 bigram repeats at different periods, and found a significant spike at period 19 (this really is noticeably larger than the other periods).
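For anyone wanting to try this kind of count themselves, here is a minimal sketch of a period-n bigram repeat counter. The definition of a "repeat" used here (each bigram seen k times contributes k - 1 repeats, with no wrap-around at the end of the text) is my own assumption and may well differ from the one Daikon and Jarlve used; the toy string is made up and is not the Z340.

```python
from collections import Counter


def periodic_bigram_repeats(symbols, period):
    """Count repeated bigrams of the form (symbols[i], symbols[i + period])."""
    pairs = Counter(
        (symbols[i], symbols[i + period])
        for i in range(len(symbols) - period)
    )
    # A bigram seen k times contributes k - 1 repeats (assumed definition).
    return sum(count - 1 for count in pairs.values())


toy = "AxByAzBw"  # made-up text: 'A' is followed two places later by 'B', twice
print(periodic_bigram_repeats(toy, 1), periodic_bigram_repeats(toy, 2))  # 0 1
```

Running the same function over a real transcription for every period from 1 upwards is what would produce a spike plot of the kind described above.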

Here’s what these period-19 bigram repeats look like (was this diagram made by David Oranchak?):

Having then performed 1,000,000 random shuffles, David Oranchak concluded that this period-19 result had a “1 in 216” chance of happening. Which is good, but just a smidgeon short of great.
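A shuffle test of this general kind can be sketched as below. This is not David Oranchak's code: the function names, the repeat definition, the default trial count and the fixed seed are all assumptions of mine, chosen to keep the example small and repeatable.

```python
import random
from collections import Counter


def periodic_bigram_repeats(symbols, period):
    """Repeated (symbols[i], symbols[i + period]) bigrams (assumed definition)."""
    pairs = Counter(
        (symbols[i], symbols[i + period])
        for i in range(len(symbols) - period)
    )
    return sum(count - 1 for count in pairs.values())


def shuffle_p_value(symbols, statistic, trials=10_000, seed=0):
    """Fraction of random shuffles whose statistic is at least as large
    as the statistic of the unshuffled text."""
    rng = random.Random(seed)
    observed = statistic(symbols)
    pool = list(symbols)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if statistic(pool) >= observed:
            hits += 1
    return hits / trials


toy = "AxByAzBw" * 4  # made-up text, not the Z340
print(shuffle_p_value(toy, lambda s: periodic_bigram_repeats(s, 2), trials=200))
```

With a real transcription and a million trials, a result of about 0.0046 would correspond to the "1 in 216" figure quoted above.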

Incidentally, it’s easier to see these bigram matches if you rewrite Z340 in 19-wide columns (this diagram also probably made by David Oranchak):

More tests revealed all manner of similar periodic results that may or may not mean something: but I’m interested here specifically in the period-19 result.

Period-19? So what?

When he constructed the Z340, the Zodiac Killer had previously seen his Z408 cipher not only printed on the front page of newspapers (which surely pleased him), but also very publicly cracked (which surely displeased him). And yet his Z340 cipher closely resembles the Z408 in so many ways that it seems a fairly safe bet to me that his later cipher system was nothing more than a modification (a ‘delta’) of the earlier cipher system rather than something wildly different.

Hence I’ve long suspected that if we could somehow work out what the Zodiac Killer thought was technically wrong with the Z408 cipher system, then we could make a guess what his delta to the Z340 system might be.

Even though the Z408 presented all manner of homophone cycles, it wasn’t these that gave the game away to Donald Gene Harden and Bettye June Harden of Salinas. Rather, they made a number of shrewd psychological guesses (that the most likely first word a psychopath would write was “I”, and that the plaintext would include the word “KILL” multiple times), and used repetitions of “LL” as cribbed ways in to the message.

(As an aside, I struggle to believe that Bettye Harden genuinely guessed from scratch that the first three words of Z408 would be “I LIKE KILLING”, as has been reported. Instead, it seems far more likely to me that she had already worked for several hours on the cipher before making such an inspired guess.)

And so it seems most likely to me that the Zodiac Killer conceived his delta specifically as a way of disrupting the weakness of doubled letters (specifically doubled L), but without really affecting the rest of his code-making approach. And as always in cryptography, there are numerous ways this could be achieved:
* removing the second letter of all doubled letter pairs
* adding in new tokens for specific doubled letters (e.g. using ‘$’ to encipher ‘LL’)
* disrupting the order of the letters (i.e. transposing them) so that ILIKEKILLING becomes IIEILN LKKLIG etc
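The third option is easy to illustrate. Here is a minimal sketch of an every-nth-letter split (the function name is just something I made up for the example), applied to the same ILIKEKILLING string:

```python
def split_by_skip(plaintext: str, period: int) -> list[str]:
    """Deal the letters out into `period` strings, one letter at a time."""
    return [plaintext[start::period] for start in range(period)]


print(" ".join(split_by_skip("ILIKEKILLING", 2)))  # IIEILN LKKLIG
```

Note that in this toy case the split does break up the LL pair, but it also happens to create new doubled letters (II and KK) along the way.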

I’m therefore wondering if his cipher system delta was some kind of period-19 transposition. But – of course – people have already checked for the presence of straightforward period-19 transposition, and have basically drawn a blank. So if there is a period-19 ‘signature’ arising from some kind of transposition, it’s a little more complicated.

But if so, then what would it look like?

A three-way line dance?

My final piece of observational jigsaw in today’s reasoning chain is that the Z340 ciphertext is apparently arranged in groups of three lines. FBI cryptanalyst Dan Olson famously commented that…

Lines 1-3 and 11-13 contain a distinct higher level of randomness than lines 4-6 and 14-16. This appears to be intentional and indicates that lines 1-3 and 11-13 contain valid ciphertext whereas lines 4-6 and 14-16 may be fake.

…though note that this mixes up observation (the first sentence) with his best-guess inference (the second sentence). What I’m instead taking is that Olson’s observation more generally implies that lines are somehow grouped together in sets of three BUT with a spare line added in between the top and bottom half.

So, the overall line grouping sequence of the Z340 appears to be:
* top half: 1-1-1 2-2-2 3-3-3 X [a spare line with “cut marks” at either end of a fake line]
* bottom half: 4-4-4 5-5-5 6-6-6 X [a spare line with ‘ZODAIK’-like fake signature at the end]

Hence – putting it all together – I’m now wondering whether there is a period-19 transposition in play here BUT arranged in groups of three lines at a time. In which case, the symbol sequence for each set of three lines (3 x 17 = 51) might well look like this (where 01 is the first symbol of the plaintext, 02 is the second symbol, etc):

* 01 04 07 10 13 16 19 22 25 28 31 34 37 40 43 46 49
* 47 50 02 05 08 11 14 17 20 23 26 29 32 35 38 41 44
* 42 45 48 51 03 06 09 12 15 18 21 24 27 30 33 36 39

This transposition arrangement would yield both the period-19 effect and the groups-of-three-lines effect: and might also go some of the way towards explaining why lines 10 and 20 function differently to the other lines.
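The three-line arrangement listed above can be generated mechanically. Here is a minimal sketch, on the assumption that each successive row is shifted two places further to the right (which is what makes consecutive plaintext symbols sit 17 + 2 = 19 positions apart); the constant and function names are hypothetical.

```python
WIDTH = 17  # symbols per Z340 line
ROWS = 3    # lines in each hypothesised group
SHIFT = 2   # extra rightward shift per row, so that 17 + 2 = 19


def three_line_layout(width=WIDTH, rows=ROWS, shift=SHIFT):
    """Return a rows x width grid holding plaintext positions 1..rows*width."""
    grid = [[0] * width for _ in range(rows)]
    for k in range(width * rows):  # k = 0 is plaintext symbol 01
        row = k % rows
        col = (k // rows + shift * row) % width
        grid[row][col] = k + 1
    return grid


for line in three_line_layout():
    print(" ".join(f"{n:02d}" for n in line))
```

This prints the same three rows as in the list above, so anyone wanting to test a variant (say, a different shift, or groups of two lines) only has to change the constants.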

As I mentioned at the top of the post, I also strongly suspect that the top half of the Z340 and the bottom half of the Z340 are separate ciphertext systems, and so any solving should be attempted on the two halves individually, however inconvenient that may be. 🙂

I haven’t tested out this new transposition hypothesis yet: but it’s definitely worth a look, wouldn’t you think, hmmm?

Here’s the evidence that the Zodiac Killer is alive and busy with a spray can in Cyprus, visual documentary evidence to which only the most obtuse could possibly object:

And if you think that’s the most ridiculous and/or foolish cipher theory you’ve encountered in the last seven days, you obviously haven’t been paying much attention. 🙁

OK, so there’s like another Zodiac film coming out this summer (2017), and it’s like called Awakening The Zodiac. And if that’s not just like totally thrilling enough for you kerrrazy cipher people already, there’s also a trailer on YouTube long enough to eat a couple of mouthfuls of popcorn (maybe three tops):

I know, I know, some haters are gonna say that it’s disrespectful to the memory of the dead, given that the Zodiac claimed to have killed 37 people, and that the film makers are just building cruddy entertainment on top of their families’ suffering. But it’s just Hollllllllllywood, people, or rather about as Hollywood as you can get when you film it on the cheap in Canada. Though if the pitch was much more elaborate than “Storage Hunters meets serial killer”, you can like paint my face orange and call me Veronica.

Seriously, though, I’d be a little surprised if anyone who knows even 1% more than squat about ciphers was involved: if my eyes don’t deceive me, there certainly ain’t no “Oranchak” in the credits. Maybe there’ll turn out to be hidden depths here: but – like the Z340 – if there are, they’re very well hidden indeed.

In a recent post here, I floated the idea that the Zodiac Killer’s Z408 (solved) cipher’s unusual homophone distribution may have arisen not conceptually (i.e. from a hitherto-unknown book on cryptography), but instead empirically (i.e. emerging from the properties of a specific text).

It’s certainly possible that he might have used his own (private) text to model his homophone distribution, in which case we probably have almost no chance of reconstructing it. However, I think it likely that he instead used the first few characters of an already existing public text (such as Moby Dick, the Book of Genesis, the Declaration of Independence, or whatever) to do this.

It’s a reasonable enough suggestion, I think: and moreover one that we can try to test to a reasonable degree.

Z408’s homophones

A homophonic cipher key allocates a number of cipher shapes to individual plaintext letters, usually (but not always) in broad proportion to their frequency. So in a typical homophonic cipher key you would expect to see far more shapes for E (the most common letter in English) than for, say, Z or Q.

Though this is essentially the case for what we see in the Z408 cipher (particularly for the more frequent letters, ETAOINS), the numbers of homophones chosen for the less frequent letters seem somewhat idiosyncratic and arbitrary:

7 shapes – E
4 shapes – T A O I N S
3 shapes – L R
2 shapes – D F H
1 shape  – B C G K M P U V W X Y
Did not appear: J Q Z

People have long searched for a primer or textbook on cryptography where the description of the alphabetic frequency distribution matches this, or even where the alphabetic frequency ordering (e.g. ETAOINSHRDLU etc) matches the order here, but in vain.

Designing a filter

The basic idea for the filter is easy enough:
* read in characters from the start of a passage (we’re only interested in alphabetic letters, converted to capitals, i.e. A-Z)
* if the instance count of that character is higher than the top of the desired range, then the test fails
* if the instance counts for all the characters are within the desired range at the same time, then the test passes
* else keep reading in more characters until the test terminates

As a side note: of all the Z408 homophones, only X appears exactly once in the Z408 ciphertext itself: but while it is conceivable that the Zodiac Killer might have allocated extra homophones for X, it does seem fairly unlikely.

The desired ranges for each of the characters would look like this (though feel free to adapt this if you disagree with the homophone counts listed above):

[7,7] – E
[4,4] – T A O I N S
[3,3] – L R
[2,2] – D F H
[0,1] – B C G K M P U V W Y J Q Z
[0,3] – X (to err on the side of safety)

Note that the single-letter characters have a slightly broader [0,1] range because we have no way of knowing whether or not they would have actually appeared in the original text.

Here are two test texts that should both pass:

EEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZZZZZZZZZZZZZZZ

BCGKMPUVWYJQZXEEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZ
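Here is a minimal sketch of the filter, using the ranges listed above and the two test texts. The names are mine, and the handling of a passage that runs out before every letter is in range (treated here as a fail) is an assumption, since the description above doesn't cover that case.

```python
def build_ranges():
    """Desired (low, high) instance counts for each letter, as listed above."""
    ranges = {}
    for letters, low, high in [
        ("E", 7, 7),
        ("TAOINS", 4, 4),
        ("LR", 3, 3),
        ("DFH", 2, 2),
        ("BCGKMPUVWYJQZ", 0, 1),
        ("X", 0, 3),
    ]:
        for letter in letters:
            ranges[letter] = (low, high)
    return ranges


def passes_filter(text, ranges):
    counts = {letter: 0 for letter in ranges}
    for ch in text.upper():
        if ch not in counts:
            continue  # skip spaces, punctuation, digits
        counts[ch] += 1
        if counts[ch] > ranges[ch][1]:
            return False  # gone over the top of the range
        if all(lo <= counts[c] <= hi for c, (lo, hi) in ranges.items()):
            return True  # every letter in range at the same time
    return False  # assumption: running out of text counts as a fail


RANGES = build_ranges()
TEST_1 = "EEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZZZZZZZZZZZZZZZ"
TEST_2 = "BCGKMPUVWYJQZXEEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZ"
print(passes_filter(TEST_1, RANGES), passes_filter(TEST_2, RANGES))  # True True
```

Both test texts pass at the second H, i.e. before the trailing run of Zs is ever read, which is why those Zs do no harm.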

Which texts to try?

Though any text published before August 1969 would potentially be a match, it would make sense to look at all manner of texts, and possibly even the first few lines of different chapters of books (though I’d be a little surprised if that was the case). All the same, the filter is easy enough to write (and should execute in a matter of microseconds) and to test, so the difficulty here lies mostly in getting hold of enough texts to try, rather than the compute time as such.

Oddly, I don’t really have a solid feel for how often the filter will find a match: my gut instinct is that roughly one in a million English text comparisons will pass, but that’s just a guesstimate based on each letter having its own little bell-curve distribution, all of which have to match at the same time.

So what do you think will match? “Catcher in the Rye” or “Moby Dick”? Place your bets! 😉

Given that the Zodiac Killer’s first big cipher (the Z408) got cracked so quickly, it shouldn’t really be a surprise that he used a slightly different system for his second big cipher (the Z340). What is (arguably) surprising is that whatever change he made to it has not been figured out since then.

But what was he thinking? What did he want from a cipher? And how might his needs have changed between Z408 and Z340?

The Z408

Ciphers are normally made to be as strong as practically possible, given the technological, time, and resource constraints that apply to both sender and receiver: and with the two main driving needs being privacy and secrecy. Note that these aren’t always the same thing: the way I usually describe it is that while sex with your husband is private, sex with your tennis coach is secret. 😉

And so the first thing I find cryptographically interesting about the Zodiac Killer is that he was creating a cipher from a slightly different angle to either of these: and he certainly wasn’t trying to communicate in any normal sense of the word.

Rather, I think that the point of Z408 was to be taunting, and to demonstrate to the police that he was in control, not them.

So imagine the Zodiac’s probable fury, then, when little more than a week after his three Z408 cryptograms appeared in local newspapers (the Vallejo Times-Herald, the San Francisco Examiner and the San Francisco Chronicle), Donald and Bettye Harden were all over the front pages explaining how they had cracked them.

Didn’t they know who was supposed to be in control here?

What was worse, the Hardens hadn’t used cryptological hardware or even high-powered cryptological smarts. They’d just used the Zodiac’s egoism (they guessed the first letter was “I”) and his psychopathic bragging (they guessed he would use the word KILL multiple times) as keys to his cryptographic front door: and then marched straight in.

I think it’s fairly safe to expect that the Zodiac was pretty pissed off by this.

Note that the Hardens carried on trying to crack the Z340 for many years afterwards: according to their daughter, her “mother wrote poetry and was as absorbed in her writing as she became with the Zodiac codes. She worked on the second code on and off for the rest of her life.”

The Z340

Comparing the overall style of the Z340 with that of the Z408, there seem to be plenty of reasons to think that the two are, at heart, not wildly different from each other. And yet (as is widely known) all the big-brained homophonic solvers written since haven’t made any impact on the Z340 at all.

All the same, I think the second interesting thing to note is that the changes to the Z340 system were surely not made to defend against computer-assisted codebreaking (because that hadn’t yet happened), but rather to make the updated system Harden-hardened, so to speak.

What does this mean? Well, we can probably infer that the first letter of the Z340 is almost certainly not I (not that that helps us a great deal) and the Zodiac Killer must have done something to conceal or remove the KILL weakness.

But, in my opinion, that latter change would surely not have been a theoretically-motivated cryptographic adaptation (he was without much doubt an amateur cryptographer), but rather something pragmatic and empirical, perhaps along the lines of:
* adding a repeat-the-last-letter token
* adding an LL token
* adding an ILL token
* adding nulls inside tell-tale words
* etc

But there’s a problem with all of these. In fact, there are several problems. 🙁

The Problems

The first problem is that I don’t currently believe any of the above changes are disruptive enough to explain what we see in the Z340.

The basic stats of the four main Zxxx ciphers are:
Z408: 408 symbols, from a set of 54 unique symbols. (Note: E has 7 homophones, AST have 6 each, IO have 5 each, N has 4, FLR have 3 each, DHW have 2 each, everything else has 1).
Z340: 340 symbols, from a set of 63. [Hence symbols/textsize is 18.5%, a fair bit higher than the Z408’s 13.2%]
Z32: 32 symbols, from a set of 30.
Z13: 13 symbols, from a set of 8.

It would be very tempting to suspect (as many people have) that the Z340 is ‘therefore’ just the same as Z408 but with 40% more homophones relative to its length. Yet a problem with this popular hypothesis is that it should be well within range of automated homophone solvers, and to date they haven’t managed to make any impact.

A second problem is that the kind of homophone cycles that so characterized the Z408 seem to be largely absent in the Z340: and yet because the Zodiac Killer would not have had any clue that these were a technical weakness of his system, it seems unlikely to me that he would have adjusted his system to work around a weakness that he didn’t actually know was a weakness.

A third problem is that the Z340 has a fair number of asymmetries that don’t fit the it’s-a-straight-homophonic-cipher model. For example, lines 1-3 and 11-13 have (as Dan Olson pointed out some years ago) almost no character repeats.
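Checking this kind of row-level property on a transcription is straightforward. Here is a minimal sketch that counts repeated symbols per row, assuming the ciphertext is held as one flat string (or list) of symbols and read in 17-wide rows; the toy input is made up and uses 3-wide rows to keep it readable.

```python
def repeats_per_row(symbols, width=17):
    """For each row of `width` symbols, count how many symbols are repeats."""
    rows = [symbols[i:i + width] for i in range(0, len(symbols), width)]
    return [len(row) - len(set(row)) for row in rows]


# Made-up example with 3-wide rows: ABC / AAC / DEF
print(repeats_per_row("ABCAACDEF", width=3))  # [0, 1, 0]
```

Rows scoring zero are the ones with no character repeats at all, so comparing lines 1-3 and 11-13 against the rest becomes a matter of reading off the list.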

There are yet other asymmetries: for example, while 63 different symbols appear in the top ten lines, only 60 appear in the bottom ten lines. And there’s the mysterious ‘-’ shape at the start and end of line 10: and the odd-looking “ZODAIK” sequence on line 20.

One final asymmetry: the ‘+’ shape seems to function differently in the top and bottom halves – it is often preceded by ‘M’ in the top half, but never preceded by ‘M’ in the bottom half.

How does assuming the Z340 is a pure homophonic cipher explain any of these behaviours, let alone all of them?

Lines 1-3 and 11-13, revisited

I keep coming back to the 1-3 and 11-13 property as mentioned here. I think it’s important to say that Dan Olson’s conclusion (that “lines 1-3 and 11-13 contain valid ciphertext whereas lines 4-6 and 14-16 may be fake”) seems likely to be landing a little bit wide of the mark.

To me, this same property of these lines implies (a) that the homophonic versions for each letter were probably used in pure sequence here, but also (b) the homophone cycles were somehow ‘reset’ after ten lines (i.e. the homophone cycles all started again at the start of line eleven). And perhaps also that any characters repeated in the first three lines are rarer characters, rather than the homophone-friendly ETAOINSHRDLU etc.

It might even be that the Zodiac Killer kept on adding homophones as he constructed the cipher UNTIL he had three lines’ worth of essentially unique homophones: that is to say, that the three line blocks in 1-3 and 11-13 are how his system made the choice of the number of homophones, rather than as a consequence of the number of homophones he chose. Nobody has yet (to my knowledge) satisfactorily explained where he came up with his homophonic allocation for Z408: certainly, searching for this in crypto books hasn’t yielded any likely candidates.

Could it be that the Zodiac Killer worked backwards from his actual Z408 ciphertext to determine the number of homophones, rather than worked forward from the number of homophones to the ciphertext?

Update: I received the following off-line comment from David Oranchak, but thought it better to update it within the post itself…

Nick, there are a few other seemingly rare phenomena that can be observed in Z340. I’m curious what you think of them.

The first is the pivots:

http://zodiackillerciphers.com/wiki/index.php?title=Encyclopedia_of_observations#The_.22Pivots.22

Those kinds of patterns are difficult to arise by chance, so they are suspected to be some sort of feature of the encoding scheme.

Z408 is littered with repeating bigrams but Z340 seems to have fewer than would be expected via normal homophonic encipherment of a plaintext in a normal reading direction. However, the bigrams show up again if you consider a periodic operation on the cipher text:

http://zodiackillerciphers.com/wiki/index.php?title=Encyclopedia_of_observations#Periodic_ngram_bias

The count of 25 repeating bigrams jumps to 37 or 41 or even higher, depending on the periodic operation applied to the cipher text. Here is a tool that illustrates the various operations:

http://zodiackillerciphers.com/period-19-bigrams/

You’ve already identified the seemingly rare phenomenon of rows that lack repeating symbols. There are 9 such rows. In 1,000,000 random shuffles of Z340, none had that many rows. In fact, the best that was found was 8 rows which occurred in only 12 of the shuffles.

Your “M+” asymmetry observation seems to fit in with the general observation that repeating bigrams are phobic of certain regions of the text. The lower left, for instance, seems to hate bigrams: http://zodiackillerciphers.com/images/z340-repeating-bigrams.png

Another really strange observation is the distribution of non-repeating string lengths. For each position of Z340, measure how far you can read forward without encountering a repeating symbol. You end up with a string with unique sequences of length L. Jarlve found that for Z340, there is a peak of 26 occurrences of unique sequences of length 17 (which happens to be the width of Z340). It is really interesting that in random shuffles, this phenomenon is only observed on the order of one in a billion shuffles.

Finally, I would recommend that anyone interested in this topic should check out this thread on morf’s Zodiac forum: http://zodiackillersite.com/viewtopic.php?f=81&t=3196 Especially the more recent posts on the latter pages. “Jarlve” and “smokie” in particular are doing fantastic work exploring various transcription schemes that could explain the various curious features of Z340 (in particular, the relationships between periodic bigrams and transposition schemes).
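The non-repeating string length measurement described in the comment above can be sketched as follows. This is only my reading of the description (for each start position, read forward until a symbol repeats), and how to treat runs that hit the end of the text is an assumption on my part, so Jarlve's exact numbers may not fall out of it unchanged.

```python
from collections import Counter


def unique_run_lengths(symbols):
    """For each start position, how many symbols can be read forward
    before one of them repeats (runs reaching the end are kept as-is)."""
    lengths = []
    for start in range(len(symbols)):
        seen = set()
        for sym in symbols[start:]:
            if sym in seen:
                break
            seen.add(sym)
        lengths.append(len(seen))
    return lengths


toy = "ABCA"  # made-up text, not the Z340
print(unique_run_lengths(toy))           # [3, 3, 2, 1]
print(Counter(unique_run_lengths(toy)))  # histogram of run lengths
```

Applied to a real transcription, the histogram is where a peak at length 17 would show up, and the same function can be run over shuffled copies for comparison.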

The 408-symbol-long Zodiac Killer cipher (‘Z408’) was cracked by Donald and Bettye Harden in 1969 while the next 340-symbol-long Zodiac Killer cipher (‘Z340’) arrived not long after: ever since then, there has been a widespread presumption among researchers that the later cipher would just be a more complicated version of the earlier cipher (e.g. perhaps transposed in some way).

The Z340 certainly resembles Z408, insofar as the cipher shapes employed in both were very similar, and that certainly lends support to the widely-held presumption that Z340 uses the same kind of ‘pure’ homophonic cipher system. But is that the whole story? Personally, I’m not so sure…

Unusual Aspects of the Z340

It has long been pointed out that the Z340 cipher sports a number of idiosyncratic features that are not present in the earlier Z408 cipher. For example, the FBI’s Dan Olson pointed out a few years ago that:

* Statistical tests indicate a higher level of randomness by row, than by column. This indicates that the cipher is written horizontally and rules out any transposition patterns that are not strictly horizontal.

* Lines 1-3 and 11-13 contain a distinct higher level of randomness than lines 4-6 and 14-16. This appears to be intentional and indicates that lines 1-3 and 11-13 contain valid ciphertext whereas lines 4-6 and 14-16 may be fake.

* Because of the vertical symmetry of the statistical observations, the message may have been written, then split into two equal size parts and placed top over bottom.

These suggest that something odd might be going on inside the cipher, though: in this respect, the Z340 cipher resembles the Voynich Manuscript’s frustrating ‘Voynichese’, which looks straightforward on the surface but which turns out to have many behavioural features which are not seen in other known ciphers.

I’d also add that row #10 starts and ends with ‘-’, which looks somewhat artificial – though it could just be random, it may also have some kind of meta-significance for the interpretation of the overall cryptogram (e.g. “CUT HERE”).

Finally, I’d add that Z340’s final (20th) line looks very much as if it contains a mangled ZODIAK signature, which – if correct – would probably make sense as 50% crypto padding, and 50% flipping the bird at the FBI. 😉

Anyway, given that the message contains 20 lines of 17 symbols (20 x 17 = 340) and we can see similar artefacts in rows 1-3 and 11-13, then it seems likely to me that there was some kind of major coding break after row #10.

Consequently, I’ve long wondered whether the two halves of Z340 (let’s call them ‘Z170-A’ and ‘Z170-B’) used a different set of cipher-symbol-to-plaintext-letter assignments to each other: in which case, the sensible way to make progress would be to try to solve each half separately. Even so, we would still need to eke out some additional assistance (or meta-assistance) from the texts to make progress, because the odds are so heavily stacked against us.

Yet there’s another feature of the Z340 cipher which struck me a while back but which I haven’t got round to blogging about until now. It’s all to do with doubled shapes, and the story starts with the Z408 ciphertext…

Z408’s doubled letters

To construct Z408, the Zodiac Killer used 7 shapes for { E }, 4 shapes each for { T, A, O, I, N, S }, 3 shapes each for { R, L }, 2 shapes each for { D, F, H }, and 1 shape each for the rest (probably): this yields a grand total of 54-ish cipher shapes to encipher 26 plaintext letters.

Given that the letter frequency ordering for English is often described as “ETAOINSHRDLU…”, this tiered arrangement makes sense (as I recall, various researchers have tried to use the homophone allocation to infer which popular cryptography manual the Zodiac Killer specifically relied upon, but I don’t remember if there was a definitive answer to that question).

However, one particular letter caused him a lot of practical problems for Z408: the letter L. Even though this has a relatively small frequency count (compared to, say, the letter E), the particular text he enciphered included numerous ‘LL’ pairs. That is kind of what you get if you want to say the word ‘KILL’ all the time: the words with a double-L are KILLING, KILLING, ALL, KILL, THRILLING, WILL, ALL, KILLED, WILL, WILL, WILL, and COLLECTING.

(As an aside, I’ve often wondered whether the multiple repetitions of the word “WILL” might possibly imply that the Zodiac Killer’s first name was indeed “WILL” / William. The subconscious is a funny ‘wild animal’ in that way.)

Anyway, as a direct result of this, the letter L is used here more often than its normal English stats would suggest: and so the Zodiac Killer had to encipher ‘LL’ 12 times with only three shapes in its tier. To avoid pattern repetitions, he ended up doubling up the enciphered L-shapes a few times, and so the final Z408 ciphertext included a number of doubled L shapes.
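The squeeze here is easy to check by counting. This small Python sketch counts the ‘LL’ pairs in the word list given above, and then counts how many different two-shape combinations three L-shapes can actually produce. The shape names `L1`, `L2` and `L3` are placeholders of mine, not the actual Z408 symbols.

```python
from itertools import permutations, product

# The double-L words listed in the post, in order.
LL_WORDS = [
    "KILLING", "KILLING", "ALL", "KILL", "THRILLING", "WILL",
    "ALL", "KILLED", "WILL", "WILL", "WILL", "COLLECTING",
]
ll_pairs = sum(word.count("LL") for word in LL_WORDS)

# Hypothetical names for the three L homophones.
L_SHAPES = ["L1", "L2", "L3"]
pairs_without_doubling = len(list(permutations(L_SHAPES, 2)))
pairs_with_doubling = len(list(product(L_SHAPES, repeat=2)))

print(ll_pairs, pairs_without_doubling, pairs_with_doubling)  # 12 6 9
```

With 12 pairs to encipher but only 6 ordered pairs of distinct shapes (9 if doubled shapes are allowed), some two-shape pattern has to recur however the shapes are chosen.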

The only other doubled letter was ‘G’, which only had a single shape allocated to it, and which appeared doubled only once.

Z340’s Problem With ‘+’

If Z340 (which uses 63 distinct shapes) uses a similar kind of homophonic cipher to the one used in the Z408 cipher (which uses 54 distinct shapes), then I would say it has a very specific problem with whatever is being enciphered by the shape ‘+’.

‘+’ occurs 24 times (7% of the total number of characters, and exactly double that of ‘B’, the second most frequent shape), which by itself largely makes a nonsense of the suggestion that Z340 is a homophonic cipher: anything with that high a frequency count should surely have a whole set of homophones to represent it.

You might wonder whether ‘+’ enciphers a frequent word or syllable, such as ‘THE’ or ‘ING’. However, it appears three times immediately doubled with itself, i.e. ‘++’ (the only other letter that appears doubled is the sequence ‘pp’ that occurs once near the start of row #4).
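Both kinds of count used here (how often a shape occurs, and how often it sits next to itself) are simple to compute. The sketch below is a minimal Python version, assuming the ciphertext is held as one string with one character per shape. The `toy` string is invented for illustration and is not taken from Z340.

```python
from collections import Counter


def symbol_stats(ciphertext: str) -> tuple[Counter, Counter]:
    """Return (per-symbol counts, counts of immediately doubled symbols)."""
    counts = Counter(ciphertext)
    # Each adjacent equal pair counts once, so a run like '+++' counts twice.
    doubles = Counter(a for a, b in zip(ciphertext, ciphertext[1:]) if a == b)
    return counts, doubles


toy = "HE++R>pp^+B++"  # invented example, NOT real ciphertext
counts, doubles = symbol_stats(toy)
print(counts["+"], doubles["+"], doubles["p"])  # 5 2 1
print(round(100 * counts["+"] / len(toy), 1))   # percentage share of '+'
```

Note that this treats the text as a single stream, so a shape at the end of one row followed by the same shape at the start of the next would also count as a double: run it row by row if that matters.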

Even if, as Dave Oranchak did, you do a brute force search for homophone cycles (don’t get me started on what they are, or we’ll be here all night), you don’t find anything that accounts for Z340’s ‘+’ shape.

And yet, as Dave Oranchak points out, Z340 has some strong-looking homophone cycles, such as [l*M] [l*M] [l*M] lM [l*M] [l*M] [l*M], which would seem to imply that Z340 is at heart a homophonic cipher. There are plenty of other measures (many noted by my late friend Glen Claston) that point in the same direction.
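For anyone who does want a rough idea of what a homophone cycle test looks like, here is one simple way to score a candidate cycle in Python. This is my own simplified scoring, not Dave Oranchak’s actual search code: it throws away every shape outside the candidate set and then asks how often each remaining shape is followed by the next one in the cycle. The `toy` string is invented, with one ‘*’ deliberately skipped to mimic the ‘lM’ hiccup in the pattern above.

```python
def cycle_fit(ciphertext: str, cycle: str) -> float:
    """Fraction of steps in the filtered stream that follow the cycle order."""
    successor = {s: cycle[(i + 1) % len(cycle)] for i, s in enumerate(cycle)}
    stream = [c for c in ciphertext if c in successor]
    if len(stream) < 2:
        return 0.0
    hits = sum(1 for a, b in zip(stream, stream[1:]) if successor[a] == b)
    return hits / (len(stream) - 1)


toy = "xlq*zMalb*Mcl*dMlMe"  # invented example, NOT real ciphertext
print(cycle_fit(toy, "l*M"))  # 0.9
print(cycle_fit(toy, "lM*"))  # 0.1
```

A brute force search would run something like this over every small combination of shapes and keep the combinations whose score is far higher than chance would suggest.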

Moreover, because the number of shapes used is greater than for the Z408 cipher, you would naturally expect to see more tiers or wider tiers (though 7 shapes for E was already quite a wide tier). So you would naturally expect to see a consequent flattening of the statistics. And yet ‘+’ bucks that trend completely.

How Can We Reconcile These Two?

As a starting point, you might note that ‘M+’ occurs three times in the top half, but not at all in the bottom half. In fact, M is always followed by + in the top half, and never followed by + in the bottom half (where it occurs four times).
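This kind of ‘what follows M?’ comparison between the two halves can be sketched in a few lines of Python. The rows below are invented toy data, arranged to mirror the counts just described (three ‘M+’ in the top, four M’s with no ‘+’ after them in the bottom). They are not real Z340 rows, and `followers` is a hypothetical helper name.

```python
from collections import Counter


def followers(rows: list[str], symbol: str) -> Counter:
    """Count which shapes come immediately after `symbol` within each row."""
    seen = Counter()
    for row in rows:
        for a, b in zip(row, row[1:]):
            if a == symbol:
                seen[b] += 1
    return seen


# Invented toy rows, NOT real ciphertext.
toy_top = ["AM+B", "M+CD", "EFM+"]
toy_bottom = ["MABC", "DMEF", "GHMJ", "KMB+"]

print(followers(toy_top, "M"))     # every M is followed by '+'
print(followers(toy_bottom, "M"))  # no M is followed by '+'
```

Because pairs are only counted within a row, an M in the last column of a row has no follower in this sketch.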

It seems to me that the ‘+’ shape makes the top half (Z170-A) easy and the bottom half (Z170-B) difficult all at the same time. And that’s not something that I personally can comfortably reconcile with the kind of one-size-fits-all pure homophonic solutions most people seem to be looking for, even with confounding transposition stages thrown in: the behaviour of the ‘M+’ pattern would seem to point away from almost all of the transposition variants previously proposed.

Having really, really thought about it, my tentative conclusion is that ‘+’ seems to operate more as a kind of meta-token rather than as a pure token. I mean this in the same general way that certain Voynichese letters seem to me to encipher 15th century shorthand tokens (‘contractio’, etc).

A Suggestion

As I recall, the Hardens found their crib by guessing that the first letter was “I” and then looking for the word “KILL”: the ease of which doubtless made an already angry psychopath even angrier.

Hence to my mind, the thing he would most likely have been looking to solve when moving from his Z408 cipher system to his Z340 cipher system was how to make that new system impervious to that specific kind of an attack. And the key letter that let him down first time round was the letter ‘L’, specifically in its doubled form.

Consequently, I propose that this was the single technical challenge that spurred the internal changes from Z408 to Z340. And there was one obvious – but admittedly very old-fashioned – trick that he could have used to make doubled letters harder to see.

So here’s my suggestion. Could it be that the Zodiac Killer used ‘+’ as a meta-token to mean “REPEAT THE LAST LETTER”? (‘++’ would then mean a tripled letter, or perhaps something else entirely).
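To show what that suggestion would mean mechanically, here is a minimal Python sketch of a ‘repeat the last letter’ token. It works on plaintext letters only, before any homophonic substitution, and it takes the ‘++ means a tripled letter’ reading. Both function names are hypothetical, and it assumes ‘+’ never appears in the plaintext itself.

```python
REPEAT = "+"  # the hypothesised meta-token


def encode_repeats(plaintext: str) -> str:
    """Replace each letter that repeats its predecessor with the repeat token."""
    out = []
    prev = None
    for ch in plaintext:
        out.append(REPEAT if ch == prev else ch)
        prev = ch
    return "".join(out)


def decode_repeats(tokens: str) -> str:
    """Expand each repeat token back into a copy of the previous letter."""
    out = []
    for tok in tokens:
        if tok == REPEAT:
            if not out:
                raise ValueError("repeat token with nothing to repeat")
            out.append(out[-1])
        else:
            out.append(tok)
    return "".join(out)


print(encode_repeats("WILLKILLALL"))  # WIL+KIL+AL+
print(decode_repeats("KIL+"))         # KILL
```

In a real ciphertext the shape before each ‘+’ would itself be a homophone, so ‘KILL’ would show up as three ordinary shapes followed by ‘+’, with no visibly doubled shape for a codebreaker to latch onto.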

If that’s correct, I would further expect that ‘M’ was one of the homophones for ‘L’, and the [l*M] cycle could very well have been the 3-long homophone loop for ‘L’.

Do you really believe that the Zodiac could write a taunting message to the police without using the four-letter sequence “KILL”? In many ways, that was kind of the whole point.

A few days ago, I had a nice email from two Swedish engineers called Henrik (Henke) Sundberg and David Thelin: surprisingly, they claimed that they had worked out the details of the Zodiac Killer’s 32-character “map cipher” (also known as “Z32”).

The first thing I did was to put up a new page describing the Z32 cipher, something I’ve been meaning to do for a few years: as normal, I tried to cover the raw factuality and basic observations rather than out-and-out theories and speculation.

The short version is that the letter-shapes in the Z32 cipher look nearly exactly the same as those in the (famously solved) Z408 cipher, which makes it seem very much as though it too is a homophonic cipher, though with different letter assignments (deciphering it using the Z408 key doesn’t seem to yield anything sensible). Unfortunately, 32 characters (made up of 29 different shapes, i.e. only three appear more than once) wouldn’t normally be anywhere near large enough for a homophonic cryptogram to be cracked, unless you had some significant additional information to work with. (Hint: a cipher key would be a good start. 🙂 )

However, in this case there was some other extra information: a roadmap of the San Francisco Bay Area with a “Zodiac Killer” shape centred on Mount Diablo, and a note saying “The Map coupled with this code will tell you where the bomb is set. You have untill next Fall to dig it up”. A second “little list” letter (posted a month later) gave a further clue: “PS. The Mt. Diablo code concerns Radians & # inches along the radians”.

Sundberg and Thelin’s theory (described in this PDF file) is that it’s in fact a very scientific cipher, as much a stegotext as a cryptogram.

[Image: the Z32 cipher]

From this, they extract the phrases “C3H3”, “Octane”, and “North of West”, while “HCEL(Zodiac)PW(triangle)” reminds them of how the molecule HClO3 looks, centred around the Zodiac symbol. From which they deduce that they need to look 1 inch (i.e. 6.4 miles) along a vector due West from magnetic North.

Guys, guys… I’m really sorry, but I think you’ve got it wrong. Nobody in their right (or indeed wrong) mind would concoct a chain of reasoning based around a vague resemblance to a particular molecule in order to encode a unit vector. Even dear old Jessica Lee wouldn’t do that, much as she likes chemistry and ciphers.

Look: the Zodiac Killer wasn’t some evil scientific genius, he was a sick, unhappy man with a grudge against the SFPD (probably a surrogate for his sick unhappy relationship with his abusive, distant father) on a gun-powered external power trip, a (literally) vain attempt to right the perceived wrongs in his personal life. I don’t even think he knew properly what a “radian” is, because he doesn’t use the term correctly in his note.

If you happen to like both technical-minded heavy metal and cipher mysteries, I might possibly have a hot tip for you (thanks to Phil Strahl’s blog). Otto Kinzel has released an album on Bluntface Records called “I want to report a murder”, where every track is based on features or events in the Zodiac Killer cipher case.

Kinzel has even done a 7-minute video of the title track “I want to report a murder”: but you might want to skip past the first atmospheric 1 minute 28 secs of video intro and get with the metal…

PS: if you’re reading this as an email & the video didn’t get embedded, here’s the direct link to the video on YouTube. Enjoy!

Heteroscedasticity – now there’s a word you don’t see very often (thanks to Rosco Paterson for kindly plonking it in my path). Which is a pity, because it’s a particularly useful concept that might help us crack several longstanding cipher mysteries.

The idea behind it is not too far from the old joke about the statistician with his feet in the oven and his head in the fridge, who – on average – felt very comfortable. Strictly speaking, a set of numbers is heteroscedastic if it contains different (‘hetero-’) subgroups whose variability differs; I’m using the word more loosely here, for data made up of distinct subgroups such that (for example) their average value falls between the groups. As a result, looking to that average for enlightenment as to the nature of those two separate subgroups is probably not going to do you much good.
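The oven-and-fridge joke is easy to put into numbers. In this small Python sketch the temperatures are made up purely for illustration: the point is only that the overall average lands somewhere that none of the actual readings are anywhere near.

```python
from statistics import mean

# Made-up temperatures in degrees C, purely for illustration.
feet_in_oven = [40, 41, 39]
head_in_fridge = [4, 5, 3]

combined = feet_in_oven + head_in_fridge
overall = mean(combined)

# How far is the nearest actual reading from the "comfortable" average?
closest_gap = min(abs(x - overall) for x in combined)

print(mean(feet_in_oven), mean(head_in_fridge), overall)  # 40 4 22
print(closest_gap)                                        # 17
```

The average of 22 describes neither subgroup, which is exactly the trap for any statistic computed across a ciphertext that is really two (or more) different things stuck together.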

Perhaps unsurprisingly, it turns out that a lot of statistical properties implicitly rely on the data to be analyzed not having this property. That is, for data with multiple modes or states, the consequent heteroscedasticity is likely to mess up your statistical reasoning. Though you’ll still get plausible-looking results, there’s a high chance they’ll be of no practical use. So for cipher systems in general, any hint of multimodality should be a heteroscedastic alarm bell, a warning that your statistical toolbox may be as much use as a wet fish for tightening a bolt.

Plenty of Voynich Manuscript (‘VMs’) researchers will be sagely nodding their heads at this point, because they know all too well that the plethora of statistical analyses performed so far on it has failed to yield much of consequence. Could this be because its ‘Voynichese’ text heteroscedastically ‘hops’ between states? Cipher Mysteries regulars will know I’ve long suspected there’s some kind of state machine at play, but I’ve yet to see any full-on analysis of the VMs with this in mind.

Historically, the first proper ciphering state machine was Alberti’s 1465 cipher disk. He placed one alphabet on a stator (a static disk) and another on a rotor (a rotating disk), rotating the latter according to some system pre-agreed between encipherer and decipherer, e.g. rotating it after every couple of words, or after every vowel, etc.
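The ‘state machine’ idea is perhaps easiest to see in code. The following Python sketch is a much simplified 26-letter model and not a reconstruction of Alberti’s actual disk: the mixed `ROTOR` alphabet is arbitrary, and the ‘rotate one step after every vowel’ rule is just one of the example conventions mentioned above. It expects uppercase A to Z only.

```python
import string

STATOR = string.ascii_uppercase        # fixed outer alphabet
ROTOR = "QWERTYUIOPASDFGHJKLZXCVBNM"   # arbitrary mixed alphabet (an assumption)
VOWELS = "AEIOU"


def encipher(plaintext: str) -> str:
    """Encipher A-Z text, rotating the rotor one step after each vowel."""
    offset = 0  # the machine's hidden state
    out = []
    for ch in plaintext:
        out.append(ROTOR[(STATOR.index(ch) + offset) % 26])
        if ch in VOWELS:
            offset += 1
    return "".join(out)


def decipher(ciphertext: str) -> str:
    """Reverse encipher() by tracking the same rotor state."""
    offset = 0
    out = []
    for ch in ciphertext:
        plain = STATOR[(ROTOR.index(ch) - offset) % 26]
        out.append(plain)
        if plain in VOWELS:
            offset += 1
    return "".join(out)


print(encipher("EE"))                # TY: same letter, two different outputs
print(decipher(encipher("KILLING")))  # KILLING
```

The same plaintext letter comes out differently depending on the hidden `offset`, which is why frequency counts taken across the whole text blur several different alphabets together.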

Even if you don’t happen to buy in to my Averlino hypothesis (but don’t worry if you don’t, it’s not mandatory here), 1465 isn’t hugely far from the Voynich Manuscript’s vellum radiocarbon dating. It could well be that state machine cryptography was in the air: perhaps Alberti was building on an earlier, more experimental cipher he had heard of, but with an overtly Florentine, Brunelleschian clockwork gadget twist.

As an aside, there are plenty of intellectual historians who have suggested that the roots of Alberti’s cipher disk lie (for example) in Ramon Llull’s circular diagrams and conceptual machines: in a way, one might argue that all Alberti did was collide Llull’s stuff with the more hands-on Quattrocento Florentine machine-building tradition, and say “Ta-da!” 🙂

All the same, we do know that the Voynich Manuscript’s cipher is not an Albertian polyalphabetic cipher: but if it is multimodal, how should we look for evidence of it?

A few years ago when my friend Glen Claston was laboriously making his own transcription of the VMs, he loosely noticed that certain groups of symbols and even words seemed to phase in and out, as if there was a higher-level structure underlying its text. Was he glimpsing raw heteroscedasticity, arising from some kind of state machine clustering? For now this is just his cryptological instinct, not a rigorous proof: and it is entirely possible he was influenced by the structure of Leonell Strong’s claimed decryption (which introduced a new cipher alphabet every few lines). Despite all that, I’m happy to take his observation at face value, and to accept that Voynichese may well be built around a higher-level internal state structure that readily confounds our statistical cryptanalyses.

So, the big question here is whether it is possible to design tests to explicitly detect multimodality ‘blind’. The problem is that even though this is done a lot in econometrics (there was even a Nobel Prize for Economics awarded for work to do with heteroscedasticity), economic time series are surely quite a different kettle of monkeys to ciphertexts. Perhaps there’s a whole cryptanalytical literature on detecting heteroscedasticity: if you happen to know of one, please leave a comment here!

I don’t know what the answer to all this is: it’s something I’ve been thinking about for a while, without really being able to resolve to my own satisfaction. Make of it what you will!

At the same time, there’s also a spooky echo with the Zodiac Killer’s Z340 cipher here. I recently wrote some code to test for the presence of homophone cycles in Z340, and from the results I got I strongly suspect that its top half employs quite a different cipher to the bottom – the homophone cycles my code suggested for the two halves were extremely different.

Hence it could well be that most statistical analyses of Z340 done to date have failed to produce useful results because of the confoundingly heteroscedastic shadow cast by merging (for example) two distinct halves into a single ciphertext. How could we definitively test whether Z340 is formed of two halves? Something else to think about! 🙂
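I don’t have a definitive test, but one modest starting point would be a permutation test on the symbol counts. The Python sketch below is my own rough suggestion rather than an established cryptanalytic method: it measures how different the symbol frequencies of the two halves are (with a chi-squared style statistic), and then checks how often a random shuffle of the same symbols gives a split that is at least as lopsided. The function names and the toy strings are assumptions for illustration only.

```python
import random
from collections import Counter


def chi2_between(a, b) -> float:
    """Chi-squared style distance between the symbol counts of two sequences."""
    ca, cb = Counter(a), Counter(b)
    na, nb = len(a), len(b)
    stat = 0.0
    for sym in set(ca) | set(cb):
        total = ca[sym] + cb[sym]
        ea = total * na / (na + nb)  # expected count in a if halves were alike
        eb = total * nb / (na + nb)
        stat += (ca[sym] - ea) ** 2 / ea + (cb[sym] - eb) ** 2 / eb
    return stat


def split_p_value(text: str, trials: int = 2000, seed: int = 0) -> float:
    """Share of random shuffles whose halves differ at least as much as observed."""
    rng = random.Random(seed)
    mid = len(text) // 2
    observed = chi2_between(text[:mid], text[mid:])
    symbols = list(text)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(symbols)
        if chi2_between(symbols[:mid], symbols[mid:]) >= observed:
            at_least += 1
    return (at_least + 1) / (trials + 1)


# Invented toy texts, NOT real ciphertext.
two_systems = "AB" * 40 + "CD" * 40  # halves share no symbols
one_system = "ABCD" * 40             # halves have identical counts
print(split_p_value(two_systems, trials=500))
print(split_p_value(one_system, trials=500))
```

A small value would say that the two halves’ symbol frequencies differ more than shuffling alone would explain. It would not say why: a change of key, a change of subject matter in the plaintext, or something like the ‘+’ behaviour could all produce the same effect, so this is at best a first filter.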