In a recent post here, I floated the idea that the Zodiac Killer’s Z408 (solved) cipher’s unusual homophone distribution may have arisen not conceptually (i.e. from a hitherto-unknown book on cryptography), but instead empirically (i.e. emerging from the properties of a specific text).

It’s certainly possible that he might have used his own (private) text to model his homophone distribution, in which case we probably have almost no chance of reconstructing it. However, I think it likely that he instead used the first few characters of an already existing public text (such as Moby Dick, the Book of Genesis, the Declaration of Independence, or whatever) to do this.

It’s a reasonable enough suggestion, I think: and moreover one that we can try to test to a reasonable degree.

Z408’s homophones

A homophonic cipher key allocates a number of cipher shapes to individual plaintext letters, usually (but not always) in broad proportion to their frequency. So in a typical homophonic cipher key you would expect to see far more shapes for E (the most common letter in English) than for, say, Z or Q.

Though this is essentially the case for what we see in the Z408 cipher (particularly for the more frequent letters, ETAOINS), the numbers of homophones chosen for the less frequent letters seem somewhat idiosyncratic and arbitrary:

7 shapes – E
4 shapes – T A O I N S
3 shapes – L R
2 shapes – D F H
1 shape  – B C G K M P U V W X Y
Did not appear: J Q Z

People have long searched for a primer or textbook on cryptography where the description of the alphabetic frequency distribution matches this, or even where the alphabetic frequency ordering (e.g. ETAOINSHRDLU etc) matches the order here, but in vain.

Designing a filter

The basic idea for the filter is easy enough:
* read in characters from the start of a passage (we’re only interested in capitalized alphabetic letters, i.e. A-Z)
* if the instance count of that character is higher than the top of the desired range, then the test fails
* if the instance counts for all the characters are within the desired range at the same time, then the test passes
* else keep reading in more characters until the test terminates

As a side note: of all the Z408 homophones, only X appears exactly once in the Z408 ciphertext itself: but while it is conceivable that the Zodiac Killer might have allocated extra homophones for X, it does seem fairly unlikely.

The desired ranges for each of the characters would look like this (though feel free to adapt this if you disagree with the homophone counts listed above):

[7,7] – E
[4,4] – T A O I N S
[3,3] – L R
[2,2] – D F H
[0,1] – B C G K M P U V W Y J Q Z
[0,3] – X (to err on the side of safety)

Note that the letters with only a single shape have a slightly broader [0,1] range because we have no way of knowing whether or not they would have actually appeared in the original text.

Here are two test texts that should both pass:

EEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZZZZZZZZZZZZZZZ

BCGKMPUVWYJQZXEEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZ
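A minimal Python sketch of the filter described above, for anyone who wants to try it. The names `RANGES` and `passes_filter` are my own, and I'm assuming that the "all in range" check is made after every letter read, and that lowercase letters should be folded to uppercase while everything else (spaces, digits, punctuation) is skipped.

```python
from collections import Counter

# Desired [low, high] instance-count range for each letter, as listed above.
RANGES = {}
for letters, low, high in [
    ("E", 7, 7),
    ("TAOINS", 4, 4),
    ("LR", 3, 3),
    ("DFH", 2, 2),
    ("BCGKMPUVWYJQZ", 0, 1),
    ("X", 0, 3),
]:
    for letter in letters:
        RANGES[letter] = (low, high)


def passes_filter(text):
    """Return True if some prefix of text has every letter count in range."""
    counts = Counter()
    for ch in text.upper():
        if ch not in RANGES:
            continue  # ignore anything that is not A-Z
        counts[ch] += 1
        if counts[ch] > RANGES[ch][1]:
            return False  # overshot the top of this letter's range
        if all(lo <= counts[c] <= hi for c, (lo, hi) in RANGES.items()):
            return True  # every letter is in range at the same time
    return False  # ran out of text before the test could pass


print(passes_filter("EEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZZZZZZZZZZZZZZZ"))
print(passes_filter("BCGKMPUVWYJQZXEEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZ"))
```

Both test texts pass because the filter stops as soon as all the counts are in range, so the trailing run of Zs is never read.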

Which texts to try?

Since any text published before August 1969 could potentially be a match, it would make sense to look at all manner of texts, and possibly even the first few lines of different chapters of books (though I’d be a little surprised if that was the case). All the same, the filter is easy enough to write and to test (and should execute in a matter of microseconds), so the difficulty here lies mostly in getting hold of enough texts to try, rather than in the compute time as such.

Oddly, I don’t really have a solid feel for how often the filter will find a match: my gut instinct is that roughly one in a million English text comparisons will pass, but that’s just a guesstimate based on each letter having its own little bell-curve distribution, all of which have to match at the same time.

So what do you think will match? “Catcher in the Rye” or “Moby Dick”? Place your bets! 😉

Given that the Zodiac Killer’s first big cipher (the Z408) got cracked so quickly, it shouldn’t really be a surprise that he used a slightly different system for his second big cipher (the Z340). What is (arguably) surprising is that whatever change he made to it has not been figured out since then.

But what was he thinking? What did he want from a cipher? And how might his needs have changed between Z408 and Z340?

The Z408

Ciphers are normally made to be as strong as practically possible, given the technological, time, and resource constraints that apply to both sender and receiver: and with the two main driving needs being privacy and secrecy. Note that these aren’t always the same thing: the way I usually describe it is that while sex with your husband is private, sex with your tennis coach is secret. 😉

And so the first thing I find cryptographically interesting about the Zodiac Killer is that he was creating a cipher from a slightly different angle from either of these: and he certainly wasn’t trying to communicate in any normal sense of the word.

Rather, I think that the point of Z408 was to be taunting, and to demonstrate to the police that he was in control, not them.

So imagine the Zodiac’s probable fury, then, when little more than a week after his three Z408 cryptograms appeared in local newspapers (the Vallejo Times-Herald, the San Francisco Examiner and the San Francisco Chronicle), Donald and Bettye Harden were all over the front pages explaining how they had cracked them.

Didn’t they know who was supposed to be in control here?

What was worse, the Hardens hadn’t used cryptological hardware or even high-powered cryptological smarts. They’d just used the Zodiac’s egoism (they guessed the first letter was “I”) and his psychopathic bragging (they guessed he would use the word KILL multiple times) as keys to his cryptographic front door: and then marched straight in.

I think it’s fairly safe to expect that the Zodiac was pretty pissed off by this.

Note that the Hardens carried on trying to crack the Z340 for many years afterwards: according to their daughter, her “mother wrote poetry and was as absorbed in her writing as she became with the Zodiac codes. She worked on the second code on and off for the rest of her life.”

The Z340

Comparing the overall style of the Z340 with that of the Z408, there seem to be plenty of reasons to think that the two are, at heart, not wildly different from each other. And yet (as is widely known) all the big-brained homophonic solvers written since haven’t made any impact on the Z340 at all.

All the same, I think the second interesting thing to note is that the changes to the Z340 system were surely not made to defend against computer-assisted codebreaking (because that hadn’t yet happened), but rather to make the updated system Harden-hardened, so to speak.

What does this mean? Well, we can probably infer that the first letter of the Z340 is almost certainly not I (not that that helps us a great deal) and the Zodiac Killer must have done something to conceal or remove the KILL weakness.

But, in my opinion, that latter change would surely not have been a theoretically-motivated cryptographic adaptation (he was without much doubt an amateur cryptographer), but rather something pragmatic and empirical, perhaps along the lines of:
* adding a repeat-the-last-letter token
* adding an LL token
* adding an ILL token
* adding nulls inside tell-tale words
* etc

But there’s a problem with all of these. In fact, there are several problems. 🙁

The Problems

The first problem is that I don’t currently believe any of the above changes are disruptive enough to explain what we see in the Z340.

The basic stats of the four main Zxxx ciphers are:
Z408: 408 symbols, from a set of 54 unique symbols. (Note: E has 7 homophones, AST have 6 each, IO have 5 each, N has 4, FLR have 3 each, DHW have 2 each, everything else has 1).
Z340: 340 symbols, from a set of 63. [Hence symbols/textsize is 18.5%, a fair bit higher than the Z408’s 13.2%]
Z32: 32 symbols, from a set of 30.
Z13: 13 symbols, from a set of 8.
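For anyone who wants to check the arithmetic, here is a small Python sketch that computes the symbols/textsize ratio for each cipher from the counts listed above. The names `CIPHER_STATS` and `density` are just mine, for illustration.

```python
# (text length, number of distinct symbols) for each cipher, as listed above.
CIPHER_STATS = {
    "Z408": (408, 54),
    "Z340": (340, 63),
    "Z32": (32, 30),
    "Z13": (13, 8),
}


def density(name):
    """Distinct symbols divided by text length for the named cipher."""
    length, distinct = CIPHER_STATS[name]
    return distinct / length


for name in CIPHER_STATS:
    print(f"{name}: {density(name):.1%}")

# How much denser in symbols the Z340 is than the Z408.
ratio = density("Z340") / density("Z408")
print(f"Z340 / Z408 density ratio: {ratio:.2f}")
```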

It would be very tempting to suspect (as many people have) that the Z340 is ‘therefore’ just the same as Z408 but with about 40% more homophones relative to its length. Yet a problem with this popular hypothesis is that it should be well within range of automated homophone solvers, and to date they haven’t managed to make any impact.

A second problem is that the kind of homophone cycles that so characterized the Z408 seem to be largely absent in the Z340: and yet because the Zodiac Killer would not have had any clue that these were a technical weakness of his system, it seems unlikely to me that he would have adjusted his system to work around a weakness that he didn’t actually know was a weakness.

A third problem is that the Z340 has a fair number of asymmetries that don’t fit the it’s-a-straight-homophonic-cipher model. For example, lines 1-3 and 11-13 have (as Dan Olson pointed out some years ago) almost no character repeats.

There are yet other asymmetries: for example, while 63 different symbols appear in the top ten lines, only 60 appear in the bottom ten lines. And there’s the mysterious ‘-‘ shape at the start and end of line 10: and the odd-looking “ZODAIK” sequence on line 20.

One final asymmetry: the ‘+’ shape seems to function differently in the top and bottom halves – it is often preceded by ‘M’ in the top half, but never preceded by ‘M’ in the bottom half.

How does assuming the Z340 is a pure homophonic cipher explain any of these behaviours, let alone all of them?

Lines 1-3 and 11-13, revisited

I keep coming back to the 1-3 and 11-13 property as mentioned here. I think it’s important to say that Dan Olson’s conclusion (that “lines 1-3 and 11-13 contain valid ciphertext whereas lines 4-6 and 14-16 may be fake”) seems likely to be landing a little bit wide of the mark.

To me, this same property of these lines implies (a) that the homophonic versions for each letter were probably used in pure sequence here, but also (b) the homophone cycles were somehow ‘reset’ after ten lines (i.e. the homophone cycles all started again at the start of line eleven). And perhaps also that any characters repeated in the first three lines are rarer characters, rather than the homophone-friendly ETAOINSHRDLU etc.

It might even be that the Zodiac Killer kept on adding homophones as he constructed the cipher UNTIL he had three lines’ worth of essentially unique homophones: that is to say, that the three line blocks in 1-3 and 11-13 are how his system made the choice of the number of homophones, rather than as a consequence of the number of homophones he chose. Nobody has yet (to my knowledge) satisfactorily explained where he came up with his homophonic allocation for Z408: certainly, searching for this in crypto books hasn’t yielded any likely candidates.

Could it be that the Zodiac Killer worked backwards from his actual Z408 ciphertext to determine the number of homophones, rather than worked forward from the number of homophones to the ciphertext?

Update: I received the following off-line comment from David Oranchak, but thought it better to update it within the post itself…

Nick, there are a few other seemingly rare phenomena that can be observed in Z340. I’m curious what you think of them.

The first is the pivots:

http://zodiackillerciphers.com/wiki/index.php?title=Encyclopedia_of_observations#The_.22Pivots.22

Those kinds of patterns are difficult to arise by chance, so they are suspected to be some sort of feature of the encoding scheme.

Z408 is littered with repeating bigrams but Z340 seems to have fewer than would be expected via normal homophonic encipherment of a plaintext in a normal reading direction. However, the bigrams show up again if you consider a periodic operation on the cipher text:

http://zodiackillerciphers.com/wiki/index.php?title=Encyclopedia_of_observations#Periodic_ngram_bias

The count of 25 repeating bigrams jumps to 37 or 41 or even higher, depending on the periodic operation applied to the cipher text. Here is a tool that illustrates the various operations:

http://zodiackillerciphers.com/period-19-bigrams/
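As a rough illustration of the periodic bigram idea, here is a Python sketch that counts repeating bigrams at a given period. It rests on two assumptions of mine that may not match the linked tool: a "bigram at period p" is taken to be the pair of symbols p positions apart (with no wraparound at the end of the text), and the number of repeats is the sum of (occurrences minus one) over all bigrams. The function name `repeated_bigrams` is hypothetical, and `symbols` stands for whatever transcription of the ciphertext you prefer.

```python
from collections import Counter


def repeated_bigrams(symbols, period=1):
    """Count repeats among pairs of symbols that lie `period` positions apart."""
    pairs = Counter(
        (symbols[i], symbols[i + period])
        for i in range(len(symbols) - period)
    )
    # A bigram seen n times contributes n - 1 repeats.
    return sum(n - 1 for n in pairs.values() if n > 1)


# Toy example only: this is not a Zodiac transcription.
print(repeated_bigrams("ABABAB", period=1))
print(repeated_bigrams("ABABAB", period=2))
```

Running this over a range of periods and looking for a spike would be one way to reproduce the kind of comparison described here.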

You’ve already identified the seemingly rare phenomenon of rows that lack repeating symbols. There are 9 such rows. In 1,000,000 random shuffles of Z340, none had that many rows. In fact, the best that was found was 8 rows which occurred in only 12 of the shuffles.
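A shuffle experiment of this kind can be sketched in a few lines of Python. This is only my guess at the general shape of the test, not the code that was actually used: the function names are hypothetical, the row width of 17 comes from the Z340 layout, and `symbols` is a placeholder for a transcription (one character or list item per cipher shape).

```python
import random


def rows_without_repeats(symbols, width=17):
    """Count the rows in which no symbol appears more than once."""
    rows = [symbols[i:i + width] for i in range(0, len(symbols), width)]
    return sum(1 for row in rows if len(set(row)) == len(row))


def shuffle_test(symbols, width=17, trials=1000, seed=0):
    """Return the observed row count and the fraction of shuffles matching it."""
    rng = random.Random(seed)
    observed = rows_without_repeats(symbols, width)
    pool = list(symbols)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if rows_without_repeats(pool, width) >= observed:
            hits += 1
    return observed, hits / trials


# Toy example only: two rows of three symbols, one of which has a repeat.
print(rows_without_repeats("ABCABB", width=3))
```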

Your “M+” asymmetry observation seems to fit in with the general observation that repeating bigrams are phobic of certain regions of the text. The lower left, for instance, seems to hate bigrams: http://zodiackillerciphers.com/images/z340-repeating-bigrams.png

Another really strange observation is the distribution of non-repeating string lengths. For each position of Z340, measure how far you can read forward without encountering a repeating symbol. You end up with a string with unique sequences of length L. Jarlve found that for Z340, there is a peak of 26 occurrences of unique sequences of length 17 (which happens to be the width of Z340). It is really interesting that in random shuffles, this phenomenon is only observed on the order of one in a billion shuffles.
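The measurement described above is easy to sketch in Python. I'm assuming that the run stops just before the first repeated symbol and is simply cut short at the end of the text; the original analysis may have handled those edge cases differently. The name `unique_run_lengths` is my own.

```python
from collections import Counter


def unique_run_lengths(symbols):
    """For each start position, the length of the repeat-free run from there."""
    lengths = []
    for start in range(len(symbols)):
        seen = set()
        for sym in symbols[start:]:
            if sym in seen:
                break  # stop just before the first repeated symbol
            seen.add(sym)
        lengths.append(len(seen))
    return lengths


# Toy example only: this is not a Zodiac transcription.
runs = unique_run_lengths("ABCA")
print(runs)           # [3, 3, 2, 1]
print(Counter(runs))  # how many runs there are of each length
```

Feeding a transcription through this and tabulating the result with `Counter` gives the histogram in which the peak at length 17 would show up.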

Finally, I would recommend that anyone interested in this topic should check out this thread on morf’s Zodiac forum: http://zodiackillersite.com/viewtopic.php?f=81&t=3196 Especially the more recent posts on the latter pages. “Jarlve” and “smokie” in particular are doing fantastic work exploring various transcription schemes that could explain the various curious features of Z340 (in particular, the relationships between periodic bigrams and transposition schemes).

The 408-symbol-long Zodiac Killer cipher (‘Z408’) was cracked by Donald and Bettye Harden in 1969 while the next 340-symbol-long Zodiac Killer cipher (‘Z340’) arrived not long after: ever since then, there has been a widespread presumption among researchers that the later cipher would just be a more complicated version of the earlier cipher (e.g. perhaps transposed in some way).

The Z340 certainly resembles Z408, insofar as the cipher shapes employed in both were very similar, and that certainly lends support to the widely-held presumption that Z340 uses the same kind of ‘pure’ homophonic cipher system. But is that the whole story? Personally, I’m not so sure…

Unusual Aspects of the Z340

It has long been pointed out that the Z340 cipher sports a number of idiosyncratic features that are not present in the earlier Z408 cipher. For example, the FBI’s Dan Olson pointed out a few years ago that:

* Statistical tests indicate a higher level of randomness by row, than by column. This indicates that the cipher is written horizontally and rules out any transposition patterns that are not strictly horizontal.

* Lines 1-3 and 11-13 contain a distinct higher level of randomness than lines 4-6 and 14-16. This appears to be intentional and indicates that lines 1-3 and 11-13 contain valid ciphertext whereas lines 4-6 and 14-16 may be fake.

* Because of the vertical symmetry of the statistical observations, the message may have been written, then split into two equal size parts and placed top over bottom.

These suggest that something odd might be going on inside the cipher: in this respect, the Z340 cipher resembles the Voynich Manuscript’s frustrating ‘Voynichese’, which looks straightforward on the surface but which turns out to have many behavioural features which are not seen in other known ciphers.

I’d also add that row #10 starts and ends with ‘-‘, which looks somewhat artificial – though it could just be random, it may also have some kind of meta-significance for the interpretation of the overall cryptogram (e.g. “CUT HERE”).

Finally, I’d add that Z340’s final (20th) line looks very much as if it contains a mangled ZODIAK signature, which – if correct – would probably make sense as 50% crypto padding, and 50% flipping the bird at the FBI. 😉

Anyway, given that the message contains 20 lines of 17 symbols (20 x 17 = 340) and we can see similar artefacts in rows 1-3 and 11-13, then it seems likely to me that there was some kind of major coding break after row #10.

Consequently, I’ve long wondered whether the two halves of Z340 (let’s call them ‘Z170-A’ and ‘Z170-B’) used a different set of cipher-symbol-to-plaintext-letter assignments to each other: in which case, the sensible way to make progress would be to try to solve each half separately. Even so, we would still need to eke out some additional assistance (or meta-assistance) from the texts to make progress, because the odds are so heavily stacked against us.

Yet there’s another feature of the Z340 cipher which struck me a while back but which I haven’t got round to blogging about until now. It’s all to do with doubled shapes, and the story starts with the Z408 ciphertext…

Z408’s doubled letters

To construct Z408, the Zodiac Killer used 7 shapes for { E }, 4 shapes each for { T, A, O, I, N, S }, 3 shapes each for { R, L }, 2 shapes each for { D, F, H }, and 1 shape each for the rest (probably): this yields a grand total of 54-ish cipher shapes to encipher 26 plaintext letters.

Given that the instance count curve for the English alphabet is often described as “ETAOINSHRDLU…”, this tiered arrangement makes sense (as I recall, various researchers have tried to use the homophone allocation to infer which popular cryptography manual the Zodiac Killer specifically relied upon, but I don’t remember if there was a definitive answer to that question).

However, one particular letter caused him a lot of practical problems for Z408: the letter L. Even though this has a relatively small frequency count (compared to, say, the letter E), the particular text he enciphered included numerous ‘LL’ pairs. That is kind of what you get if you want to say the word ‘KILL’ all the time: the words with a double-L are KILLING, KILLING, ALL, KILL, THRILLING, WILL, ALL, KILLED, WILL, WILL, WILL, and COLLECTING.

(As an aside, I’ve often wondered whether the multiple repetitions of the word “WILL” might possibly imply that the Zodiac Killer’s first name was indeed “WILL” / William. The subconscious is a funny ‘wild animal’ in that way.)

Anyway, as a direct result of this, the letter L is used here more often than its normal English stats would suggest: and so the Zodiac Killer had to encipher ‘LL’ 12 times with only three shapes in its tier. To avoid pattern repetitions, he ended up doubling up the enciphered L-shapes a few times, and so the final Z408 ciphertext included a number of doubled L shapes.

The only other doubled letter was ‘G’, which only had a single shape allocated to it, and which appeared doubled only once.

Z340’s Problem With ‘+’

If Z340 (which uses 63 distinct shapes) uses a similar kind of homophonic cipher to the one used in the Z408 cipher (which uses 54 distinct shapes), then I would say it has a very specific problem with whatever is being enciphered by the shape ‘+’.

‘+’ occurs 24 times (7% of the total number of characters, and exactly double that of ‘B’, the second most frequent shape), which by itself largely makes a nonsense of the suggestion that Z340 is a homophonic cipher: anything with that high a frequency count should surely have a whole set of homophones to represent it.

You might wonder whether ‘+’ enciphers a frequent word or syllable, such as ‘THE’ or ‘ING’. However, it appears three times immediately doubled with itself, i.e. ‘++’ (the only other letter that appears doubled is the sequence ‘pp’ that occurs once near the start of row #4).
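Counting doubled shapes in a transcription is straightforward. The Python sketch below is only an illustration: the name `doubled_symbols` is mine, and I'm assuming that a pair only counts as doubled if both shapes sit on the same row (so a shape at the end of one line followed by the same shape at the start of the next is ignored), and that a run of three counts as two overlapping doubles.

```python
from collections import Counter


def doubled_symbols(symbols, width=17):
    """Count adjacent identical symbols that fall within the same row."""
    doubles = Counter()
    for i in range(len(symbols) - 1):
        same_row = (i // width) == ((i + 1) // width)
        if same_row and symbols[i] == symbols[i + 1]:
            doubles[symbols[i]] += 1
    return doubles


# Toy example only: this is not a Zodiac transcription.
print(doubled_symbols("AB++CD++", width=8))
```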

Even if, as Dave Oranchak did, you do a brute force search for homophone cycles (don’t get me started on what they are, or we’ll be here all night), you don’t find anything that accounts for Z340’s ‘+’ shape.

And yet, as Dave Oranchak points out, Z340 has some strong-looking homophone cycles, such as [l*M] [l*M] [l*M] lM [l*M] [l*M] [l*M], which would seem to imply that Z340 is at heart a homophonic cipher. There are plenty of other measures (many noted by my late friend Glen Claston) that point in the same direction.

Moreover, because the number of shapes used is greater than for the Z408 cipher, you would naturally expect to see more tiers or wider tiers (though 7 shapes for E was already quite a wide tier). So you would naturally expect to see a consequent flattening of the statistics. And yet ‘+’ bucks that trend completely.

How Can We Reconcile These Two?

As a starting point, you might note that ‘M+’ occurs three times in the top half, but not at all in the bottom half. In fact, M is always followed by + in the top half, and never followed by + in the bottom half (where it occurs four times).

It seems to me that the ‘+’ shape makes the top half (Z170-A) easy and the bottom half (Z170-B) difficult all at the same time. And that’s not something that I personally can comfortably reconcile with the kind of one-size-fits-all pure homophonic solutions most people seem to be looking for, even with confounding transposition stages thrown in: the behaviour of the ‘M+’ pattern would seem to point away from almost all of the transposition variants previously proposed.

Having really, really thought about it, my tentative conclusion is that ‘+’ seems to operate more as a kind of meta-token rather than as a pure token. I mean this in the same general way that certain Voynichese letters seem to me to encipher 15th century shorthand tokens (‘contractio’, etc).

A Suggestion

As I recall, the Hardens found their crib by guessing that the first letter was “I” and then looking for the word “KILL”: the ease of which doubtless made an already angry psychopath even angrier.

Hence to my mind, the thing he would most likely have been looking to solve when moving from his Z408 cipher system to his Z340 cipher system was how to make that new system impervious to that specific kind of an attack. And the key letter that let him down first time round was the letter ‘L’, specifically in its doubled form.

Consequently, I propose that this was the single technical challenge that spurred the internal changes from Z408 to Z340. And there was one obvious – but admittedly very old-fashioned – trick that he could have used to make doubled letters harder to see.

So here’s my suggestion. Could it be that the Zodiac Killer used ‘+’ as a meta-token to mean “REPEAT THE LAST LETTER”? (‘++’ would then mean a tripled letter, or perhaps something else entirely).
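To make the suggestion concrete, here is a tiny Python sketch of how such a meta-token would be expanded once the ordinary shapes had been turned back into letters. The function name `expand_repeats` is hypothetical, and I'm assuming the simplest possible reading: each ‘+’ copies whatever letter was output immediately before it, and a ‘+’ with nothing before it is left alone.

```python
def expand_repeats(tokens, repeat_token="+"):
    """Replace each repeat token with a copy of the previous output letter."""
    out = []
    for tok in tokens:
        if tok == repeat_token and out:
            out.append(out[-1])  # repeat the last letter written
        else:
            out.append(tok)
    return "".join(out)


print(expand_repeats("KIL+"))  # KILL
print(expand_repeats("AL++"))  # ALLL (the tripled-letter reading of '++')
```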

If that’s correct, I would further expect that ‘M’ was one of the homophones for ‘L’, and the [l*M] cycle could very well have been the 3-long homophone loop for ‘L’.

Do you really believe that the Zodiac could write a taunting message to the police without using the four-letter sequence “KILL”? In many ways, that was kind of the whole point.

A few days ago, I had a nice email from two Swedish engineers called Henrik (Henke) Sundberg and David Thelin: surprisingly, they claimed that they had worked out the details of the Zodiac Killer’s 32-character “map cipher” (also known as “Z32”).

The first thing I did was to put up a new page describing the Z32 cipher, something I’ve been meaning to do for a few years: as normal, I tried to cover the raw factuality and basic observations rather than out-and-out theories and speculation.

The short version is that the letter-shapes in the Z32 cipher look nearly exactly the same as those in the (famously solved) Z408 cipher, which makes it seem very much as though it too is a homophonic cipher, though with different letter assignments (deciphering it using the Z408 key doesn’t seem to yield anything sensible). Unfortunately, 32 characters (made up of 29 different shapes, i.e. only three appear more than once) wouldn’t normally be anywhere near large enough for a homophonic cryptogram to be cracked, unless you had some significant additional information to work with. (Hint: a cipher key would be a good start. 🙂 )

However, in this case there was some other extra information: a roadmap of the San Francisco Bay Area with a “Zodiac Killer” shape centred on Mount Diablo, and a note saying “The Map coupled with this code will tell you where the bomb is set. You have untill next Fall to dig it up”. A second “little list” letter (posted a month later) gave a further clue: “PS. The Mt. Diablo code concerns Radians & # inches along the radians”.

Sundberg and Thelin’s theory (described in this PDF file) is that it’s in fact a very scientific cipher, as much a stegotext as a cryptogram.

Z32-cipher

From this, they extract the phrases “C3H3”, “Octane”, and “North of West”, while “HCEL(Zodiac)PW(triangle)” reminds them of how the molecule HClO3 looks, centred around the Zodiac symbol. From which they deduce that they need to look 1 inch (i.e. 6.4 miles) along a vector due West from magnetic North.

Guys, guys… I’m really sorry, but I think you’ve got it wrong. Nobody in their right (or indeed wrong) mind would concoct a chain of reasoning based around a vague resemblance to a particular molecule in order to encode a unit vector. Even dear old Jessica Lee wouldn’t do that, much as she likes chemistry and ciphers.

Look: the Zodiac Killer wasn’t some evil scientific genius, he was a sick, unhappy man with a grudge against the SFPD (probably a surrogate for his sick unhappy relationship with his abusive, distant father) on a gun-powered external power trip, a (literally) vain attempt to right the perceived wrongs in his personal life. I don’t even think he knew properly what a “radian” is, because he doesn’t use the term correctly in his note.

If you happen to like both technical-minded heavy metal and cipher mysteries, I might possibly have a hot tip for you (thanks to Phil Strahl’s blog). Otto Kinzel has released an album on Bluntface Records called “I want to report a murder”, where every track is based on features or events in the Zodiac Killer cipher case.

Kinzel has even done a 7-minute video of the title track “I want to report a murder”: but you might want to skip past the first atmospheric 1 minute 28 secs of video intro and get with the metal…

PS: if you’re reading this as an email & the video didn’t get embedded, here’s the direct link to the video on YouTube. Enjoy!

Heteroscedasticity – now there’s a word you don’t see very often (thanks to Rosco Paterson for kindly plonking it in my path). Which is a pity, because it’s a particularly useful concept that might help us crack several longstanding cipher mysteries.

The idea behind it is not too far from the old joke about the statistician with his feet in the oven and his head in the fridge, who – on average – felt very comfortable. Strictly speaking, a set of numbers is heteroscedastic if it contains different (‘hetero-’) subgroups whose scatter (variance) differs; I’m using the word here in a looser sense, for data that simultaneously contains different subgroups such that (for example) their average value falls between the groups. As a result, looking to that average for enlightenment as to the nature of those two separate subgroups is probably not going to do you much good.

Perhaps unsurprisingly, it turns out that a lot of statistical properties implicitly rely on the data to be analyzed not having this property. That is, for data with multiple modes or states, the consequent heteroscedasticity is likely to mess up your statistical reasoning. Though you’ll still get plausible-looking results, there’s a high chance they’ll be of no practical use. So for cipher systems in general, any hint of multimodality should be a heteroscedastic alarm bell, a warning that your statistical toolbox may be as much use as a wet fish for tightening a bolt.

Plenty of Voynich Manuscript (‘VMs’) researchers will be sagely nodding their heads at this point, because they know all too well that the plethora of statistical analyses performed so far on it has failed to yield much of consequence. Could this be because its ‘Voynichese’ text heteroscedastically ‘hops’ between states? Cipher Mysteries regulars will know I’ve long suspected there’s some kind of state machine at play, but I’ve yet to see any full-on analysis of the VMs with this in mind.

Historically, the first proper ciphering state machine was Alberti’s 1465 cipher disk. He placed one alphabet on a stator (a static disk) and another on a rotor (a rotating disk), rotating the latter according to some system pre-agreed between encipherer and decipherer, e.g. rotating it after every couple of words, or after every vowel, etc.

Even if you don’t happen to buy in to my Averlino hypothesis (but don’t worry if you don’t, it’s not mandatory here), 1465 isn’t hugely far from the Voynich Manuscript’s vellum radiocarbon dating. It could well be that state machine cryptography was in the air: perhaps Alberti was building on an earlier, more experimental cipher he had heard of, but with an overtly Florentine, Brunelleschian clockwork gadget twist.

As an aside, there are plenty of intellectual historians who have suggested that the roots of Alberti’s cipher disk lie (for example) in Ramon Llull’s circular diagrams and conceptual machines: in a way, one might argue that all Alberti did was collide Llull’s stuff with the more hands-on Quattrocento Florentine machine-building tradition, and say “Ta-da!” 🙂

All the same, we do know that the Voynich Manuscript’s cipher is not an Albertian polyalphabetic cipher: but if it is multimodal, how should we look for evidence of it?

A few years ago when my friend Glen Claston was laboriously making his own transcription of the VMs, he loosely noticed that certain groups of symbols and even words seemed to phase in and out, as if there was a higher-level structure underlying its text. Was he glimpsing raw heteroscedasticity, arising from some kind of state machine clustering? For now this is just his cryptological instinct, not a rigorous proof: and it is entirely possible that he was influenced by the structure of Leonell Strong’s claimed decryption (which introduced a new cipher alphabet every few lines). Despite all that, I’m happy to take his observation at face value: and that Voynichese may well be built around a higher-level internal state structure that readily confounds our statistical cryptanalyses.

So, the big question here is whether it is possible to design tests to explicitly detect multimodality ‘blind’. The problem is that even though this is done a lot in econometrics (there was even a Nobel Prize for Economics awarded for work to do with heteroscedasticity), economic time series are surely quite a different kettle of monkeys to ciphertexts. Perhaps there’s a whole cryptanalytical literature on detecting heteroscedasticity: if so, please leave a comment here if you happen to know of it!

I don’t know what the answer to all this is: it’s something I’ve been thinking about for a while, without really being able to resolve it to my own satisfaction. Make of it what you will!

At the same time, there’s also a spooky echo with the Zodiac Killer’s Z340 cipher here. I recently wrote some code to test for the presence of homophone cycles in Z340, and from the results I got I strongly suspect that its top half employs quite a different cipher to the bottom – the homophone cycles my code suggested for the two halves were extremely different.

Hence it could well be that most statistical analyses of Z340 done to date have failed to produce useful results because of the confoundingly heteroscedastic shadow cast by merging (for example) two distinct halves into a single ciphertext. How could we definitively test whether Z340 is formed of two halves? Something else to think about! 🙂
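I don’t have a definitive test, but one crude starting point would be a permutation test on the symbol counts of the two halves. The Python sketch below is only that: the function names are hypothetical, the distance measure (sum of absolute differences in per-symbol counts) is an arbitrary choice of mine, and the shuffle-based null destroys all positional structure, so it can only say whether the two halves use their shapes unusually differently, not why. Two halves with different keys but similar shape frequencies would sail straight through it.

```python
import random
from collections import Counter


def half_distance(symbols):
    """Sum of absolute differences in symbol counts between the two halves."""
    mid = len(symbols) // 2
    top, bottom = Counter(symbols[:mid]), Counter(symbols[mid:])
    return sum(abs(top[s] - bottom[s]) for s in set(top) | set(bottom))


def two_halves_test(symbols, trials=1000, seed=0):
    """Return the observed distance and the fraction of shuffles matching it."""
    rng = random.Random(seed)
    observed = half_distance(symbols)
    pool = list(symbols)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if half_distance(pool) >= observed:
            hits += 1
    return observed, hits / trials


# Toy example only: a text whose two halves share no symbols at all.
print(half_distance("AABB"))
```

A small fraction from `two_halves_test` would suggest the split between the halves is more lopsided than chance shuffling tends to produce.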

One of the nice things about the unsolved Z340 Zodiac Killer cipher is that we have a previous solved cipher by the same encipherer (i.e. the Z408 cipher), which appears to exhibit many of the same properties as the Z340. Hence, if we could forensically reconstruct how Z408 was constructed (i.e. its cryptographic methodology), we might also gain valuable insights into how the later Z340 was constructed.

One interesting feature of the (solved) Z408 is that even though it is a homophonic substitution cipher (which is to say that several different shapes are used for various plaintext letters), the shape selection is often far from random. In fact, in quite a few instances Z408 shapes appear in a strict cycle, which has led to some recent attempts to crack Z340 by trying (unsuccessfully) to infer homophone cycles.

Curiously, one of the shapes (filled triangle) appears to encipher both A and S: and if you extract all these out, a homophone-cycle-like ASASASAS sequence appears. This intrigued me, so I decided to look at it a little closer: might this somehow be a second layer of cycling?

The answer (I’m now pretty sure) turns out to be no, though it’s still interesting in its own right. Basically, the Zodiac seems to have got confused between dotted triangle (for S) and filled triangle (for A), which caused his cycles to break down. He also miscopied an F-shape as an E-shape: perhaps his working draft wasn’t quite as neat as his final copy, and/or written in felt tip, causing letter shapes to soak into the paper and become slightly less distinct.

If we correct these mistakes and reconstruct what he seems to have intended, we see that he was following a fairly strict cycle most of the time, though getting less ordered towards the end (perhaps from enciphering nausea?):-

A: length-4 homophone cycle = (1) F – (2) dotted square – (3) K – (4) dotted triangle
–> 12341234123413234124211
—-> 16 decisions out of 22 follow the cycle pattern

S: length-4 homophone cycle = (1) 6 – (2) S – (3) reversed L – (4) filled triangle
–> 1241234123412341231412
—-> 18 decisions out of 21 follow the cycle pattern
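
For anyone wanting to check those counts, here is a minimal Python sketch. It assumes each digit in the strings above is the (corrected) position of the chosen shape within its cycle, and it counts a "decision" as following the pattern when the next digit is the next position round the cycle. The function name is mine, purely for illustration.

```python
def cycle_adherence(sequence, cycle_length):
    """Return (followed, total): how many consecutive choices step
    to the next position in the cycle 1 -> 2 -> ... -> n -> 1."""
    picks = [int(ch) for ch in sequence]
    followed = 0
    for prev, curr in zip(picks, picks[1:]):
        expected = prev % cycle_length + 1  # position n wraps back to 1
        if curr == expected:
            followed += 1
    return followed, len(picks) - 1


a_seq = "12341234123413234124211"  # A, as reconstructed above
s_seq = "1241234123412341231412"   # S, as reconstructed above

print(cycle_adherence(a_seq, 4))  # (16, 22)
print(cycle_adherence(s_seq, 4))  # (18, 21)
```

Note that the number of decisions is one less than the number of shapes, because the first shape in each sequence has nothing before it to follow.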

L is interesting because though it seems to start out as a length-2 homophone cycle [diagonal square – B], the diagonal square then seems to morph into a filled square and then back again to a diagonal square. Hence there’s no obvious sign of an actual length-3 homophone cycle as such, only a miscopied length-2 cycle (which then breaks down halfway through, with four diagonal squares in a row).

Yet even though the Zodiac loves words containing LL (kill, thrill, will, all, etc), he only actually seems to be using a length-2 homophone cycle for L (if slightly miscopied). That is, he is probably using a generalized model of English letter frequency distribution rather than a particular model of his own English letter frequency distribution.

The odd thing is that if you go through Dave Oranchak’s list of Z408 homophone sequences, you’ll see that it doesn’t quite match the traditional “ETAOINSHRDLU” frequency ordering (I count L as length-2):
* Length-7: E
* Length-4: TAOINS
* Length-3: R
* Length-2: LHFD

Was there an American amateur cryptography book of the 1950s or 1960s that espoused this frequency distribution?

Here’s a nice story that should bring heart to researchers struggling with uncracked homophonic ciphers (e.g. Zodiac Killer Ciphers, Beale Papers, etc). Kevin Knight, who Voynich Manuscript researchers may remember from various posts here, has now co-authored a 2011 paper with Beáta Megyesi and Christiane Schaefer from Uppsala University on how they cracked a hitherto unknown (to me, at least) 105-page ciphertext dated 1866 they call the Copiale Cipher.

Slightly unhelpfully, the authors refer only to the manuscript as having come “from the East Berlin Academy”: in fact, as far back as 1992/1993 the East Berlin Academy of Arts and the West Berlin Academy were merged into a single Academy of Arts, Berlin (i.e. the Akademie der Künste). I searched the Akademie’s archives to see if I could find the source but only managed to find one plausible-sounding hit:-

Record group: Döhl – Reinhard-Döhl-Archiv
Classification group: 6.1. Fremde Manuskripte
Lauf. Nummer: 3625
Dat. => Findbuch: o.O., o.D.
Titel: [ohne Verfasser]: die sentenzen verschlüsselter deutbarkeit […]

Perhaps someone with better German and more persistence than me will find the actual manuscript reference.

Anyway, Knight/Megyesi/Schaefer give a nice account of how they went about analysing the neatly-written ciphertext, the various hypotheses they came up with along the way, and how they finally managed to decrypt it (though admittedly they initially only transcribed 16 pages), apart from eight mysterious logograms (i.e. an eight-entry nomenclator “for (doubly secret) people and organizations”). Here’s their translation of the first few lines, which make it quite clear what kind of a book it is:-

First lawbook
of the [1] e [2]
Secret part.
First section
Secret teachings for apprentices.
First title.
Initiation rite.
If the safety of the [3] is guaranteed, and the [3] is
opened by the chief [4], by putting on his hat, the
candidate is fetched from another room by the
younger doorman and by the hand is led in and to the
table of the chief [4], who asks him:
First, if he desires to become [1].
Secondly, if he submits to the rules of the [2] and
without rebelliousness suffer through the time of
apprenticeship.
Thirdly, be silent about the [5] of the [2] and
furthermore be willing to offer himself to volunteer
in the most committed way.
The candidate answers yes.

The interesting thing about the date is that it predates the 1887 founding of the Hermetic Order of the Golden Dawn by 20 years or so: and many (if not most?) regular Cipher Mysteries readers will recall that that was founded with a (quite different) mysterious cipher document allegedly referring to a certain “Fräulein Anna Sprengel” mentioned in the enciphered text. By way of comparison, Aleister Crowley’s favourite Ordo Templi Orientis was founded only in 1895 or thereabouts.

Hence the really big question about this enciphered document is whether there is any connection (perhaps even Anna Sprengel) between it and the Golden Dawn Ciphers. The answer may well lie in the 89 pages as yet untranscribed by K/M/S… hopefully we shall see!

Update: since writing this, I found that K/M/S have put up a detailed web-page including scans, transcriptions, and English translations of the whole 105 pages. Codicologically, they say it is “beautifully bound in green and gold brocade paper, written on high quality paper with two different watermarks [and] can be dated back to 1760-1780.”

They also note that they think it is a document of an “18th century secret society, namely the “oculist order”. A parallel manuscript is located at the Niedersächsisches Landesarchiv, Staatsarchiv Wolfenbüttel.” Which of course rules Fräulein Sprengel out. 🙂

To be honest, the part in the ceremony described where they pluck a hair from the eyebrow of the initiate reminds me not a little of the Simpsons’ Stonecutters episode (“Who holds back the electric car? Who makes Steve Guttenberg a star? We do! We do!”), but perhaps let’s not dwell on that too much… 🙂

A quick apology to Cipher Mysteries email subscribers: some illegal text characters (now fixed) that accidentally sneaked into a recent post caused Feedburner (the Google service I use to email posts to you) to go all huffy for a few days. Hence I’m very sorry to say that you’ve missed out on three recent updates to the site.

They were (in chronological order):
(1) Harvard Professor nearly wades into Voynich swamp…discusses an upcoming lecture at Cambridge University on various Slavic mystery documents and John Stojko’s Voynich theory.
(2) Voynich fruitiness back in season…discusses two recent fruity Voynich theories that popped up on the Internet, one linking the VMs with Jewish pharmaceutical conspiracies, the other with the coelacanth (yes, really!).
(3) Decent 2010 paper on the Zodiac Killer Ciphers…discusses a paper by two Norwegian academics searching for homophone cycles in the uncracked Z340 Zodiac Killer cipher.

Feel free to click through and have a look at them, they were all good posts, well worth a read. Enjoy! 🙂

Here’s some more on the Zodiac Killer ciphers, specifically the interesting uncracked one (“Z340”). Though most of the images of this on the Internet are both monochrome and somewhat overexposed, here’s a link to a nice image of Z340 at a high-enough resolution to be useful. Thanks to this, I think you can see that the correction on row 6 is from a ‘right-facing K’ to a ‘left-facing K’, which could well be a copying error from an intermediate draft.

What’s more, it allows us to transcribe the ciphertext with a high degree of confidence that we’ve got it right: so here’s the transcription that Dave Oranchak and glurk use, which should be more than good enough for non-Zodiac experts wanting to play with it too:-

HER>pl^VPk|1LTG2d
Np+B(#O%DWY.<*Kf)
By:cM+UZGW()L#zHJ
Spp7^l8*V3pO++RK2
_9M+ztjd|5FP+&4k/
p8R^FlO-*dCkF>2D(
#5+Kq%;2UcXGV.zL|
(G2Jfj#O+_NYz+@L9
d<M+b+ZR2FBcyA64K
-zlUV+^J+Op7<FBy-
U+R/5tE|DYBpbTMKO
2<clRJ|*5T4M.+&BF
z69Sy#+N|5FBc(;8R
lGFN^f524b.cV4t++
yBX1*:49CE>VUZ5-+
|c.3zBK(Op^.fMqG2
RcT+L16C<+FlWB|)L
++)WCzWcPOSHT/()p
|FkdW<7tB_YOB*-Cc
>MDHNpkSzZO8A|K;+
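
If you do want to play with it, here is a small Python starting point that loads the 20 rows above and lists which symbols turn up in only one half of the grid. Splitting after row 10 is purely my assumption for illustration (I haven't pinned down exactly where any boundary might fall), and the helper name is made up for this sketch.

```python
from collections import Counter

# The 20 rows of the transcription above, 17 symbols per row.
Z340_ROWS = [
    r"HER>pl^VPk|1LTG2d",
    r"Np+B(#O%DWY.<*Kf)",
    r"By:cM+UZGW()L#zHJ",
    r"Spp7^l8*V3pO++RK2",
    r"_9M+ztjd|5FP+&4k/",
    r"p8R^FlO-*dCkF>2D(",
    r"#5+Kq%;2UcXGV.zL|",
    r"(G2Jfj#O+_NYz+@L9",
    r"d<M+b+ZR2FBcyA64K",
    r"-zlUV+^J+Op7<FBy-",
    r"U+R/5tE|DYBpbTMKO",
    r"2<clRJ|*5T4M.+&BF",
    r"z69Sy#+N|5FBc(;8R",
    r"lGFN^f524b.cV4t++",
    r"yBX1*:49CE>VUZ5-+",
    r"|c.3zBK(Op^.fMqG2",
    r"RcT+L16C<+FlWB|)L",
    r"++)WCzWcPOSHT/()p",
    r"|FkdW<7tB_YOB*-Cc",
    r">MDHNpkSzZO8A|K;+",
]


def split_halves(rows):
    """Join the rows into a top-half string and a bottom-half string."""
    mid = len(rows) // 2  # assumed boundary: after row 10
    return "".join(rows[:mid]), "".join(rows[mid:])


top, bottom = split_halves(Z340_ROWS)
top_counts, bottom_counts = Counter(top), Counter(bottom)

print("symbols per half:", len(top), len(bottom))
print("only in top half:   ", "".join(sorted(set(top) - set(bottom))))
print("only in bottom half:", "".join(sorted(set(bottom) - set(top))))
print("commonest in top:   ", top_counts.most_common(5))
print("commonest in bottom:", bottom_counts.most_common(5))
```

Bear in mind that this transcription uses ordinary keyboard characters as stand-ins for the Zodiac's shapes, so the letters here are just labels, not plaintext guesses.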

OK, today’s thought follows on from my most recent Zodiac Killer post, which wondered to what degree cryptologists could make use of the likely presence in Z340 of broadly the same kind of homophone cycles present in the earlier Z408 ciphertext. Well blow me down if I didn’t just run into exactly that today, a paper by Håvard Raddum and Marek Sýs called “The zodiac killer ciphers” published in Tatra Mountains Maths Publ. 45 (2010), pp.75–91: the fulltext is freely downloadable here. There’s an earlier (slightly less formal) 2009 presentation here.

The two authors found evidence of low-level (i.e. length = 2 or 3) homophone cycle structure in the Z340 but not in its transposed version, which is a good indication that the cipher itself isn’t (diagonally) transposed. However, having myself written code to look for homophone cycles in Z340, I think their assumption that it is a single homogeneous cipher is not really justified: they would have got much more striking values had they divided it into two.

Really, the challenge with searching for homophone cycles in Z340 that they failed to address is that the statistical significance of the length 2 or length 3 homophone cycles they found is relatively low compared with the Z408 cipher. How many standard deviations are these actually away from the centre of the distribution? The biggest statistical problem with searching for best homophone cycles is that you have a lot to choose from, which I believe reduces the statistical significance of any you do happen to find. It’s a kind of statistical “darts paradox”: hitting the bullseye once in a million throws doesn’t suddenly make you a great darts player.
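
One rough way to put a number on the darts paradox is a shuffle test: find the best score over every candidate pair of symbols, then see how often a randomly shuffled copy of the same text throws up a best score at least as good. The Python sketch below does this for length-2 cycles only. The scoring (a simple count of alternations) and the function names are my own assumptions for illustration, not the measure Raddum and Sýs use, and the example text is a toy rather than Z340.

```python
import random
from itertools import combinations


def alternations(text, a, b):
    """Count how often symbols a and b alternate, ignoring all other symbols."""
    picks = [ch for ch in text if ch == a or ch == b]
    return sum(1 for x, y in zip(picks, picks[1:]) if x != y)


def best_pair_score(text):
    """Best alternation count over every possible pair of symbols."""
    symbols = sorted(set(text))
    return max(
        (alternations(text, a, b) for a, b in combinations(symbols, 2)),
        default=0,
    )


def shuffled_p_value(text, trials=200, seed=0):
    """Estimate how often shuffled text matches or beats the observed best.

    Picking the best pair is repeated inside every trial, so the estimate
    already allows for having had lots of pairs to choose from.
    """
    rng = random.Random(seed)
    observed = best_pair_score(text)
    chars = list(text)
    hits = 0
    for _ in range(trials):
        rng.shuffle(chars)
        if best_pair_score("".join(chars)) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)


toy = "AXBYAXBYAXBYAXBY"  # toy text with perfect cycling, not real ciphertext
score, p = shuffled_p_value(toy)
print("best alternation count:", score)
print("estimated p-value:", round(p, 3))
```

This compares every pair on every shuffle, so it gets slow on a long ciphertext with many symbols: treat it as a way of thinking about the problem rather than a finished tool.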

Still, they build up a lot of theoretical machinery (though I somehow doubt that you can reliably build n-cycles out of (n-1)-cycles given the many deviations from the cycle scheme the Zodiac Killer makes), which may well prove useful. Definitely something to ponder on.