Though originally published in 1998 and 2003, and most recently published in three volumes in 2013-2014, “Maps, Mystery and Interpretation” is in reality a single (very large) book, the fruits of Geoff Bath’s vast sustained effort to till Oak Island’s unproductive historical soil.

The overall title broadly suggests its three constituent sections, in that Part 1 covers (possibly pirate) treasure maps (“Maps”); Part 2 examines the evidential haze surrounding the Oak Island “Money Pit” mystery (“Mystery”); while Part 3 attempts to put the myriad of pieces together to make sense of them all (“Interpretation”). Simples.

If only the Oak Island mystery itself were as straightforward…

Part 1: Maps

Here, Geoff presents all the “Kidd” maps that Hubert Palmer ended up with, and compares Howlett’s account of them with Wilkins’ account, as well as – and this is the good bit – lots of letters written and received by both Wilkins and Palmer.

I can’t be the only reader to find himself or herself surprised by Bath’s conclusion – that Wilkins essentially got it all just about right, while Howlett got a great deal of it wrong.

All the same, as far as reconstructing the modern history of the Palmer-Kidd maps goes, Geoff’s reasoning here seems very much on the money. I’d say his account gets far closer to what happened than even George Edmunds’ account (stripping both authors’ conclusions out of the picture first).

However, Bath gets himself in something of a tangle trying to make sense of the various maps Wilkins originated (both in Part 1 and in Part 3). Was Wilkins adapting maps or documents otherwise unseen, using them as templates for his own creations, or trolling his readers to help him identify mysterious islands? Too often Bath seems content to speculate in a way that paints Wilkins as an almost Svengali-like figure, a kind of Andy Warhol of treasure maps.

In reality, I’m far from sure that Wilkins was any closer to historical clarity than we are now. Given that I can’t read more than a handful of pages of his “A Modern Treasure Hunter” without feeling nauseous (the fumes! the bad accents! the ghosts!), I just can’t see Wilkins as anything like a consistently reliable source, even about himself.

Yet one of the most specifically insightful things that emerges from Part One is Bath’s observation that it isn’t necessary for these maps to actually be Kidd’s for them to be independently genuine. That is, the set of maps’ whole association with Kidd might be something that was overlaid onto a (non-Kidd) set of maps: the supposed Kidd link might easily have been added to the mix as a way of “bigging up” someone else’s maps. If this is true (and you don’t have to believe that these are Oak Island maps for it to be so), many of the difficulties that arise when you try to link them to Kidd (e.g. dating, language, etc) disappear.

It’s still hellishly difficult to make sense of these maps, for sure, but Geoff is right to point out that Kidd may well turn out to be part of the problem here, rather than part of the solution or explanation. Something to think about, for certain.

Part 2: Mystery

In my opinion, Oak Island is a wretched, wretched subject, filled with all the slugs and snails of cipher mysteries and not the vaguest flicker of any of the good stuff. It’s a bleak, barren evidential landscape, filled with unconfirmed micro-features briefly noted by a long series of individual investigators, before being quickly razed from the face of the earth by gung-ho treasure hunters. There seems little genuine hope that any faint trace of anything historical or sensible still remains.

Putting the speculative sacred geometry and shapes picked on maps to one side, there are some (though not many) good things in Part Two I didn’t previously know about. Specifically, the idea that tunnels and features might have been dug aligned with the local magnetic compass at that time is quite cool, though obviously something that has been much discussed over the decades.

So I’m terribly sad to have to say that even a perceptive and diligent researcher such as Geoff Bath can make no real difference to this long-standing disaster area. His Part 2 is therefore little more than an Ozymandian monument to the effort and greed sunk in the pursuit of the Money Pit (not that a brass farthing or even so much as a period button has come of it to date).

Nothing beside remains. Round the decay
Of that colossal wreck, boundless and bare
The lone and level sands stretch far away

Part 3: Interpretation

Having struggled through the unpromising desert of the previous part, my expectations as to what Part 3 might bring were fairly low. But as Bath works his way through his interpretation section (repeatedly railing against the pox of untestable hypotheses), something actually rather odd happens.

All of a sudden, he mentions the Venatores (an early 20th century treasure hunting group) and the Particulars (a set of treasure hunting documents collected together by the Venatores). As this enters the picture, it’s as if a curious wave ripples through the whole research fabric: that, contrary to what you might have thought from the two previous books, it’s not all about whether Wilkins was credible or incredible, or whether Hill Cutler was stone cold serious or laughing all the way to the Terminus Road Lloyds Bank in Eastbourne, but instead that there might actually be something behind it all.

That is to say, what emerges – though all too briefly – is a frisson of that wonderfully engaging secret history paranoia where you can just sense stuff going on behind the scenes but which you know you probably won’t ever gain access to.

In the end, Bath’s well-researched and well-written books didn’t manage to persuade me of the existence of a link between the various treasure maps and the Oak Island mystery (and that, indeed, is a hypothesis that would seem to be politically untestable) nor of any kind of geometric cartography plan driving it all. However, they did manage to convince me that the whole Money Pit enterprise might possibly be built not on a vast hole, but instead on a history whose fragmentary parts have been scattered on the winds, and yet which might possibly be reassembled in the future.

It probably won’t happen but… who can say?

A little while back, I had an email from Marie about Alexander d’Agapeyeff’s (1939) book “Codes and Ciphers”, highlighting some interesting mistakes she had found in his section on double transposition cipher.

D’Agapeyeff described this as a cipher system that the Russian Nihilists had used, but said that they had used the same keyword for both halves of the transposition (i.e. for transposing both the columns and the rows), a technical flaw that made it easy to crack. (Oddly, the Nihilists are nowadays associated with an entirely different kind of encipherment.)

Let’s take a closer look…

D’Agapeyeff’s Double Transposition

What follows is d’Agapeyeff’s account, with comments along the way.

At the end of the nineteenth century the Russian Nihilists used a double cipher, which, having been transposed vertically, was then transposed horizontally; but they made the mistake of using the same keyword in both transpositions. As it is a common variation of double columnar cipher, we give it as an example:

The first thing that Marie picked up on was that the way that d’Agapeyeff converted the transposition keyword SCHUVALOF to an ordering was clearly incorrect: F is the sixth letter of the alphabet, so there is no obvious way that it would be counted as the highest ranked of the nine letters in the keyword. When I looked at this, I immediately guessed that it should instead have read SCHUVALOV – as it turned out, this was a good try, though still very slightly wrong. 😐

Regardless, it should already be clear that something a little non-obvious is going on here.
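To see why the rankings matter, here is a minimal Python sketch (my own illustration, not anything from d’Agapeyeff’s book; the function name is mine) of the standard way a transposition keyword is converted into a numeric ordering, ranking its letters alphabetically and breaking ties left to right:

```python
def keyword_to_order(keyword):
    # Rank each letter by alphabetical position; ties break left-to-right,
    # so a repeated letter's second occurrence ranks just after its first.
    by_alpha = sorted(range(len(keyword)), key=lambda i: (keyword[i], i))
    order = [0] * len(keyword)
    for rank, i in enumerate(by_alpha, start=1):
        order[i] = rank
    return order

print(keyword_to_order("SCHUVALOW"))  # [6, 2, 3, 7, 8, 1, 4, 5, 9]
print(keyword_to_order("SCHUVALOV"))  # [6, 2, 3, 7, 8, 1, 4, 5, 9] - the second V ranks 9
print(keyword_to_order("SCHUVALOF"))  # [7, 2, 4, 8, 9, 1, 5, 6, 3] - F only ranks 3rd
```

Under this convention the final letter of SCHUVALOV (or SCHUVALOW) does end up ranked highest of the nine, whereas an F never could: which is consistent with Marie’s observation.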

Now suppose we have to encipher the following: ‘Reunion to-morrow at three p.m. Bring arms as we shall attempt to bomb the railway station. Chief.’

The ‘abcd’ at the end are ‘nulls’ used to fill in the squares.

Now we transpose the message according to the letter sequence of the keyword:

So the message reads:


In all languages where certain letters must follow or precede certain others, the deciphering of this script will never present difficulties. We first count the number of letters in the script (81), which will give us the size of the square (9×9), and once this is done all we have to do is remember that in nine cases out of ten ‘h’ follows either ‘t’ or ‘s’ or ‘c’, and that the bigrams such as AT, TO, WE and the very helpful (English) trigram ‘the’, and the doubles TT, LL, EE, etc., are the most common. In fact, the Russian police soon found out all about that conspiracy.

The second thing Marie noted here was that d’Agapeyeff was using the double transposition decryption direction here, rather than the encryption direction.

All in all, I’d agree with Marie that d’Agapeyeff didn’t seem to have fully understood how the system worked. Smartly, though, Marie now doggedly decided to look at d’Agapeyeff’s crypto sources, to see if he had copied this whole section blindly from somewhere. And, eventually, she found that d’Agapeyeff’s direct source for the above was none other than…

Auguste Kerckhoffs

…the Dutch cryptographer Auguste Kerckhoffs (1835-1903).

Kerckhoffs’ influential book (well, extended article, really) “La Cryptographie Militaire” is available online as a PDF, or as an HTMLized version here.

What follows is my usual free translation of Kerckhoffs’ description of double transposition, which we can immediately see beyond any reasonable doubt as being the source for d’Agapeyeff’s version:

On the occasion of the Nihilists’ last appearance in court, the Russian newspapers published the accused’s secret cipher. It is a system of double transposition, where the letters are first transposed by vertical columns, and are then further transposed by horizontal rows. The same word serves as a key for both transpositions: to do this, the keyword is transformed into a series of numbers, where each number matches the rank of the letter within the normal alphabetical sequence.

Here is the process applied to the word SCHUVALOW:

OK, though I was on this occasion very slightly wrong (SCHUVALOV rather than SCHUVALOW), I was at least wrong in the right kind of way. 🙂 Kerckhoffs continues:

Now, if we were to transpose a sentence such as this one – Vous êtes invité à vous trouver ce soir, à onze heures précises, au local habituel de nos réunions – we would proceed first as in the previously described [single transposition] case, and then carry out the same operation for the horizontal rows.

   = s c i a u e s e l a v i v o n t e u v t r e r s o u c a c a b i o l h t n e l o s u d e r, etc.

However complicated this transposition may appear to us, deciphering a cryptogram written with this system can never present insurmountable difficulties in languages where certain letters only present themselves in particular combinations, such as q or x in French. Here, the Russian decipherers seem to have carried out their decryption work in a relatively short time.

For any passing conlang fans, Auguste Kerckhoffs was also closely associated with the artificial language Volapük, which some people think is really koldälik. 🙂

d’Agapeyeff + Kerckhoffs = …?

It’s important to remember that d’Agapeyeff wasn’t himself a cryptographer, but rather someone who was trying to collect together interesting crypto stuff into a book that had originally been commissioned for someone else entirely to write. The project wasn’t something he was aiming to do, but rather something that fell in his lap.

As Marie points out, the big technical thing that d’Agapeyeff got wrong is that the numbers are the wrong way round, and so he is performing a double transposition decryption rather than a double transposition encryption: the two are not the same at all. That is, if you used SCHUVALOW as your single transposition keyword and then single transposition encrypted the text “SCHUVALOW”, you should get the ciphertext “ACHLOSUVW”: but both Kerckhoffs and d’Agapeyeff (copying Kerckhoffs) seem to have got this the wrong way round.

Having thought about this for a little while, I’ve come to suspect that d’Agapeyeff may well have faultily believed that double transposition was a self-inverse process, i.e. where the decryption and encryption transformations are identical.

All of which would dovetail very neatly indeed with the report that we have that he was unable to decrypt his own challenge cipher: for if he (wrongly) believed that double transposition was self-inverse, then he wouldn’t (if his challenge cipher had used double transposition) have been able to decrypt it at all. If this is correct, then his failure wasn’t anything as foolish as misremembering the keyword, but instead misunderstanding one of the component ciphers that made up the overall chain.
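As a sanity check on that suspicion, here is a small Python sketch (mine, with a made-up keyword “CAB”; which direction of permutation counts as “encryption” is a convention, but the point holds either way) of Nihilist-style same-key double transposition, showing that applying it twice does not in general return the plaintext, i.e. it is not self-inverse:

```python
def key_order(keyword):
    # Positions of the keyword's letters, taken in alphabetical rank order.
    return sorted(range(len(keyword)), key=lambda i: (keyword[i], i))

def double_transpose(text, keyword):
    # Same-key double transposition: write the text into an n x n square,
    # permute the columns by the keyword, then the rows by the same keyword.
    n = len(keyword)
    assert len(text) == n * n
    grid = [list(text[i * n:(i + 1) * n]) for i in range(n)]
    perm = key_order(keyword)
    grid = [[row[j] for j in perm] for row in grid]  # column transposition
    grid = [grid[i] for i in perm]                   # row transposition
    return "".join("".join(row) for row in grid)

once = double_transpose("ABCDEFGHI", "CAB")
twice = double_transpose(once, "CAB")
print(once, twice)  # EFDHIGBCA IGHCABFDE - i.e. applying it twice != plaintext
```

So anyone who believed the system was self-inverse and tried to decrypt by simply re-applying the encryption steps would indeed get gibberish back.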

Might this insight help us decrypt his challenge cipher? Well… insofar as it now seems far more likely to me that he used double transposition as one of his stages, then the answer may very well be yes. Hopefully we shall see… 🙂

Prolific (if occasionally prolix) Cipher Mysteries commenter bdid1dr has long wondered whether the Somerton Man was someone in her ex-husband’s family. (She also suspects her ex-husband was the infamous Zodiac Killer, but let’s leave that for another day.)

Even though it at first sounds like an outrageously long shot (and one that would perhaps necessitate a Warren Commission ‘magic bullet’), it does in fact accord with many of the things we know about the Somerton Man, in perhaps surprising ways.

For a start, the aluminum comb, the packet of Juicy Fruit chewing gum found in the Somerton Man’s American-stitched coat and indeed the coat itself have all been taken as suggesting that the Somerton Man was American (or had recently travelled from America).

More specifically, Derek Abbott launched his recent (but unsuccessful) crowdfunding campaign on the back of a fragmentary DNA match between one of the hairs found embedded in the plaster cast bust of the Somerton Man and Thomas Jefferson.

Yet it turns out that the Shackelfords are an old Virginian family… with links to Thomas Jefferson. OK, this is all still very far from proof, but we’re not yet veering into anything like the canonical Lands Of Somerton Nonsense: so please bear with me just a little longer as we take a look at the Shackelfords…

Lee Erwin Shackelford

According to the Sydney Morning Herald, he was born on 12th April 1945 to William Shackelford and Normaleen (nee Park):

SHACKELFORD (nee Normaleen Park). April 12, King George V Hospital, Camperdown, semi-private, wife of T./Sgt. W. Shackelford, U.S. Air Corps – a son (both well)

And thanks to a little archival magic (big tips of the Cipher Mysteries hat to Eye and Aye for this), we have a photo of Lee Erwin Shackelford from the USS Ticonderoga circa 1964:

He was also bdid1dr’s first husband: she says that he died in New York a few years ago.

He had a brother (Preston Park Shackelford) who was born 10th April 1948 in Vallejo CA: and another brother (Mark) who was born in New Mexico in 1952.

William Jesse Shackelford Jr

Eye and Aye came up trumps here as well, with William Jesse Shackelford Jr’s US Armed Forces registration card (note: image behind a Fold3 paywall). According to this, he was born on 17th May 1922 in Norfolk VA: the “Name And Address Of Person Who Will Always Know Your Address” field is marked up as “Mrs A. B. Shackelford, 1631 Willoughby Ave, Norfolk, VA”. (Willoughby Ave is close to Norfolk’s Lyon Shipyard: #1631’s plot was long since sacrificed to make way for the I-264.)

According to the Registrar’s Report (note: image also behind a Fold3 paywall), William Shackelford Jr was white, 5′ 5″, 125 lbs, hazel eyes, brown hair, and with a ruddy complexion. He received his honourable discharge from the Army on the 30th August 1945 (ref: 13-062-516).

Unless he secretly had access to a Tardis, William Shackelford was not the Somerton Man: he was still very much alive in 1950, 1960, and even 1970.

Misca pointed out that:

On ancestry there is a record of a Normaleen May Shackelford travelling from Brisbane to San Francisco with her son Lee Ervin/Erwin. The name of the friend/relative she states she is visiting is William Shackelford, 835 Oaklette Avenue, Norfolk, Virginia. A 1940 census document shows two William J Shackelfords living on Oaklette. One is 39 and the other is 17. Father and son. Further research shows the son as having been in the US Airforce in WW II. He is William Jesse Shackelford. He married three times. First wife unknown but I suspect it may have been Normaleen. Second wife (married in 1957) Leila Barnes Stewart (who seems to have died), third wife Catherine Anne Garrett.

William Jesse Shackelford Sr

William Jesse Shackelford Sr’s obituary (in the 7th December 1972 Virginia Beach Sun) looked like this:

William Jesse Shackelford, 73, of 292 Stancil St., Princess Anne Plaza, an insurance agancy [sic] operator, died in a hospital November 28 after a long illness.
He was a native of Walter Valley, Tex., a son of William J. and Mrs Martha Farley Shackelford, and the husband of the late Mrs Josephine Taylor Shackelford.
He was the owner of William J. Shackelford Insurance Co. He was a member of Norfolk Elks Lodge 38, American Legion, and Commodore FOP Lodge 3.
He was a World War I veteran.
Surviving are two daughters, Mrs Bennie S. Jordan of McLean and Mrs Shirley S. Becker of Virginia Beach; a son, William Jesse Shackelford Jr of Alexandria; two sisters, Mrs Cordelia Willcox of Tuolumne, Calif., and Mrs Sylvia S. Snyde of Corpus Christi, Tex.; a brother, Feilx [sic] Shackelford of Odessa, Tex.; 11 grandchildren; and 11 great grandchildren.

Might there be a missing Shackelford…?

I hope it’s not construed as unkind of me to note that bdid1dr’s handed-down family stories don’t quite add up. At this remove in both time and space, tales about her ex-husband’s family’s life in Australia (he moved to the US at a very young age) are bound to be fragmentary and incomplete.

What is either interesting or just plain Chinese Whispered here is that she was sure that there was also a Lee Irving Shackelford in Australia, who somehow disappeared: and quite how he fits into the whole picture nobody seems to know or remember.

And so my challenge to you fine people is to find out if there was a disappearing relative in William Jesse Shackelford (Jr or Sr)’s immediate family tree. Oh, and who was “Mrs A. B. Shackelford”?

Incidentally, one unusual (but possibly useful) resource here is the Shackelford Clan, a group that published a family history newsletter from May 1945 to April 1957 (scanned issues are listed online here) researching… the history of the Shackelford family. Good hunting! 🙂

Well, here’s a thing. The Thirteenth Oxford Medieval Graduate Conference, to be held in a month’s time at Merton College (31st March 2017 to 1st April 2017) on the theme of “Time : Aspects and Approaches”, has a Voynich-themed paper in its Manuscripts and Archives session on the second day (11:30am to 1:00pm).

This is “Asphalt and Bitumen, Sodom and Gomorrah: Placing Yale’s Voynich Manuscript on the Herbal Timeline“, presented by Alexandra Marraccini of the University of Chicago. The description runs like this:

Yale Beinecke MS 408, colloquially known as the Voynich manuscript, is largely untouched by modern manuscript scholars. Written in an unreadable cipher or language, and of Italianate origin, but also dated to Rudolphine court circles, the manuscript is often treated as a scholarly pariah. This paper attempts to give the Voynich manuscript context for serious iconographic debate using a case study of Salernian and Pseudo-Apuleian herbals and their stemmae. Treating images of the flattened cities of Sodom and Gommorah from Vatican Chig. F VII 158, BL Sloane 4016, and several other exempla from the Bodleian and beyond, this essays situates the Voynich iconography, both in otherwise unidentified foldouts and in the manuscript’s explicitly plant-based portion, within the tradition of Northern Italian herbals of the 14th-15th centuries, which also had strong alchemical and astrological ties. In anchoring the Voynich images to the dateable and traceable herbal manuscript timeline, this paper attempts to re-situate the manuscript as approachable in a truly scholarly context, and to re-characterise it, no longer as an ahistorical artefact, but as an object rooted in a pictorial tradition tied to a particular place and time.

BL Sloane 4016 is a similar-looking herbal that Voynich researchers know well. Most famously, Alan Touwaide wrote a 500-page scholarly commentary on it (as mentioned in Rene’s summary of Touwaide’s chapter in the recent Yale facsimile). It dates to the 1440s in Lombardy, and even has a frog (‘rana’) on folio 81:

Marraccini herself is an art historian who previously graduated from Yale, and who has an almost impossibly perfect set of research interests:

Her research focuses on Late Medieval and Early Modern scientific images, particularly alchemical and medical material, in England, Scotland, Germany, and the Netherlands. Her interests in the field also include book history and manuscript studies, Late Antique material culture, and the historiography of art, particularly in Warburgian contexts. Currently, she is writing on the history of Hermetic-scientific images and diagrams, and her work on Elias Ashmole’s copies of the Ripley Scrolls is forthcoming in the journal Abraxas.

All of which looks almost too good to be true. It’s just a shame her presentation falls on April Fool’s Day, so we’re bound to have people claiming that she doesn’t really exist and it’s all a conspiracy etc. 😉

A few days ago, Australian robotics hacker Marcel Varallo (whose gladiatorial hacks making Roombas fight each other amuse me greatly) very kindly posted up two new scans of the Somerton Man’s Rubaiyat code (along with many megs of his collected Somerton Man stuff) on his blog.

I’ve put the three scans we now have on a Cipher Foundation Rubaiyat Code page, and strongly recommend that people use one of the new scans as a basis for doing any image processing work, rather than the one that has been on the Internet for years.

For example, if you put the three scans’ “Q” shapes side by side and try doing image processing experiments on them…

…what you find is that the so-called “microwriting” (found in the leftmost of the three images) was simply a quantizing artefact introduced when the original JPEG image had its brightness and contrast adjusted. With the new (slightly higher resolution, and generally much smoother) scan, all that nonsense disappears. There is no ‘microwriting’ there at all: The End.

Voynich researchers without a significant maths grounding are often intimidated by the concept of entropy. But all it is is an aggregate measure of how [in]effectively you can predict the next token in a sequence, given a preceding context of a certain size. The more predictable tokens are (on average), the smaller the entropy: the more unpredictable they are, the larger the entropy.

For example, if the first order (i.e. no context at all) entropy measurement of a certain text was 3.0 bits, then it would have almost exactly the same average information content-ness per character as a random series of eight different digits (e.g. 1-8). This is because entropy is a log2 value, and log2(8) = 3. (Of course, what is usually the case is that some letters are more frequent than others: but entropy is the bottom line figure averaged out over the whole text you’re interested in.)

And the same goes for second order entropy, with the only difference being that because we always know what the preceding letter or token was, we can make a more effective guess as to what the next letter or token will be. For example, if we know the previous English letter was ‘q’, then there is a very high chance that the next letter will be ‘u’, and a far lower chance that the next letter will be, say, ‘k’. (Unless it just happens to be a text about the current Mayor of London with all the spaces removed.)

And so it should proceed beyond that: the longer the preceding context, the more effectively you should be able to predict the next letter, and so the lower the entropy value.
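To make this concrete, here is a short Python sketch (my own, not tied to any particular transcription; the function names are mine, and the tokens here are just single characters) computing first order entropy and second order conditional entropy for a token sequence. As per the log2(8) = 3 example above, a near-uniform random stream of eight symbols comes out at roughly 3.0 bits:

```python
from collections import Counter
from math import log2
import random

def h1(tokens):
    # First order entropy: -sum p(t) * log2 p(t), using no context at all.
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c / n * log2(c / n) for c in counts.values())

def h2(tokens):
    # Second order entropy: H(next token | previous token).
    pairs = Counter(zip(tokens, tokens[1:]))
    prevs = Counter(tokens[:-1])
    n = len(tokens) - 1
    return -sum(c / n * log2(c / prevs[a]) for (a, _), c in pairs.items())

random.seed(1)
stream = [random.choice("12345678") for _ in range(100_000)]
print(round(h1(stream), 2), round(h2(stream), 2))  # both close to 3.0
```

For a text with real structure (like English, or Voynichese), h2 comes out noticeably lower than h1, because knowing the previous token makes the next one easier to predict.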

As always, there are practical difficulties to consider (e.g. what to do across page boundaries, how to handle free-standing labels, whether to filter out key-like sequences, etc) in order to normalize the sequence you’re working with, but that’s basically as far as you can go with the concept of entropy without having to define the maths behind it a little more formally.

Voynich Entropy

However, even a moment’s thought should be sufficient to throw up the flaw in using entropy as a mathematical torch to try to cast light on the Voynich Manuscript’s “Voynichese” text… that because we don’t yet know what makes up a single token, we don’t know whether or not the entropy values we get are telling us anything interesting.

EVA transcriptions are closer to stroke based than to glyph based: so it makes little (or indeed no) sense to calculate entropy values for EVA. And as for people who claim to be able to read EVA off the page as, say, mirrored Hebrew… I don’t think so. :-/

But what is the correct mapping or grouping for EVA, i.e. the set of rules you should apply to EVA to turn it into the set of tokens that will give us genuine results? Nobody knows. And, oddly, nobody seems to be even asking any more. Which doesn’t bode well.

All the same, entropy does sometimes yield us interesting glimpses inside the Voynichese engine. For example, looking at the Currier A pages only in the Takahashi transcription and using ch/sh/cth/ckh/cfh/cph as tokens (which is a pretty basic glyphifying starting point), you get [“h1” = first order entropy, “h2” = second order entropy]:

63667 input tokens, 56222 output tokens, h1 = 4.95, h2 = 4.03

This has a first order information content of 56222 x 4.95 = 278299 bits, and a second order information content of (56222-1) x 4.03 = 226571 bits.

If you then also replace all the occurrences of ain/aiin/aiiin/oin/oiin/oiiin with their own tokens, you get:

63667 input tokens, 51562 output tokens, h1 = 5.21, h2 = 4.01

This has a first order information content of 51562 x 5.21 = 268638 bits, and a second order information content of (51562-1) x 4.01 = 206760 bits. What is interesting here is that even though the h1 value increases a fair bit (as you’d expect from extending the post-parsed alphabet with additional tokens), the h2 value decreases very slightly, which I find a bit surprising.

And if, continuing in this vein, you also convert air/aiir/aiiir/sain/saiin/saiiin/dain/daiin/daiiin to glyphs, you get:

63667 input tokens, 50387 output tokens, h1 = 5.49, h2 = 4.04

This has a first order information content of 50387 x 5.49 = 276625 bits, and a second order information content of (50387-1) x 4.04 = 203559 bits. Again what I find interesting is that once again the h1 value increases a fair bit, but the h2 value barely moves.

And so it does seem to me that Voynich entropy may yet prove to be a useful tool in determining what is going on with all the different possible parsings. For example, I do wonder if there might be a practical way of exhaustively / hillclimbingly determining the particular parsing / grouping that maximises the post-parsed h1:h2 ratio for Voynichese. I don’t believe anyone has yet succeeded in doing this, so there may be plenty of room for good new work here – just a thought! 🙂
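For anyone wanting to experiment along these lines, here is a minimal Python sketch of the kind of pre-parsing step involved (the group list here is just the basic ch/sh/strikethrough-gallows set mentioned above, and the function name is my own; extending GROUPS with the ain/aiin/daiin-style sequences, longest first, gives the kind of re-parsing used for the figures above):

```python
import re

# Candidate EVA glyph groups to collapse into single tokens, sorted longest
# first so that e.g. 'cth' is matched in preference to its substring 'ch'.
GROUPS = ["cth", "ckh", "cfh", "cph", "ch", "sh"]
PATTERN = re.compile("|".join(sorted(GROUPS, key=len, reverse=True)))

def parse_eva(eva):
    # Collapse the listed glyph groups into single tokens; every other
    # character passes through as a one-glyph token of its own.
    tokens, i = [], 0
    for m in PATTERN.finditer(eva):
        tokens.extend(eva[i:m.start()])  # ungrouped glyphs in between
        tokens.append(m.group())         # the grouped glyph token
        i = m.end()
    tokens.extend(eva[i:])
    return tokens

print(parse_eva("qokchedyshor"))
# ['q', 'o', 'k', 'ch', 'e', 'd', 'y', 'sh', 'o', 'r']
```

Feeding the resulting token stream into an entropy calculation (rather than raw EVA strokes) is exactly the kind of parsing-dependent measurement being discussed here.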

Voynich Parsing

To me, the confounding beauty of Voynichese is that all the while we cannot even parse it into tokens, the vast modern cryptological toolbox normally at our disposal does us no good.

Even so, it’s obvious (I think) that ch and sh are both tokens: this is largely because EVA was designed to be able to cope with strikethrough gallows characters (e.g. cth, ckh etc) without multiplying the number of glyphs excessively.

However, if you ask whether or not qo, ee, eee, ii, iii, dy, etc should be treated as tokens, you’ll get a wide range of responses. And as for ar, or, al, ol, am etc, you won’t get a typical linguistic researcher to throw away their precious vowel to gain a token, but it wouldn’t surprise me if they were wrong there.

The Language Gap

The Voynich Manuscript throws into sharp relief a shortcoming of our statistical toolbox: specifically, its excessive reliance on our having previously modelled the text stream accurately and reliably.

But if the first giant hurdle we face is parsing it, what kind of conceptual or technical tools should we be using to do this? And on an even more basic level, what kind of language should we as researchers use to try to collaborate on toppling this first statue? As problems go, this is a precursor both to cryptology and to linguistic analysis.

As far as cipher people and linguist people go: in general, both groups usually assume (wrongly) that all the heavy lifting has been done by the time they get a transcription in their hands. But I think there is ample reason to conclude that we’re not yet in the cinema, but are still stuck in the foyer, all the while there is a world of difference between a stroke transcription and a parsed transcription that few seem comfortable to acknowledge.

Given that the Zodiac Killer’s first big cipher (the Z408) got cracked so quickly, it shouldn’t really be a surprise that he used a slightly different system for his second big cipher (the Z340). What is (arguably) surprising is that whatever change he made to it has not been figured out since then.

But what was he thinking? What did he want from a cipher? And how might his needs have changed between Z408 and Z340?

The Z408

Ciphers are normally made to be as strong as practically possible, given the technological, time, and resource constraints that apply to both sender and receiver: and with the two main driving needs being privacy and secrecy. Note that these aren’t always the same thing: the way I usually describe it is that while sex with your husband is private, sex with your tennis coach is secret. 😉

And so the first thing I find cryptographically interesting about the Zodiac Killer is that he was creating a cipher from a slightly different angle to either of these: and he certainly wasn’t trying to communicate in any normal sense of the word.

Rather, I think that the point of Z408 was to be taunting, and to demonstrate to the police that he was in control, not them.

So imagine the Zodiac’s probable fury, then, when little more than a week after his three Z408 cryptograms appeared in local newspapers (the Vallejo Times-Chronicle, the San Francisco Examiner and the San Francisco Chronicle), Donald and Bettye Harden were all over the front pages explaining how they had cracked them.

Didn’t they know who was supposed to be in control here?

What was worse, the Hardens hadn’t used cryptological hardware or even high-powered cryptological smarts. They’d just used the Zodiac’s egoism (they guessed the first letter was “I”) and his psychopathic bragging (they guessed he would use the word KILL multiple times) as keys to his cryptographic front door: and then marched straight in.

I think it’s fairly safe to expect that the Zodiac was pretty pissed off by this.

Note that the Hardens carried on trying to crack the Z340 for many years afterwards: according to their daughter, her “mother wrote poetry and was as absorbed in her writing as she became with the Zodiac codes. She worked on the second code on and off for the rest of her life.”

The Z340

Comparing the overall style of the Z340 with that of the Z408, there seem to be plenty of reasons to think that the two are, at heart, not wildly different from each other. And yet (as is widely known) all the big-brained homophonic solvers written since haven’t made any impact on the Z340 at all.

All the same, I think the second interesting thing to note is that the changes to the Z340 system were surely not made to defend against computer-assisted codebreaking (because that hadn’t yet happened), but rather to make the updated system Harden-hardened, so to speak.

What does this mean? Well, we can probably infer that the first letter of the Z340 is almost certainly not I (not that that helps us a great deal) and the Zodiac Killer must have done something to conceal or remove the KILL weakness.

But, in my opinion, that latter change would surely not have been a theoretically-motivated cryptographic adaptation (he was without much doubt an amateur cryptographer), but rather something pragmatic and empirical, perhaps along the lines of:
* adding a repeat-the-last-letter token
* adding an LL token
* adding an ILL token
* adding nulls inside tell-tale words
* etc.

But there’s a problem with all of these. In fact, there are several problems. 🙁

The Problems

The first problem is that I don’t currently believe any of the above changes are disruptive enough to explain what we see in the Z340.

The basic stats of the four main Zxxx ciphers are:
* Z408: 408 symbols, from a set of 54 unique symbols. (Note: E has 7 homophones; A, S and T have 6 each; I and O have 5 each; N has 4; F, L and R have 3 each; D, H and W have 2 each; everything else has 1.)
* Z340: 340 symbols, from a set of 63. [Hence symbols/textsize is 18.5%, a fair bit higher than the Z408’s 13.2%.]
* Z32: 32 symbols, from a set of 30.
* Z13: 13 symbols, from a set of 8.

It would be very tempting to suspect (as many people have) that the Z340 is ‘therefore’ just the same as Z408 but with 39% more homophones. Yet a problem with this popular hypothesis is that it should be well within range of automated homophone solvers, and to date they haven’t managed to make any impact.
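The arithmetic behind those density figures is trivial to check, incidentally (a quick Python sketch using only the counts listed above):

```python
# Symbol-density arithmetic for the Zodiac cipher figures quoted above.
z408_len, z408_syms = 408, 54
z340_len, z340_syms = 340, 63

d408 = z408_syms / z408_len  # unique symbols per ciphertext character
d340 = z340_syms / z340_len

print(f"Z408 density: {d408:.1%}")                  # 13.2%
print(f"Z340 density: {d340:.1%}")                  # 18.5%
print(f"Relative increase: {d340 / d408 - 1:.0%}")  # ~40% denser in homophones
```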

A second problem is that the kind of homophone cycles that so characterized the Z408 seem to be largely absent in the Z340: and yet because the Zodiac Killer would not have had any clue that these were a technical weakness of his system, it seems unlikely to me that he would have adjusted his system to work around a weakness that he didn’t actually know was a weakness.
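For readers who haven’t met homophone cycles before: in the Z408, the homophones for a given plaintext letter tend to be used in strict rotation (a, b, a, b, …). One minimal way to quantify this for a pair of suspected homophones (a sketch of my own, not anyone’s canonical metric) is to score how strictly their merged occurrence sequence alternates:

```python
def cycle_score(symbols, a, b):
    """Score how strictly symbols a and b alternate across the ciphertext.

    Perfectly cycled homophones (a, b, a, b, ...) score 1.0; unrelated
    symbols typically score somewhere around 0.5."""
    merged = [s for s in symbols if s in (a, b)]
    if len(merged) < 2:
        return 0.0
    alternations = sum(1 for x, y in zip(merged, merged[1:]) if x != y)
    return alternations / (len(merged) - 1)

# Toy example: 'a' and 'b' cycle perfectly here, whatever sits between them.
print(cycle_score(list("aXbYaZbWa"), "a", "b"))  # 1.0
```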

A third problem is that the Z340 has a fair number of asymmetries that don’t fit the it’s-a-straight-homophonic-cipher model. For example, lines 1-3 and 11-13 have (as Dan Olson pointed out some years ago) almost no character repeats.

There are yet other asymmetries: for example, while 63 different symbols appear in the top ten lines, only 60 appear in the bottom ten lines. And there’s the mysterious ‘-’ shape at the start and end of line 10: and the odd-looking “ZODAIK” sequence on line 20.

One final asymmetry: the ‘+’ shape seems to function differently in the top and bottom halves – it is often preceded by ‘M’ in the top half, but never preceded by ‘M’ in the bottom half.
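That last asymmetry is easy to check mechanically, given a transcription of the Z340 as a flat list of symbol tokens (a sketch only, assuming the standard 17-wide, 20-row grid):

```python
def m_plus_counts(symbols, width=17, split_row=10):
    """Count 'M' immediately followed by '+' in the top and bottom halves."""
    top = symbols[: width * split_row]
    bottom = symbols[width * split_row :]

    def count(segment):
        return sum(1 for x, y in zip(segment, segment[1:]) if x == "M" and y == "+")

    return count(top), count(bottom)

# Toy data: 'M+' appears twice in each half of this 12-symbol example.
print(m_plus_counts(list("M+xM+xM+xM+x"), width=3, split_row=2))  # (2, 2)
```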

How does assuming the Z340 is a pure homophonic cipher explain any of these behaviours, let alone all of them?

Lines 1-3 and 11-13, revisited

I keep coming back to the 1-3 and 11-13 property as mentioned here. I think it’s important to say that Dan Olson’s conclusion (that “lines 1-3 and 11-13 contain valid ciphertext whereas lines 4-6 and 14-16 may be fake”) seems likely to be landing a little bit wide of the mark.

To me, this same property of these lines implies (a) that the homophonic versions for each letter were probably used in pure sequence here, but also (b) the homophone cycles were somehow ‘reset’ after ten lines (i.e. the homophone cycles all started again at the start of line eleven). And perhaps also that any characters repeated in the first three lines are rarer characters, rather than the homophone-friendly ETAOINSHRDLU etc.

It might even be that the Zodiac Killer kept on adding homophones as he constructed the cipher UNTIL he had three lines’ worth of essentially unique homophones: that is to say, that the three line blocks in 1-3 and 11-13 are how his system made the choice of the number of homophones, rather than as a consequence of the number of homophones he chose. Nobody has yet (to my knowledge) satisfactorily explained where he came up with his homophonic allocation for Z408: certainly, searching for this in crypto books hasn’t yielded any likely candidates.

Could it be that the Zodiac Killer worked backwards from his actual Z408 ciphertext to determine the number of homophones, rather than worked forward from the number of homophones to the ciphertext?

Update: I received the following off-line comment from David Oranchak, but thought it better to update it within the post itself…

Nick, there are a few other seemingly rare phenomena that can be observed in Z340. I’m curious what you think of them.

The first is the pivots:

Patterns like these are very unlikely to arise by chance, so they are suspected to be some sort of feature of the encoding scheme.

Z408 is littered with repeating bigrams but Z340 seems to have fewer than would be expected via normal homophonic encipherment of a plaintext in a normal reading direction. However, the bigrams show up again if you consider a periodic operation on the cipher text:

The count of 25 repeating bigrams jumps to 37 or 41 or even higher, depending on the periodic operation applied to the cipher text. Here is a tool that illustrates the various operations:
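One minimal way to reproduce this kind of count in code (a sketch of my own, not David’s actual tool): count the bigrams that occur more than once, then re-read the ciphertext by stepping through it with a fixed period. Note that this decimation readout is only a true permutation of the text when the period is coprime to the text length.

```python
from collections import Counter

def repeating_bigrams(symbols):
    """Count the distinct bigrams that occur more than once."""
    counts = Counter(zip(symbols, symbols[1:]))
    return sum(1 for c in counts.values() if c > 1)

def periodic_readout(symbols, period):
    """Re-read the text stepping `period` positions at a time (mod length).

    Only a true permutation of the text when gcd(period, len) == 1."""
    n = len(symbols)
    return [symbols[(i * period) % n] for i in range(n)]

text = list("ababaXYab")
print(repeating_bigrams(text))                       # 'ab' and 'ba' repeat -> 2
print(repeating_bigrams(periodic_readout(text, 2)))  # gcd(2, 9) == 1, so valid
```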

You’ve already identified the seemingly rare phenomenon of rows that lack repeating symbols. There are 9 such rows. In 1,000,000 random shuffles of Z340, none had that many rows. In fact, the best that was found was 8 rows which occurred in only 12 of the shuffles.
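Replicating this shuffle experiment in miniature is straightforward, given the ciphertext as a flat symbol list (a sketch: run it against a real Z340 transcription for the full-scale result):

```python
import random

def repeat_free_rows(symbols, width=17):
    """Count rows of `width` symbols containing no repeated symbol."""
    rows = [symbols[i : i + width] for i in range(0, len(symbols), width)]
    return sum(1 for row in rows if len(set(row)) == len(row))

def shuffle_pvalue(symbols, width=17, trials=100_000, seed=1):
    """Fraction of random shuffles with at least as many repeat-free rows."""
    observed = repeat_free_rows(symbols, width)
    rng = random.Random(seed)
    pool = list(symbols)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if repeat_free_rows(pool, width) >= observed:
            hits += 1
    return hits / trials
```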

Your “M+” asymmetry observation seems to fit in with the general observation that repeating bigrams are phobic of certain regions of the text. The lower left, for instance, seems to hate bigrams:

Another really strange observation is the distribution of non-repeating string lengths. For each position of Z340, measure how far you can read forward without encountering a repeated symbol: each position thus yields a unique (repeat-free) sequence of some length L. Jarlve found that for Z340, there is a peak of 26 occurrences of unique sequences of length 17 (which happens to be the width of Z340). It is really interesting that in random shuffles, this phenomenon is only observed on the order of one in a billion shuffles.
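That statistic can be sketched in a few lines: from each position, read forward until a symbol repeats, record the run length, then histogram the results and look for a peak at the grid width:

```python
from collections import Counter

def unique_run_lengths(symbols):
    """From each position, the length of the longest forward run of
    pairwise-distinct symbols."""
    lengths = []
    for i in range(len(symbols)):
        seen = set()
        j = i
        while j < len(symbols) and symbols[j] not in seen:
            seen.add(symbols[j])
            j += 1
        lengths.append(j - i)
    return lengths

# Histogram of run lengths for a toy symbol stream.
print(Counter(unique_run_lengths(list("abcab"))))
```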

Finally, I would recommend that anyone interested in this topic should check out this thread on morf’s Zodiac forum, especially the more recent posts on the later pages. “Jarlve” and “smokie” in particular are doing fantastic work exploring various transcription schemes that could explain the various curious features of Z340 (in particular, the relationships between periodic bigrams and transposition schemes).

In some ways, it’s the shortest of distances from [Ethel Voynich] to [Ethel Merman], so why not “Voynich, The Musical“? Close your eyes, imagine a Broadway stage, take out a mortgage to get yourself a semi-affordable seat, spill a drink on your leg, and you’re as good as there…


Act One, Scene One

It’s 1912. A single spotlight illuminates an old trunk in the middle of an otherwise empty wooden stage: there’s dust in the air. We hear slow, sustained violins off-stage, harbingers of the big discovery that is about to happen.

WILFRID appears stage right. He is well dressed (though a little tweedy for our modern tastes), and wears small round glasses. He looks in the prime of his life – there’s a vigour and physical excitement to him. He approaches the trunk, opens it, takes out an old book and peers inside it. As his eyes grow ever wider, the violins swell, and he sings his first number “Friends To The End”.


This never happened – I wasn’t here.
There was never a trunk (that was junk), isn’t this queer?
I conjured a castle, to hide Jesuit lies…
While the customer’s king, I’ll say anything (however unwise).

[Chorus] But you, you were always real
Even if you made me feel
Like an antiquarian schlemiel –
I couldn’t comprehend.
But I knew, I knew when I met
My ugly duckling Juliet
With your strange alphabet
We’d be friends to the end…
Friends to the end.

Act One, Scene Two

Back in London, WILFRID hesitantly shows his newly-acquired manuscript to his wife ETHEL: he thinks it’s going to make them rich. However, ETHEL cannot believe that he has wasted money on something as unbelievably stupid as a book that nobody can read. To make her feelings on the matter completely clear, she sings her angry opening number “Down the drain”.


Little naked women
Standing round or swimming
What is this you’re bringing
To our house?
You can’t read a word of it
Written by a heretic
I can’t see the benefit
To man or mouse

[Chorus] You put good money / Down the drain
Buying enciphered / Castles in Spain
Were those nymphs fogging / Your revolutionary brain?
Or has their writing sent you / Completely insane?

Act One, Scene Three

WILFRID has moved to New York, and is trying (unsuccessfully) to convince wealthy American collectors to buy his unreadable manuscript. Though his sales patter normally charms the birds down off the trees, he’s finding it difficult to find anyone with any affinity for this unusual artefact. His song “It’s No Use” documents his ongoing struggle.


There’s jazz and money in the air
The excitement of a New World at play
New rules, new wealth, new clothes, new hair
America strides into a brand new day

You, sir, with your spats and suits
Your garden parties and Egyptiana
Might I interest you in this book’s strange roots
And its hard-to-pin-down flora and fauna?

[Chorus] It’s no use
My duckling’s no swan
I’ve cooked my goose
My big chance has gone
I’ll find none willing
Who’ll pay more than a shilling
They’re too mercantile

Act One, Scene Four

It’s 1930 in New York. WILFRID is dying, having never been able to sell his “Roger Bacon” manuscript. ETHEL brings his beloved manuscript to him, so that he can see it one last time. WILFRID sings a song to both of them: “It’s Time To Say Goodbye”.


Perhaps I was wrong / To hope for the best
To follow every wastrel clue / Like a man possessed
Why can’t anybody else / See what I see?
Are they put off by mere / Indecipherability?

[Chorus] It’s time to say goodbye
To the woman I have loved
And greet the naked angels
Hovering above
I’ve seen them for years
Sitting on my shelves
Filling every page of
Quires eleven and twelve

Two of the least commented-on aspects of the Voynich Manuscript’s “Voynichese” alphabet are (a) its symmetry and (b) its partitioning into quite well-known (but distinct) usage groups. For example:

* the four gallows characters, where EVA t and EVA k are almost always interchangeable, while the single-leg shapes for EVA p and EVA f closely mirror the double-leg shapes for EVA t and EVA k. (And let’s leave the strikethrough gallows aside for the moment.)

* the EVA aiin family of letter groups, which all operate in a very specific way: there are no contexts where ain appears that you wouldn’t also see aiin or even aiiin.

* the ar / or / al / ol group, whose members seem to appear within words in much the same way as each other. The air and aiir letter groups might also be related to this set, though this isn’t 100% clear. Similarly, -am often seems to me (with a hat tip to Emma May Smith, who discussed -m recently) to be something closer to a combination of ar and hyphen, i.e. that -am at the end of a line often resembles the end of the first half of a word broken in half by the line-ending (and where the second half of the word is at the start of the next line, but disguised with an extra letter inserted before it).

* the -dy and -y word endings, which both seem to be cut from almost exactly the same cloth.

* the e / ee / eee / ch / sh / eo group, which seem to me to function slightly differently between A and B pages.

* the qo group, which almost universally seems to operate as a prefix. In those places where we get l- words, we also get qol a lot: and where l- words don’t appear, we get almost no instances of qol.

Cross all the above instances out, and what remains is a very sharply reduced set of usage groups, such as d- words (in particular daiin, which seems to operate in a mysterious world all of its own), o- words (particularly in front of gallows), and y- words.

What about EVA s?

But if you do do this kind of crossing out, you also won’t find a comfortable place for EVA ‘s’ to go. In fact, to my eyes EVA ‘s’ appears to be the single most anomalous character in the Voynichese alphabet: there’s a strong case to be made that it is the most ‘exposed’ single glyph of all of them, and – by that same token – the one we should spend most time on trying to understand. What I’m saying is that EVA s might well be the weakest link in the Voynichese chain.

If you remember to put aside all the completely different ‘sh’ characters (sharing ‘s’ for both of these glyphs was, in my opinion, a foolish mistake in the design of the EVA transcription scheme, *sigh*), you find that ‘s’ occurs about 1.71% of the time in A pages, and about 1.00% of the time in B pages. If you remove any ‘as’ or ‘os’ pairs (as being probably miscopied or mistranscribed ‘ar’ / ‘or’ pairs) from these stats, these figures go down to 1.34% and 0.83% respectively.
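For anyone wanting to reproduce these kinds of figures, the counting logic runs roughly as follows (a sketch only: it treats an EVA transcription as a flat character stream, which glosses over benched gallows and other transcription subtleties):

```python
def s_stats(eva_text):
    """Approximate frequency of EVA 's', before and after discounting
    'as'/'os' pairs (treated as probable 'ar'/'or' miscopies). 'sh' is
    stripped first, since its 's' is a different glyph entirely."""
    text = eva_text.replace("sh", "")
    total = len(text)
    s_raw = text.count("s")
    suspect = text.count("as") + text.count("os")
    return s_raw / total, (s_raw - suspect) / total
```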

And yet some A pages have numerous s characters (e.g f14r, f15r, f24r), while others have one or fewer s characters (e.g. f14v, f18r, f19r): that this single statistic can differ so much between the two sides of the same folio is something that hasn’t really been noted before, as far as I can recall. [Unless any Lorites out there care to show me the precedent I’ve missed: in one of Friedman’s groups, no doubt.]

All of which incidentally reminds me of something that Glen Claston told me he noticed when he was making his transcription (but which I now can’t find in my email archive, *sigh*): that Voynichese had different clusterings of letter usages that would seem to go into and out of fashion (almost as if one kind of ‘mode’ was active now, and then a different mode active later), sometimes by paragraph, sometimes by page. If this is correct, then perhaps ‘s’ is an active part of some ‘modes’ but not others – just an idea.

What about saiin vs daiin?

I find it interesting that sdaiin occurs only once (on f66r), while sdain, sdaiiin, dsain, dsaiin, and dsaiiin don’t occur at all: yet saiin occurs 144 times.

If s- is some kind of prefix token here, then it seems that so too is d-, and in a way that makes the two avoid stepping on each other’s toes.

My suspicion (for what it’s worth) is that while both work as prefix tokens, they in fact code for two quite different classes of mechanisms: and, moreover, that both prefixes are more meta-linguistic than linguistic in any useful sense.

And what about the first column?

EVA s also has a strong tendency to appear as the first letter of a (non-paragraph-starting) line, particularly in Balneo B pages – but this may possibly be because Balneo B tends to have longer paragraphs than elsewhere.

Combine this (a) with the well-known observation that the first word on each line tends to be slightly longer on average than all the other words on a line, and (b) with Philip Neal’s suggestion that the first letters down some Voynich Manuscript pages might well be a vertical ‘key’ or something similar, and you get an interesting possibility to consider: that line-initial ‘s’ may specifically operate as a null that the writing system needs to prepend to certain (typically short) words.

I was thinking about this today, triggered by a Voynich Ninja forum discussion: I wondered if it might be possible to construct a statistical experiment to test my suggestion that line-initial s- might function as a null character that gets prepended to certain short words (such as aiin).

According to the tentative model I have in mind, the (aiin : daiin) ratio for non-line-initial words should be roughly the same as the (saiin : daiin) ratio for line-initial words. And perhaps it would be good to then repeat broadly the same test for non-line-initial (ar : dar) vs line-initial (sar : dar), etc.

However, I don’t have the right counting tools to do this easily: can anyone please run this test? Thanks!
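In the meantime, here is a sketch of the counting logic I have in mind (assuming an EVA transcription supplied as lines of dot-separated words; the word choices are just the defaults for this particular test):

```python
from collections import Counter

def prefix_ratio_test(lines, bare="aiin", s_form="saiin", d_form="daiin"):
    """Compare (bare : d_form) away from line-start with (s_form : d_form)
    at line-start. If line-initial s- is a null prepended to short words,
    the two ratios should come out roughly equal."""
    initial, elsewhere = Counter(), Counter()
    for line in lines:
        words = line.replace(".", " ").split()
        if not words:
            continue
        initial[words[0]] += 1
        elsewhere.update(words[1:])
    r_mid = elsewhere[bare] / max(elsewhere[d_form], 1)
    r_start = initial[s_form] / max(initial[d_form], 1)
    return r_mid, r_start
```

The same function can then be rerun with bare="ar", s_form="sar", d_form="dar", and so on.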

Hi There! We’re looking for people to write up their theories on cipher mysteries such as the Voynich Manuscript, the Beale Papers and how astroturfed the Tea Party is. You may be surprised to discover that your foolish clickbait opinions could earn you upwards of $0.02 per day, and might even be worth double that (if they are so unbelievably bad that they go viral on Slashdot or Reddit).

To tap your teensy spile into this towering cask of wealth, there’s no need for an office, formal clothes or indeed any clothes beyond your normal tattered rags. Simply compose your posts and comments from the comfort of your own bedsit, surrounded by your piles of old newspapers, unreturned library books, and much-loved microwave meal boxes. Who could ask for a better or more convenient life?

Yes, you too can turn your vapid leaden thoughts into 24 carat Internet gold, just like alchemists and well-known YouTube sock puppet presenters the world over already do. And let’s face it, if Stampy and Squid can make it there, then so can you, right?

Who cares if you haven’t cut your toenails since Dubya left the White House? We don’t! Google values novelty over content, so to become a high-value content creator in this brave new online world, all you have to do is tap into the same rickety stream of consciousness that pushes angry unspoken words into your mouth when you’ve yet again found yourself stuck in the non-moving queue at the supermarket till, and type, type, type.

There, now doesn’t that feel better! And how much do we charge you for this “keyboard therapy”? $100? $1000? No, not even close – in fact, we pay you for it. A frighteningly small amount, sure, but let’s not bicker over mere semantics.

How do you get going? Just start your own blog, proclaim yourself an expert on a particular subject (it doesn’t matter what, nobody cares), leave back-linked comments on forums and other people’s blogs, or even – now get this – leave comments on your own blog under false names to make visitors think that there’s some kind of ‘community’ buzz around the nonsense you’re passing off as high-quality thought.

Before long you’ll even be ready to cut-and-paste all your tripe into a 65-page ebook and sell it for $12 a pop. Still think this is all a pipe-dream? No, it’s not! Many thousands of people have jumped aboard this $$$money$$$ $$$train$$$ already, so why not you as well?

Still believe you’re not right for this extraordinary new world? Think again! You’ll shock yourself when you find out quite how painfully easy it is. Amaze your friends (if you have any, which seems fairly unlikely), take that step, and type, type, type!