It’s well-known that over the last two centuries, the quest for the mysterious “Money Pit” on Oak Island has yielded no sign of treasure while simultaneously consuming an inordinate quantity of diggers’ dollars – and if you can even think about all that without silently mouthing the phrase ‘ironically enough’, you have a huge amount of self-control. 😉

Yet despite all that ‘activity’, nothing of any actual substance about the whole curious enterprise that put or left the (so-called) pit there in the first place seems to have emerged. All that has been achieved is that (a) a small island has been ravaged by glinty-eyed treasure hunters, and (b) bookshelves have been filled with books that almost all manage to leave readers somehow less knowledgeable than when they began.

If you pause to reflect on the scale and prolonged fruitlessness of this archaeological disaster zone even momentarily, you’ll surely find it hard to prevent the two words “Epic” and “Fail” from lurching to the front of your mind. 😐

“The Curse of Oak Island”

Perhaps naturally enough, it seems that the (apparently obligatory) combination of determination, hubris, cupidity and stupidity that Oak Island treasure hunters have also makes them ideal Reality TV subjects, every bit as good as the Kardashians, TOWiE or whatever. Which is why the Canadian reality TV show “The Curse of Oak Island” (which premiered in 2014, and follows the Oak Island treasure hunt being pursued by Michigan brothers Marty and Rick Lagina) is now in its fantabulous 4th season. Will it ever end? (What do you think?)

Whatever you personally make of the whole Oak Island reality TV project, it is surely a brutal mirror to hold up to modern culture’s pox-plagued visage: for if all it boils down to is a fruitless search for something that nobody can describe and for which there seems to be no actual evidence, surely nobody involved can emerge the other side looking or smelling good. 🙁

Yet, curiously paralleling the Anthon Transcript at the core of Mormonism, at the heart of the Money Pit mythology lies a cipher mystery that has had so much screen time in Z-grade historicalist documentaries that it practically has its own Equity card. Yes, I’m talking about a cipher that could get gigs on cruise ships.

As per normal, nobody knows whether or not this cipher is the real deal or merely Milli Vanilli. Moreover, it turns out that – just like the two versions of La Buse’s cryptogram – it also has a secret twin cipher (and nobody knows whether or not that’s real either), which we’ll (eventually) return to in Part 2. 🙂

Anyhoo, it’s time we all had a proper Cipher Mysteries look at the first (and infamous) Oak Island cipher…

The 80-foot Rock Cipher

Though most modern authors call it the “90-foot rock” cipher, the inscription was claimed to have been carved into a rock found eighty feet underground. As usual, I try to avoid following trends if I know they’re broken. 🙂

Regardless, the first documentary mention of it is in a 2nd June 1862 letter written by treasure hunter Jotham B. McCully of Truro, printed in the “Liverpool Transcript” in October 1862 in response to a critical article entitled “The Oak Island Folly”. McCully wrote “The Oak Island Diggings” to explain why he and the other treasure hunters were so convinced there was treasure in the Pit.

Bearing in mind that, according to other records, the original ‘Onslow Company’ search started in about 1795…:

“About seven years afterwards, Simeon Lynds, of Onslow, went down to Chester, and happening to stop with Mr. Vaughn, he was informed of what had taken place. He then agreed to get up a company, which he did, of about 25 or 30 men, and they commenced where the first left off, and sunk the pit 93 feet, finding a mark every ten feet. Some of them were charcoal, some putty, and one at 80 feet was a stone cut square, two feet long and about a foot thick, with several characters on it.”

According to this admirably source-heavy webpage, the stone was “yet to be seen in the chimney of an old house near the pit” (19th February 1863, Yarmouth Herald).

The “remarkable” stone was then revealed to have been found “pretty far down in the pit, laying in the centre with the engraved side down”, and the house was revealed to be that of John Smith. It had “a number of rudely cut letters and figures upon it. They were in hopes the inscription would throw some valuable light on their search, but unfortunately they could not decipher it, as it was either too badly cut or did not appear to be in their own vernacular.” (2nd January 1864, The Colonist, Halifax N.S.)

George Cooke, in a 27th January 1864 letter, described the marks as “rudely cut letters, figures or characters […]. I cannot recollect which, but they appear as if they had been scraped out by a blunt instrument, rather than cut with a sharp one.” He hoped that they could be deciphered in the future.

But what did the marks say? At that point in the cipher’s history, it seems nobody had decrypted it. But, according to the Oak Island Treasure Company prospectus (the copy transcribed on pp.215-225 of Geoff Bath’s “Maps, Mystery, and Interpretation” [Part 2] is dated 1894):

“Many years afterwards, it was taken out of the chimney and taken to Halifax to have, if possible, the characters deciphered. One of the experts gave his reading of the inscriptions as follows: “Ten feet below are two million pounds buried.” We give this statement for what it is worth, but by no means claim that this is the correct interpretation. Apart from this however, the fact remains that the history and description of the stone as above given have never been disputed.”

Hence it was (apparently) first decrypted between 1864 and 1894.

Creighton’s Bookstore

The next mention of the “quaint carven stone” has it in Creighton’s Bookstore in Halifax, N.S.: “but the inscriptions were erased long ago after the stone had endured the blows from a bookbinder’s mallet. But at the time of the discovery of the stone the inscriptions were translated to read: ‘Ten feet below, 2,000,000 pounds lie buried.'” (29th April 1909, Fairbanks Daily News Miner).

Yet… the 19th August 1911 edition of Collier’s Magazine contains an eyewitness account supplied by Captain H.L. Bowdoin that departs somewhat from the dominant narrative. He wrote:

“While in Halifax we examined the stone found in the Money Pit, the characters on which were supposed to mean: “Ten feet below two million pounds lie buried.” The rock is of a basalt type hard and fine-grained.”

“There never were any characters on the rock found in the Money Pit. Because: (a) The rock, being hard, they could not wear off. (b) There are a few scratches, etc., made by Creighton’s employees, as they acknowledged, but there is not, and never was, a system of characters carved on the stone.”

This was backed up thoroughly by a 27th March 1935 eyewitness statement by Harry W. Marshall, who was the son of one of the owners of Creighton & Marshalls:

One of the Creighton’s was interested in the Oak Island Treasure Co. and had brought to the city a stone which I well remember seeing as a boy, and until the business was merged in 1919 in the present firm of Phillips & Marshall. The stone was about 2 feet long, 15 inches wide, and 10 inches thick, and weighed about 175 pounds. It had two smooth surfaces, with rough sides with traces of cement attached to them. Tradition said that it had been part of two fireplaces. The corners were not squared but somewhat rounded. The block resembled dark Swedish granite or fine grained porphyry, very hard, and with an olive tinge, and did not resemble any local stone. Tradition said that it had been found originally in the mouth of the “Money Pit”. While in Creighton’s possession some lad had cut his initials ‘J.M.” on one corner, but apart from this there was no evidence of any inscription either cut or painted on the stone. Creighton used the stone for a beating stone and weight. When the business was closed in 1919, Thos. Forhan, since deceased, asked for the stone, the history of which seems to have been generally known. When Marshall left the premises in 1919, the stone was left behind, but Forhan does not seem to have taken it. Search at Forhan’s business premises and residence two years ago disclosed no stone. The full history of the stone was written up in ‘the Suburban” about 1903 or 1904.

(Incidentally, people have searched for this issue of “The Suburban” but without any success.)

The two stones

Nobody seems to have dwelt much on what – to me, at least – is the most obvious problem with the above. Which is that we seem to be talking about two quite different stones here.

The first stone: “a stone cut square, two feet long and about a foot thick”, found eighty feet underground, and put into a chimney. Has curious writing on. Repeatedly described as having been “cut square”, like a “flagstone”.

The second stone: “2 feet long, 15 inches wide, and 10 inches thick”, “rounded” corners, found near the mouth of the Money Pit, and had been taken out of a chimney. Apparently has no writing on. “Basalt type hard and fine-grained” (Bowdoin), or “dark Swedish granite or fine grained porphyry, very hard, and with an olive tinge” (Marshall).

While it is entirely possible that the first stone was cut down to make it fit in John Smith’s chimney, the two descriptions don’t seem to fit each other in any other way either.

The most likely explanation to my mind is that we are talking about two entirely separate rocks both coming from the Money Pit, the first with marks roughly carved into it (and so perhaps a softer stone such as sandstone), and the second a much harder stone with no marks carved into it (the “JM” was added during its time in Halifax).

The first stone may therefore still be extant somewhere, perhaps in the garden of a Halifax house of a former treasure hunter.

Images of the 80-foot rock cipher

The short version is that there are no tracings or copies made from the object itself whose veracity we can be even remotely sure of: most of the images floating round the Internet are mock-ups of what people think it should look like.

Worse, the cipher’s plaintext seems to have changed along the way. Whereas in 1894 it was described as saying “Ten feet below are two million pounds buried”, this later changed into “Forty feet below two million pounds are buried” – note both the different depth and the different word order.

When did this change? The first time we see the “forty feet” version is in a circa 1949 typewritten account of Oak Island by Reverend Austen Tremaize Kempton (which was never published):

Here’s what it looked like in print (I believe this is in Edward Rowe Snow’s 1949 “True Tales of Buried Treasure”):

The person now often said to have decrypted the inscription was Dalhousie University Professor of Languages James Leitchi: there’s a good-sized page on him here.

Of course, one problem with this is that Leitchi was not actually an “old Irish school Master” but Swiss. However, he was (according to the timeline) a teacher at Halifax High School up until 1884: and we know that the stone was decrypted in Halifax before 1894.

Analyses and theories

Even though there is essentially zero doubt that the cipher as presented by Kempton (and then Snow) does indeed read “Forty feet below two million pounds are buried”, plenty of extra interpretations (typical “dual cipher” theories) have been put forward. One such was Dr. Wilhelm’s (modified) “At eighty guide maize or millet estuary or firth drain F”, described here.

Other webpages suggest that the letter shapes are all mathematical symbols, but this seems a bit lame to me: the shapes are just simple cipher shapes, nothing funky.

Other webpages suggest that Kempton faked the cipher, or that the whole thing is in fact a Masonic cryptogram or riddle. There’s also a theory by Keith Ranville, who also once put forward a Silk Road prostitution theory about the Voynich Manuscript.

But I think all these theories and ideas are missing the big problem: which is that because we can’t account for the change in wording between the two versions of the cipher, we simply can’t comfortably trust the versions we have.

However, it’s entirely possible that I’ve missed something important in all the timelines. Please let me know if I have, thanks! 🙂

In a recent post here, I floated the idea that the Zodiac Killer’s Z408 (solved) cipher’s unusual homophone distribution may have arisen not conceptually (i.e. from a hitherto-unknown book on cryptography), but instead empirically (i.e. emerging from the properties of a specific text).

It’s certainly possible that he might have used his own (private) text to model his homophone distribution, in which case we probably have almost no chance of reconstructing it. However, I think it likely that he instead used the first few characters of an already existing public text (such as Moby Dick, the Book of Genesis, the Declaration of Independence, or whatever) to do this.

It’s a reasonable enough suggestion, I think: and moreover one that we can try to test to a reasonable degree.

Z408’s homophones

A homophonic cipher key allocates a number of cipher shapes to individual plaintext letters, usually (but not always) in broad proportion to their frequency. So in a typical homophonic cipher key you would expect to see far more shapes for E (the most common letter in English) than for, say, Z or Q.

Though this is essentially the case for what we see in the Z408 cipher (particularly for the more frequent letters, ETAOINS), the numbers of homophones chosen for the less frequent letters seem somewhat idiosyncratic and arbitrary:

7 shapes – E
4 shapes – T A O I N S
3 shapes – L R
2 shapes – D F H
1 shape  – B C G K M P U V W X Y
Did not appear: J Q Z

People have long searched for a primer or textbook on cryptography where the description of the alphabetic frequency distribution matches this, or even where the alphabetic frequency ordering (e.g. ETAOINSHRDLU etc) matches the order here, but in vain.
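To make the idea of a homophonic key a little more concrete, here is a minimal Python sketch that builds a key from the homophone counts listed above and enciphers a short phrase with it. The numeric symbols (1, 2, 3, …) are purely hypothetical stand-ins for the Zodiac’s actual shapes, and cycling through each letter’s homophones in turn is just one simple assumed policy, not a claim about how Z408 was really enciphered.

```python
import itertools

# Homophone counts per plaintext letter, copied from the Z408 list above.
HOMOPHONE_COUNTS = {
    "E": 7,
    **dict.fromkeys("TAOINS", 4),
    **dict.fromkeys("LR", 3),
    **dict.fromkeys("DFH", 2),
    **dict.fromkeys("BCGKMPUVWXY", 1),
}

def build_key(counts):
    """Give each letter its own block of numeric stand-in symbols (hypothetical)."""
    key, next_symbol = {}, 1
    for letter in sorted(counts):
        key[letter] = list(range(next_symbol, next_symbol + counts[letter]))
        next_symbol += counts[letter]
    return key

def encipher(plaintext, key):
    """Cycle through each letter's homophones; letters not in the key are dropped."""
    cyclers = {letter: itertools.cycle(symbols) for letter, symbols in key.items()}
    return [next(cyclers[ch]) for ch in plaintext.upper() if ch in cyclers]

def decipher(symbols, key):
    """Map every symbol back to the single letter that owns it."""
    reverse = {s: letter for letter, syms in key.items() for s in syms}
    return "".join(reverse[s] for s in symbols)

KEY = build_key(HOMOPHONE_COUNTS)
TOTAL_SHAPES = sum(HOMOPHONE_COUNTS.values())  # the listed counts add up to 54
```

Deciphering is unambiguous because each symbol belongs to exactly one letter, while a frequent letter such as E gets spread across seven different symbols, which is what flattens the ciphertext’s frequency profile.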

Designing a filter

The basic idea for the filter is easy enough:
* read in characters from the start of a passage (we’re only interested in capitalized alphabetic letters, i.e. A-Z)
* if the instance count of that character is higher than the top of the desired range, then the test fails
* if the instance counts for all the characters are within the desired range at the same time, then the test passes
* else keep reading in more characters until the test terminates

As a side note: of all the Z408 homophones, only X appears exactly once in the Z408 ciphertext itself: but while it is conceivable that the Zodiac Killer might have allocated extra homophones for X, it does seem fairly unlikely.

The desired ranges for each of the characters would look like this (though feel free to adapt this if you disagree with the homophone counts listed above):

[7,7] – E
[4,4] – T A O I N S
[3,3] – L R
[2,2] – D F H
[0,1] – B C G K M P U V W Y J Q Z
[0,3] – X (to err on the side of safety)

Note that the letters with only a single shape have a slightly broader [0,1] range because we have no way of knowing whether or not they would have actually appeared in the original text.
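The filter steps and ranges above can be sketched in Python roughly as follows. This is only a minimal sketch under a few assumptions of mine: the text is upper-cased first, anything that isn’t A-Z is simply skipped, and the function name `prefix_match` is made up for illustration.

```python
from collections import Counter

def build_ranges():
    """Desired (low, high) instance counts per letter, as listed above."""
    ranges = {"E": (7, 7)}
    ranges.update(dict.fromkeys("TAOINS", (4, 4)))
    ranges.update(dict.fromkeys("LR", (3, 3)))
    ranges.update(dict.fromkeys("DFH", (2, 2)))
    ranges.update(dict.fromkeys("BCGKMPUVWYJQZ", (0, 1)))
    ranges["X"] = (0, 3)  # erring on the side of safety
    return ranges

RANGES = build_ranges()

def prefix_match(text, ranges=RANGES):
    """Return how many letters were read when every count is in range, else None."""
    counts = Counter()
    letters_read = 0
    for ch in text.upper():
        if ch not in ranges:      # skip spaces, punctuation, digits (assumption)
            continue
        counts[ch] += 1
        letters_read += 1
        if counts[ch] > ranges[ch][1]:
            return None           # a letter overshot the top of its range: fail
        if all(lo <= counts[letter] <= hi for letter, (lo, hi) in ranges.items()):
            return letters_read   # all letters in range at the same time: pass
    return None                   # ran out of text before the test passed
```

With these ranges, a passing prefix has to be between 43 and 59 letters long, so the filter only ever needs to look at the first line or two of any candidate text.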

Here are two test texts that should both pass:



Which texts to try?

Since any text published before August 1969 could potentially be a match, it would make sense to look at all manner of texts, and possibly even the first few lines of different chapters of books (though I’d be a little surprised if that was the case). All the same, the filter is easy enough to write and to test (and should execute in a matter of microseconds), so the difficulty here lies mostly in getting hold of enough texts to try, rather than the compute time as such.

Oddly, I don’t really have a solid feel for how often the filter will find a match: my gut instinct is that roughly one in a million English text comparisons will pass, but that’s just a guesstimate based on each letter having its own little bell-curve distribution, all of which have to match at the same time.

So what do you think will match? “Catcher in the Rye” or “Moby Dick”? Place your bets! 😉

Though originally published in 1998 and 2003, and most recently published in three volumes in 2013-2014, “Maps, Mystery and Interpretation” is in reality a single (very large) book, the fruits of Geoff Bath’s vast sustained effort to till Oak Island’s unproductive historical soil.

The overall title broadly suggests its three constituent sections, in that Part 1 covers (possibly pirate) treasure maps (“Maps”); Part 2 examines the evidential haze surrounding the Oak Island “Money Pit” mystery (“Mystery”); while Part 3 attempts to put the myriad of pieces together to make sense of them all (“Interpretation”). Simples.

If only the Oak Island mystery itself were as straightforward…

Part 1: Maps

Here, Geoff presents all the “Kidd” maps that Hubert Palmer ended up with, and compares Howlett’s account of them with Wilkins’ account, as well as – and this is the good bit – lots of letters written and received by both Wilkins and Palmer.

I can’t be the only reader to find himself or herself surprised by Bath’s conclusion – that Wilkins essentially got it all just about right, while Howlett got a great deal of it wrong.

All the same, as far as reconstructing the modern history of the Palmer-Kidd maps goes, Geoff’s reasoning here seems very much on the money. I’d say his account gets far closer to what happened than even George Edmunds’ account (stripping both authors’ conclusions out of the picture first).

However, Bath gets himself in something of a tangle trying to make sense of the various maps Wilkins originated (both in Part 1 and in Part 3). Was Wilkins adapting maps or documents otherwise unseen, using them as templates for his own creations, or trolling his readers to help him identify mysterious islands? Too often Bath seems content to speculate in a way that paints Wilkins in an almost Svengali-like way, a kind of Andy Warhol of treasure maps.

In reality, I’m far from sure that Wilkins was any closer to historical clarity than we are now. Given that I can’t read more than a handful of pages of his “A Modern Treasure Hunter” without feeling nauseous (the fumes! the bad accents! the ghosts!), I just can’t see Wilkins as anything like a consistently reliable source, even about himself.

Yet one of the most specifically insightful things that emerges from Part One is Bath’s observation that it isn’t necessary for these maps to actually be Kidd’s for them to be independently genuine. That is, the set of maps’ whole association with Kidd might be something that was overlaid onto a (non-Kidd) set of maps: the supposed Kidd link might easily have been added to the mix as a way of “bigging up” someone else’s maps. If this is true (and you don’t have to believe that these are Oak Island maps for it to be so), many of the difficulties that arise when you try to link them to Kidd (e.g. dating, language, etc) disappear.

It’s still hellishly difficult to make sense of these maps, for sure, but Geoff is right to point out that Kidd may well turn out to be part of the problem here, rather than part of the solution or explanation. Something to think about, for certain.

Part 2: Mystery

In my opinion, Oak Island is a wretched, wretched subject, filled with all the slugs and snails of cipher mysteries and not the vaguest flicker of any of the good stuff. It’s a bleak, barren evidential landscape, filled with unconfirmed micro-features briefly noted by a long series of individual investigators, before being quickly razed from the face of the earth by gung-ho treasure hunters. There seems little genuine hope that any faint trace of anything historical or sensible still remains.

Putting the speculative sacred geometry and shapes picked out on maps to one side, there are some (though not many) good things in Part Two I didn’t previously know about. Specifically, the idea that tunnels and features might have been dug aligned with the local magnetic compass at that time is quite cool, though obviously something that has been much discussed over the decades.

So I’m terribly sad to have to say that even a perceptive and diligent researcher such as Geoff Bath can make no real difference to this long-standing disaster area. His Part 2 is therefore little more than an Ozymandian monument to the effort and greed sunk in the pursuit of the Money Pit (not that a brass farthing or even so much as a period button has come of it to date).

Nothing beside remains. Round the decay
Of that colossal wreck, boundless and bare
The lone and level sands stretch far away

Part 3: Interpretation

Having struggled through the unpromising desert of the previous part, my expectations as to what Part 3 might bring were fairly low. But as Bath works his way through his interpretation section (repeatedly railing against the pox of untestable hypotheses), something actually rather odd happens.

All of a sudden, he mentions the Venatores (an early 20th-century treasure hunting group) and the Particulars (a set of treasure hunting documents collected together by the Venatores). As this enters the picture, it’s as if a curious wave ripples through the whole research fabric: that, contrary to what you might have thought from the two previous books, it’s all not about whether Wilkins was credible or incredible, or whether Hill Cutler was stone cold serious or laughing all the way to the Terminus Road Lloyds Bank in Eastbourne, but instead that there might actually be something behind it all.

That is to say, what emerges – though all too briefly – is a frisson of that wonderfully engaging secret history paranoia where you can just sense stuff going on behind the scenes but which you know you probably won’t ever gain access to.

In the end, Bath’s well-researched and well-written books didn’t manage to persuade me of the existence of a link between the various treasure maps and the Oak Island mystery (and that, indeed, is a hypothesis that would seem to be politically untestable) nor of any kind of geometric cartography plan driving it all. However, they did manage to convince me that the whole Money Pit enterprise might possibly be built not on a vast hole, but instead on a history whose fragmentary parts have been scattered on the winds, and yet which might possibly be reassembled in the future.

It probably won’t happen but… who can say?

A little while back, I had an email from Marie about Alexander d’Agapeyeff’s (1939) book “Codes and Ciphers”, highlighting some interesting mistakes she had found in his section on the double transposition cipher.

D’Agapeyeff described this as a cipher system that the Russian Nihilists had used, but said that they had used the same keyword for both halves of the transposition (i.e. for transposing both the columns and the rows), a technical flaw that made it easy to crack. (Oddly, the Nihilists are nowadays associated with an entirely different kind of encipherment.)

Let’s take a closer look…

D’Agapeyeff’s Double Transposition

What follows is d’Agapeyeff’s account, with comments along the way.

At the end of the nineteenth century the Russian Nihilists used a double cipher, which, having been transposed vertically, was then transposed horizontally; but they made the mistake of using the same keyword in both transpositions. As it is a common variation of double columnar cipher, we give it as an example:

The first thing that Marie picked up on was that the way that d’Agapeyeff converted the transposition keyword SCHUVALOF to an ordering was clearly incorrect: F is the sixth letter of the alphabet, so there is no obvious way that it would be counted as the highest ranked of the nine letters in the keyword. When I looked at this, I immediately guessed that it should instead have read SCHUVALOV – as it turned out, this was a good try, though still very slightly wrong. 😐

Regardless, it should already be clear that something a little non-obvious is going on here.
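Marie’s point about the keyword ordering is easy to check for yourself. Here is a minimal Python sketch that ranks each letter of a keyword by its place in the alphabet: the function name `keyword_order` is my own, and breaking ties between repeated letters from left to right is an assumed (if common) convention rather than anything stated by d’Agapeyeff.

```python
def keyword_order(keyword):
    """1-based alphabetical rank of each keyword letter (ties go left to right)."""
    by_letter = sorted(range(len(keyword)), key=lambda i: (keyword[i], i))
    ranks = [0] * len(keyword)
    for rank, i in enumerate(by_letter, start=1):
        ranks[i] = rank
    return ranks

# S  C  H  U  V  A  L  O  F
SCHUVALOF_ORDER = keyword_order("SCHUVALOF")
print(SCHUVALOF_ORDER)  # [7, 2, 4, 8, 9, 1, 5, 6, 3]
```

Ranked this way, the final F comes out third of the nine letters (after A and C), nowhere near the top of the ordering.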

Now suppose we have to encipher the following: ‘Reunion to-morrow at three p.m. Bring arms as we shall attempt to bomb the railway station. Chief.’

The ‘abcd’ at the end are ‘nulls’ used to fill in the squares.

Now we transpose the message according to the letter sequence of the keyword:

So the message reads:


In all languages where certain letters must follow or precede certain others, the deciphering of this script will never present difficulties. We first count the number of letters in the script (81), which will give us the size of the square (9×9), and once this is done all we have to do is remember that in nine cases out of ten ‘h’ follows either ‘t’ or ‘s’ or ‘c’, and that the bigrams such as AT, TO, WE and the very helpful (English) trigram ‘the’, and the doubles TT, LL, EE, etc., are the most common. In fact, the Russian police soon found out all about that conspiracy.

The second thing Marie noted here was that d’Agapeyeff was using the double transposition decryption direction here, rather than the encryption direction.

All in all, I’d agree with Marie that d’Agapeyeff didn’t seem to have fully understood how the system worked. Smartly, though, Marie now doggedly decided to look at d’Agapeyeff’s crypto sources, to see if he had copied this whole section blindly from somewhere. And, eventually, she found that d’Agapeyeff’s direct source for the above was none other than…

Auguste Kerckhoffs

…the Dutch cryptographer Auguste Kerckhoffs (1835-1903).

Kerckhoffs’ influential book (well, extended article, really) “La Cryptographie Militaire” is available online as a PDF, or as an HTMLized version here.

What follows is my usual free translation of Kerckhoffs’ description of double transposition, which we can immediately see beyond any reasonable doubt as being the source for d’Agapeyeff’s version:

On the occasion of the Nihilists’ last appearance in court, the Russian newspapers published the accused’s secret cipher. It is a system of double transposition, where the letters are first transposed by vertical columns, and are then further transposed by horizontal rows. The same word serves as a key for both transpositions: to do this, the keyword is transformed into a series of numbers, where each number matches the rank of the letter within the normal alphabetical sequence.

Here is the process applied to the word SCHUVALOW:

OK, though I was on this occasion very slightly wrong (SCHUVALOV rather than SCHUVALOW), I was at least wrong in the right kind of way. 🙂 Kerckhoffs continues:

Now, if we were to transpose a sentence such as this one – Vous êtes invité à vous trouver ce soir, à onze heures précises, au local habituel de nos réunions – we would proceed first as in the previously described [single transposition] case, and then carry out the same operation for the horizontal rows.

   = s c i a u e s e l a v i v o n t e u v t r e r s o u c a c a b i o l h t n e l o s u d e r, etc.

However complicated this transposition may appear to us, deciphering a cryptogram written with this system can never present insurmountable difficulties in languages where certain letters only present themselves in particular combinations, such as q or x in French. Here, the Russian decipherers seem to have carried out their decryption work in a relatively short time.

For any passing conlang fans, Auguste Kerckhoffs was also closely associated with the artificial language Volapük, which some people think is really koldälik. 🙂

d’Agapeyeff + Kerckhoffs = …?

It’s important to remember that d’Agapeyeff wasn’t himself a cryptographer, but rather someone who was trying to collect together interesting crypto stuff into a book that had originally been commissioned for someone else entirely to write. The project wasn’t something he was aiming to do, but rather something that fell in his lap.

As Marie points out, the big technical thing that d’Agapeyeff got wrong is that the numbers are the wrong way round, and so he is performing a double transposition decryption rather than a double transposition encryption: the two are not the same at all. That is, if you used SCHUVALOW as your single transposition keyword and then single transposition encrypted the text “SCHUVALOW”, you should get the ciphertext “ACHLOSUVW”: but both Kerckhoffs and d’Agapeyeff (copying Kerckhoffs) seem to have got this the wrong way round.

Having thought about this for a little while, I’ve come to suspect that d’Agapeyeff may well have faultily believed that double transposition was a self-inverse process, i.e. where the decryption and encryption transformations are identical.
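For anyone who wants to poke at the encryption and decryption directions directly, here is a rough Python sketch. It assumes the plaintext is written row by row into a square, that the columns are reordered first and then the rows (both with the same keyword), and that repeated keyword letters are ranked left to right. All the function names are mine, and the keyword CIPHER and the 36-character test square are hypothetical examples, not taken from either book.

```python
def keyword_order(keyword):
    """1-based alphabetical rank of each keyword letter (ties go left to right)."""
    by_letter = sorted(range(len(keyword)), key=lambda i: (keyword[i], i))
    ranks = [0] * len(keyword)
    for rank, i in enumerate(by_letter, start=1):
        ranks[i] = rank
    return ranks

def permute(items, ranks):
    """Encryption direction: the item sitting under rank r moves to position r."""
    out = [None] * len(items)
    for item, rank in zip(items, ranks):
        out[rank - 1] = item
    return out

def unpermute(items, ranks):
    """Decryption direction: exactly undoes permute()."""
    return [items[rank - 1] for rank in ranks]

def double_transpose(text, keyword, step=permute):
    """Reorder the columns of an n x n square, then its rows, using one keyword."""
    ranks = keyword_order(keyword)
    n = len(ranks)
    assert len(text) == n * n, "text must fill the square exactly"
    rows = [list(text[i:i + n]) for i in range(0, n * n, n)]
    rows = [step(row, ranks) for row in rows]   # transpose the columns
    rows = step(rows, ranks)                    # then transpose the rows
    return "".join("".join(row) for row in rows)

# Single transposition of the keyword by itself, encryption direction.
print("".join(permute("SCHUVALOW", keyword_order("SCHUVALOW"))))  # ACHLOSUVW

# Hypothetical 6x6 example: encrypting twice does not give back the plaintext.
SQUARE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
once = double_transpose(SQUARE, "CIPHER")
twice = double_transpose(once, "CIPHER")
restored = double_transpose(once, "CIPHER", step=unpermute)
```

Here `twice` differs from `SQUARE` while `restored` matches it. How far apart the two directions end up depends on the keyword’s ordering, though: an ordering that does nothing more than swap pairs of positions is its own inverse, so it is worth running any particular keyword through both directions rather than assuming.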

All of which would dovetail very neatly indeed with the report that we have that he was unable to decrypt his own challenge cipher: for if he (wrongly) believed that double transposition was self-inverse, then he wouldn’t (if his challenge cipher had used double transposition) have been able to decrypt it at all. If this is correct, then his failure wasn’t anything as foolish as misremembering the keyword, but instead misunderstanding one of the component ciphers that made up the overall chain.

Might this insight help us decrypt his challenge cipher? Well… insofar as it now seems far more likely to me that he used double transposition as one of his stages, then the answer may very well be yes. Hopefully we shall see… 🙂

Prolific (if occasionally prolix) Cipher Mysteries commenter bdid1dr has long wondered whether the Somerton Man was someone in her ex-husband’s family. (She also suspects her ex-husband was the infamous Zodiac Killer, but let’s leave that for another day.)

Even though it at first sounds like an outrageously long shot (and one that would perhaps necessitate a Warren Commission ‘magic bullet’), it does in fact concord with many of the things we know about the Somerton Man, in perhaps surprising ways.

For a start, the aluminum comb, the packet of Juicy Fruit chewing gum found in the Somerton Man’s American-stitched coat and indeed the coat itself have all been taken as suggesting that the Somerton Man was American (or had recently travelled from America).

More specifically, Derek Abbott launched his recent (but unsuccessful) crowdfunding campaign on the back of a fragmentary DNA match between one of the hairs found embedded in the plaster cast bust of the Somerton Man and Thomas Jefferson.

Yet it turns out that the Shackelfords are an old Virginian family… with links to Thomas Jefferson. OK, this is all still very far from proof, but we’re not yet veering into anything like the canonical Lands Of Somerton Nonsense: so please bear with me just a little longer as we take a look at the Shackelfords…

Lee Erwin Shackelford

According to the Sydney Morning Herald, he was born on 12th April 1945 to William Shackelford and Normaleen (nee Park):

SHACKELFORD (nee Normaleen Park). April 12, King George V Hospital, Camperdown, semi-private, wife of T./Sgt. W. Shackelford, U.S. Air Corps – a son (both well)

And thanks to a little archival magic (big tips of the Cipher Mysteries hat to Eye and Aye for this), we have a photo of Lee Erwin Shackelford from the USS Ticonderoga circa 1964:

He was also bdid1dr’s first husband: she says that he died in New York a few years ago.

He had a brother (Preston Park Shackelford) who was born 10th April 1948 in Vallejo CA: and another brother (Mark) who was born in New Mexico in 1952.

William Jesse Shackelford Jr

Eye and Aye came up trumps here as well, with William Jesse Shackelford Jr’s US Armed Forces registration card (note: image behind a Fold3 paywall). According to this, he was born on 17th May 1922 in Norfolk VA: the “Name And Address Of Person Who Will Always Know Your Address” field is marked up as “Mrs A. B. Shackelford, 1631 Willoughby Ave, Norfolk, VA”. (Willoughby Ave is close to Norfolk’s Lyon Shipyard: #1631’s plot was long since sacrificed to make way for the I-264.)

According to the Registrar’s Report (note: image also behind a Fold3 paywall), William Shackelford Jr was white, 5′ 5″, 125 lbs, hazel eyes, brown hair, and with a ruddy complexion. He received his honourable discharge from the Army on the 30th August 1945 (ref: 13-062-516).

Unless he secretly had access to a Tardis, William Shackelford was not the Somerton Man: he was still very much alive in 1950, 1960, and even 1970.

Misca pointed out that:

On ancestry there is a record of a Normaleen May Shackelford travelling from Brisbane to San Francisco with her son Lee Ervin/Erwin. The name of the friend/relative she states she is visiting is William Shackelford, 835 Oaklette Avenue, Norfolk, Virginia. A 1940 census document shows two William J Shackelfords living on Oaklette. One is 39 and the other is 17. Father and son. Further research shows the son as having been in the US Airforce in WW II. He is William Jesse Shackelford. He married three times. First wife unknown but I suspect it may have been Normaleen. Second wife (married in 1957) Leila Barnes Stewart (who seems to have died), third wife Catherine Anne Garrett.

William Jesse Shackelford Sr

William Jesse Shackelford Sr’s obituary (in the 7th December 1972 Virginia Beach Sun) looked like this:

William Jesse Shackelford, 73, of 292 Stancil St., Princess Anne Plaza, an insurance agancy [sic] operator, died in a hospital November 28 after a long illness.
He was a native of Walter Valley, Tex., a son of William J. and Mrs Martha Farley Shackelford, and the husband of the late Mrs Josephine Taylor Shackelford.
He was the owner of William J. Shackelford Insurance Co. He was a member of Norfolk Elks Lodge 38, American Legion, and Commodore FOP Lodge 3.
He was a World War I veteran.
Surviving are two daughters, Mrs Bennie S. Jordan of McLean and Mrs Shirley S. Becker of Virginia Beach; a son, William Jesse Shackelford Jr of Alexandria; two sisters, Mrs Cordelia Willcox of Tuolumne, Calif., and Mrs Sylvia S. Snyde of Corpus Christi, Tex.; a brother, Feilx Shackelford of Odessa, Tex.; 11 grandchildren; and 11 great grandchildren.

Might there be a missing Shackelford…?

I hope it’s not construed as unkind of me to note that bdid1dr’s handed-down family stories don’t quite add up. At this remove in both time and space, tales about her ex-husband’s family’s life in Australia (he moved to the US at a very young age) are bound to be fragmentary and incomplete.

What is either interesting or just plain Chinese Whispered here is that she was sure that there was also a Lee Irving Shackelford in Australia, who somehow disappeared: and quite how he fits into the whole picture nobody seems to know or remember.

And so my challenge to you fine people is to find out if there was a disappearing relative in William Jesse Shackelford (Jr or Sr)’s immediate family tree. Oh, and who was “Mrs A. B. Shackelford”?

Incidentally, one unusual (but possibly useful) resource here is the Shackelford Clan, a group that published a family history newsletter from May 1945 to April 1957 (scanned issues are listed online here) researching… the history of the Shackelford family. Good hunting! 🙂

Well, here’s a thing. The Thirteenth Oxford Medieval Graduate Conference, to be held in a month’s time at Merton College (31st March 2017 to 1st April 2017) on the theme of “Time : Aspects and Approaches”, has a Voynich-themed paper in its Manuscripts and Archives session on the second day (11:30am to 1:00pm).

This is “Asphalt and Bitumen, Sodom and Gomorrah: Placing Yale’s Voynich Manuscript on the Herbal Timeline“, presented by Alexandra Marraccini of the University of Chicago. The description runs like this:

Yale Beinecke MS 408, colloquially known as the Voynich manuscript, is largely untouched by modern manuscript scholars. Written in an unreadable cipher or language, and of Italianate origin, but also dated to Rudolphine court circles, the manuscript is often treated as a scholarly pariah. This paper attempts to give the Voynich manuscript context for serious iconographic debate using a case study of Salernian and Pseudo- Apuleian herbals and their stemmae. Treating images of the flattened cities of Sodom and Gommorah from Vatican Chig. F VII 158, BL Sloane 4016, and several other exempla from the Bodleian and beyond, this essays situates the Voynich iconography, both in otherwise unidentified foldouts and in the manuscript’s explicitly plant-based portion, within the tradition of Northern Italian herbals of the 14th-15th centuries, which also had strong alchemical and astrological ties. In anchoring the Voynich images to the dateable and traceable herbal manuscript timeline, this paper attempts to re-situate the manuscript as approachable in a truly scholarly context, and to re-characterise it, no longer as an ahistorical artefact, but as an object rooted in a pictorial tradition tied to a particular place and time.

BL Sloane 4016 is a similar-looking herbal that Voynich researchers know well. Most famously, Alain Touwaide wrote a 500-page scholarly commentary on it (as mentioned in Rene’s summary of Touwaide’s chapter in the recent Yale facsimile). It dates to the 1440s in Lombardy, and even has a frog (‘rana’) on folio 81:

Marraccini herself is an art historian who previously graduated from Yale, and who has an almost impossibly perfect set of research interests:

Her research focuses on Late Medieval and Early Modern scientific images, particularly alchemical and medical material, in England, Scotland, Germany, and the Netherlands. Her interests in the field also include book history and manuscript studies, Late Antique material culture, and the historiography of art, particularly in Warburgian contexts. Currently, she is writing on the history of Hermetic-scientific images and diagrams, and her work on Elias Ashmole’s copies of the Ripley Scrolls is forthcoming in the journal Abraxas.

All of which looks almost too good to be true. It’s just a shame her presentation falls on April Fool’s Day, so we’re bound to have people claiming that she doesn’t really exist and it’s all a conspiracy etc. 😉

A few days ago, Australian robotics hacker Marcel Varallo (whose gladiatorial hacks making Roombas fight each other amuse me greatly) very kindly posted up two new scans of the Somerton Man’s Rubaiyat code (along with many megs of his collected Somerton Man stuff) on his blog.

I’ve put the three scans we now have on a Cipher Foundation Rubaiyat Code page, and strongly recommend that people use one of the new scans as a basis for doing any image processing work, rather than the one that has been on the Internet for years.

For example, if you put the three scans’ “Q” shapes side by side and try doing image processing experiments on them…

…what you find is that the so-called “microwriting” (found in the leftmost of the three images) was simply a quantizing artefact introduced when the original JPEG image had its brightness and contrast adjusted. With the new (slightly higher resolution, and generally much smoother) scan, all that nonsense disappears. There is no ‘microwriting’ there at all: The End.

Voynich researchers without a significant maths grounding are often intimidated by the concept of entropy. But all it is is an aggregate measure of how [in]effectively you can predict the next token in a sequence, given a preceding context of a certain size. The more predictable tokens are (on average), the smaller the entropy: the more unpredictable they are, the larger the entropy.

For example, if the first order (i.e. no context at all) entropy measurement of a certain text was 3.0 bits, then it would have almost exactly the same average information content-ness per character as a random series of eight different digits (e.g. 1-8). This is because entropy is a log2 value, and log2(8) = 3. (Of course, what is usually the case is that some letters are more frequent than others: but entropy is the bottom line figure averaged out over the whole text you’re interested in.)
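As a minimal sketch of that calculation (the function name and the toy digit strings are mine, purely for illustration), first order entropy is just the Shannon entropy of the token frequency table:

```python
from collections import Counter
from math import log2

def first_order_entropy(tokens):
    """Shannon entropy, in bits per token, of the token frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Eight equally likely digits: log2(8) = 3 bits per character.
uniform = list("12345678") * 100
print(first_order_entropy(uniform))  # 3.0

# The same eight digits, but with '1' heavily over-represented:
# more predictable on average, so the entropy drops below 3 bits.
skewed = list("1111111122345678")
print(first_order_entropy(skewed))
```

The second call illustrates the parenthetical point above: once some symbols are more frequent than others, the averaged-out figure falls below the log2 of the alphabet size.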

And the same goes for second order entropy, with the only difference being that because we always know there what the preceding letter or token was, we can make a more effective guess as to what the next letter or token will be. For example, if we know the previous English letter was ‘q’, then there is a very high chance that the next letter will be ‘u’, and a far lower chance that the next letter will be, say, ‘k’. (Unless it just happens to be a text about the current Mayor of London with all the spaces removed.)

And so it should proceed beyond that: the longer the preceding context, the more effectively you should be able to predict the next letter, and so the lower the entropy value.
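Here is a hedged sketch of second order entropy, assuming it is defined as the conditional entropy of each token given the single token before it (other definitions exist, and the function name is only illustrative):

```python
from collections import Counter
from math import log2

def second_order_entropy(tokens):
    """Conditional entropy H(next | previous), in bits per token."""
    pairs = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(pairs)
    context_counts = Counter(prev for prev, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (prev, _), n in pair_counts.items():
        # n / context_counts[prev] is P(next | prev) estimated from the text
        h -= (n / total) * log2(n / context_counts[prev])
    return h

# 'u' always follows 'q' (and vice versa): nothing left to guess.
print(second_order_entropy(list("qu" * 50)))  # 0.0

# After each letter, the next one is close to a coin flip: about 1 bit.
print(second_order_entropy(list("aabb" * 50)))
```

Extending the context to two or more preceding tokens follows the same pattern, though the counts get sparse very quickly on a text as short as a single manuscript.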

As always, there are practical difficulties to consider (e.g. what to do across page boundaries, how to handle free-standing labels, whether to filter out key-like sequences, etc) in order to normalize the sequence you’re working with, but that’s basically as far as you can go with the concept of entropy without having to define the maths behind it a little more formally.

Voynich Entropy

However, even a moment’s thought should be sufficient to throw up the flaw in using entropy as a mathematical torch to try to cast light on the Voynich Manuscript’s “Voynichese” text… that because we don’t yet know what makes up a single token, we don’t know whether or not the entropy values we get are telling us anything interesting.

EVA transcriptions are closer to stroke based than to glyph based: so it makes little (or indeed no) sense to calculate entropy values for EVA. And as for people who claim to be able to read EVA off the page as, say, mirrored Hebrew… I don’t think so. :-/

But what is the correct mapping or grouping for EVA, i.e. the set of rules you should apply to EVA to turn it into the set of tokens that will give us genuine results? Nobody knows. And, oddly, nobody seems to be even asking any more. Which doesn’t bode well.

All the same, entropy does sometimes yield us interesting glimpses inside the Voynichese engine. For example, looking at the Currier A pages only in the Takahashi transcription and using ch/sh/cth/ckh/cfh/cph as tokens (which is a pretty basic glyphifying starting point), you get [“h1” = first order entropy, “h2” = second order entropy]:

63667 input tokens, 56222 output tokens, h1 = 4.95, h2 = 4.03

This has a first order information content of 56222 x 4.95 = 278299 bits, and a second order information content of (56222-1) x 4.03 = 226571 bits.
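For anyone wanting to reproduce this kind of glyphifying step, here is a rough sketch assuming a greedy longest-match rule over the listed groups. This is my guess at a reasonable approach rather than a description of any particular tool, and the sample strings are illustrative EVA-style words rather than a real transcription extract:

```python
def parse_eva(text, groups):
    """Split an EVA string into tokens, preferring the longest listed group."""
    ordered = sorted(groups, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for group in ordered:
            if text.startswith(group, i):
                tokens.append(group)
                i += len(group)
                break
        else:
            # No group matches here, so the single EVA letter is the token.
            tokens.append(text[i])
            i += 1
    return tokens

BASIC_GROUPS = ["ch", "sh", "cth", "ckh", "cfh", "cph"]

print(parse_eva("qokchedy", BASIC_GROUPS))  # ['q', 'o', 'k', 'ch', 'e', 'd', 'y']
print(parse_eva("cthol", BASIC_GROUPS))     # ['cth', 'o', 'l']
```

Note that this sketch does nothing special with spaces, line breaks or labels: how those are handled is a separate normalization decision.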

If you then also replace all the occurrences of ain/aiin/aiiin/oin/oiin/oiiin with their own tokens, you get:

63667 input tokens, 51562 output tokens, h1 = 5.21, h2 = 4.01

This has a first order information content of 51562 x 5.21 = 268638 bits, and a second order information content of (51562-1) x 4.01 = 206760 bits. What is interesting here is that even though the h1 value increases a fair bit (as you’d expect from extending the post-parsed alphabet with additional tokens), the h2 value decreases very slightly, which I find a bit surprising.

And if, continuing in this vein, you also convert air/aiir/aiiir/sain/saiin/saiiin/dain/daiin/daiiin to glyphs, you get:

63667 input tokens, 50387 output tokens, h1 = 5.49, h2 = 4.04

This has a first order information content of 50387 x 5.49 = 276625 bits, and a second order information content of (50387-1) x 4.04 = 203559 bits. What I find interesting is that once again the h1 value increases a fair bit, but the h2 value barely moves.
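The information content figures for the three parsings are simple enough to recompute side by side. The short labels below are my own shorthand for the three steps, and the h1:h2 ratio column is added just for comparison:

```python
# (label, output tokens, h1, h2) as quoted for the three parsings
parsings = [
    ("ch/sh/gallows groups", 56222, 4.95, 4.03),
    ("plus ain/oin groups", 51562, 5.21, 4.01),
    ("plus air/sain/dain groups", 50387, 5.49, 4.04),
]

results = []
for label, n, h1, h2 in parsings:
    first = round(n * h1)          # first order information content, in bits
    second = round((n - 1) * h2)   # second order: the first token has no context
    results.append((first, second))
    print(f"{label}: {first} bits, {second} bits, h1/h2 = {h1 / h2:.3f}")
```

The second order total shrinks at each step even though h2 itself hardly changes, simply because there are fewer output tokens to multiply it by.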

And so it does seem to me that Voynich entropy may yet prove to be a useful tool in determining what is going on with all the different possible parsings. For example, I do wonder if there might be a practical way of exhaustively / hillclimbingly determining the particular parsing / grouping that maximises the post-parsed h1:h2 ratio for Voynichese. I don’t believe anyone has yet succeeded in doing this, so there may be plenty of room for good new work here – just a thought! 🙂
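To make the hillclimbing idea slightly more concrete, here is a toy sketch under heavy assumptions: h2 is taken to be conditional entropy, parsing is greedy longest-match, the candidate group list is hand-picked, and the sample text is a made-up run of EVA-style words rather than a real transcription. It toggles one candidate group at a time and keeps any change that raises the h1:h2 ratio:

```python
from collections import Counter
from math import log2

def h1(tokens):
    total = len(tokens)
    return -sum(n / total * log2(n / total) for n in Counter(tokens).values())

def h2(tokens):
    pairs = list(zip(tokens, tokens[1:]))
    context = Counter(prev for prev, _ in pairs)
    return -sum(n / len(pairs) * log2(n / context[pair[0]])
                for pair, n in Counter(pairs).items())

def parse(text, groups):
    ordered = sorted(groups, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        match = next((g for g in ordered if text.startswith(g, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens

def ratio(text, groups):
    tokens = parse(text, groups)
    second = h2(tokens)
    return h1(tokens) / second if second > 0 else float("-inf")

def hillclimb(text, candidates):
    """Toggle one candidate group at a time, keeping strict improvements."""
    chosen, best = set(), ratio(text, set())
    improved = True
    while improved:
        improved = False
        for group in candidates:
            trial = chosen ^ {group}  # add the group if absent, drop it if present
            score = ratio(text, trial)
            if score > best:
                chosen, best, improved = trial, score, True
    return chosen, best

SAMPLE = "daiin chol daiin shol chor okaiin dain chey " * 20  # illustrative only
CANDIDATES = ["ch", "sh", "ain", "aiin", "dain", "daiin", "ol", "or"]
groups, score = hillclimb(SAMPLE, CANDIDATES)
print(sorted(groups), round(score, 3))
```

This only finds a local optimum, and a raw h1:h2 ratio can presumably be gamed by degenerate groupings, so any serious attempt would need some penalty on alphabet size and a far larger candidate set.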

Voynich Parsing

To me, the confounding beauty of Voynichese is that all the while we cannot even parse it into tokens, the vast modern cryptological toolbox normally at our disposal does us no good.

Even so, it’s obvious (I think) that ch and sh are both tokens: this is largely because EVA was designed to be able to cope with strikethrough gallows characters (e.g. cth, ckh etc) without multiplying the number of glyphs excessively.

However, if you ask whether or not qo, ee, eee, ii, iii, dy, etc should be treated as tokens, you’ll get a wide range of responses. And as for ar, or, al, ol, am etc, you won’t get a typical linguistic researcher to throw away their precious vowel to gain a token, but it wouldn’t surprise me if they were wrong there.

The Language Gap

The Voynich Manuscript throws into sharp relief a shortcoming of our statistical toolbox: specifically, its excessive reliance on our having previously modelled the text stream accurately and reliably.

But if the first giant hurdle we face is parsing it, what kind of conceptual or technical tools should we be using to do this? And on an even more basic level, what kind of language should we as researchers use to try to collaborate on toppling this first statue? As problems go, this is a precursor both to cryptology and to linguistic analysis.

As far as cipher people and linguist people go: in general, both groups usually assume (wrongly) that all the heavy lifting has been done by the time they get a transcription in their hands. But I think there is ample reason to conclude that we’re not yet in the cinema, but are still stuck in the foyer: there is a world of difference between a stroke transcription and a parsed transcription, and few seem comfortable acknowledging it.

Given that the Zodiac Killer’s first big cipher (the Z408) got cracked so quickly, it shouldn’t really be a surprise that he used a slightly different system for his second big cipher (the Z340). What is (arguably) surprising is that whatever change he made to it has not been figured out since then.

But what was he thinking? What did he want from a cipher? And how might his needs have changed between Z408 and Z340?

The Z408

Ciphers are normally made to be as strong as practically possible, given the technological, time, and resource constraints that apply to both sender and receiver: and with the two main driving needs being privacy and secrecy. Note that these aren’t always the same thing: the way I usually describe it is that while sex with your husband is private, sex with your tennis coach is secret. 😉

And so the first thing I find cryptographically interesting about the Zodiac Killer is that he was creating a cipher from a slightly different angle from either of these: and he certainly wasn’t trying to communicate in any normal sense of the word.

Rather, I think that the point of Z408 was to be taunting, and to demonstrate to the police that he was in control, not them.

So imagine the Zodiac’s probable fury, then, when little more than a week after his three Z408 cryptograms appeared in local newspapers (the Vallejo Times-Herald, the San Francisco Examiner and the San Francisco Chronicle), Donald and Bettye Harden were all over the front pages explaining how they had cracked them.

Didn’t they know who was supposed to be in control here?

What was worse, the Hardens hadn’t used cryptological hardware or even high-powered cryptological smarts. They’d just used the Zodiac’s egoism (they guessed the first letter was “I”) and his psychopathic bragging (they guessed he would use the word KILL multiple times) as keys to his cryptographic front door: and then marched straight in.

I think it’s fairly safe to expect that the Zodiac was pretty pissed off by this.

Note that the Hardens carried on trying to crack the Z340 for many years afterwards: according to their daughter, her “mother wrote poetry and was as absorbed in her writing as she became with the Zodiac codes. She worked on the second code on and off for the rest of her life.”

The Z340

Comparing the overall style of the Z340 with that of the Z408, there seem to be plenty of reasons to think that the two are, at heart, not wildly different from each other. And yet (as is widely known) all the big-brained homophonic solvers written since haven’t made any impact on the Z340 at all.

All the same, I think the second interesting thing to note is that the changes to the Z340 system were surely not made to defend against computer-assisted codebreaking (because that hadn’t yet happened), but rather to make the updated system Harden-hardened, so to speak.

What does this mean? Well, we can probably infer that the first letter of the Z340 is almost certainly not I (not that that helps us a great deal) and the Zodiac Killer must have done something to conceal or remove the KILL weakness.

But, in my opinion, that latter change would surely not have been a theoretically-motivated cryptographic adaptation (he was without much doubt an amateur cryptographer), but rather something pragmatic and empirical, perhaps along the lines of:
* adding a repeat-the-last-letter token
* adding an LL token
* adding an ILL token
* adding nulls inside tell-tale words
* etc

But there’s a problem with all of these. In fact, there are several problems. 🙁

The Problems

The first problem is that I don’t currently believe any of the above changes are disruptive enough to explain what we see in the Z340.

The basic stats of the four main Zxxx ciphers are:
Z408: 408 symbols, from a set of 54 unique symbols. (Note: E has 7 homophones, AST have 6 each, IO have 5 each, N has 4, FLR have 3 each, DHW have 2 each, everything else has 1).
Z340: 340 symbols, from a set of 63. [Hence symbols/textsize is 18.5%, a fair bit higher than the Z408’s 13.2%]
Z32: 32 symbols, from a set of 30.
Z13: 13 symbols, from a set of 8.
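The symbols/textsize figures can be recomputed directly from the counts listed above (the dictionary layout is just my own convenience):

```python
# cipher name -> (text length, number of unique symbols), as listed above
ciphers = {
    "Z408": (408, 54),
    "Z340": (340, 63),
    "Z32": (32, 30),
    "Z13": (13, 8),
}

density = {name: symbols / length for name, (length, symbols) in ciphers.items()}
for name, value in density.items():
    print(f"{name}: symbols/textsize = {value:.1%}")

# How much denser the Z340's symbol set is, relative to the Z408's
print(f"Z340 vs Z408: {density['Z340'] / density['Z408']:.2f}x")
```

The two short ciphers sit at the other extreme: with nearly as many distinct symbols as characters, they offer a frequency-based attack almost nothing to work with.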

It would be very tempting to suspect (as many people have) that the Z340 is ‘therefore’ just the same as Z408 but with 40% more homophones. Yet a problem with this popular hypothesis is that it should be well within range of automated homophone solvers, and to date they haven’t managed to make any impact.

A second problem is that the kind of homophone cycles that so characterized the Z408 seem to be largely absent in the Z340: and yet because the Zodiac Killer would not have had any clue that these were a technical weakness of his system, it seems unlikely to me that he would have adjusted his system to work around a weakness that he didn’t actually know was a weakness.
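To show what a homophone cycle test might look like, here is a small sketch. It assumes a "cycle" means the symbols standing for a single plaintext letter get used in a fixed rotating order, and it scores a proposed homophone set by how many consecutive windows of its occurrences contain every symbol exactly once. The toy ciphertexts and symbol names are invented:

```python
def cycle_fit(ciphertext, homophones):
    """Fraction of k-long windows (k = number of homophones) in which
    every homophone appears once; 1.0 means a perfect rotation."""
    wanted = set(homophones)
    k = len(wanted)
    sub = [s for s in ciphertext if s in wanted]
    windows = [sub[i:i + k] for i in range(len(sub) - k + 1)]
    if not windows:
        return 0.0  # too few occurrences to say anything
    return sum(len(set(w)) == k for w in windows) / len(windows)

# Toy example: pretend '1', '2' and '3' all stand for the same letter.
cycled = list("1x2y3z1x2y3")     # used in strict 1-2-3 rotation
uncycled = list("1x1y3z2x2y3")   # same symbols, picked haphazardly

print(cycle_fit(cycled, "123"))    # 1.0
print(cycle_fit(uncycled, "123"))  # 0.25
```

On a real ciphertext you would not know the homophone sets in advance, so the same score would have to be tried over candidate pairs or triples of symbols and compared against shuffled text.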

A third problem is that the Z340 has a fair number of asymmetries that don’t fit the it’s-a-straight-homophonic-cipher model. For example, lines 1-3 and 11-13 have (as Dan Olson pointed out some years ago) almost no character repeats.
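One way to put a number on this kind of row-level oddity is to count repeats per row and then compare against random shuffles of the same symbols. The sketch below assumes 17-symbol rows and uses a random placeholder sequence, NOT a real Z340 transcription; the function names are mine:

```python
import random

def rows(seq, width):
    return [seq[i:i + width] for i in range(0, len(seq), width)]

def repeats_in_row(row):
    """Number of symbols in the row that duplicate an earlier one."""
    return len(row) - len(set(row))

def count_repeat_free_rows(seq, width):
    return sum(1 for row in rows(seq, width) if repeats_in_row(row) == 0)

def shuffle_test(seq, width, trials=1000, seed=0):
    """Observed count of repeat-free rows, plus the fraction of shuffles
    that have at least that many."""
    rng = random.Random(seed)
    observed = count_repeat_free_rows(seq, width)
    pool = list(seq)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if count_repeat_free_rows(pool, width) >= observed:
            at_least += 1
    return observed, at_least / trials

# Placeholder data: 340 symbols drawn at random from 63.
placeholder_rng = random.Random(1)
placeholder = [placeholder_rng.randrange(63) for _ in range(340)]
print([repeats_in_row(row) for row in rows(placeholder, 17)])
print(shuffle_test(placeholder, 17, trials=200))
```

Swapping in an actual transcription would show whether rows 1-3 and 11-13 stand out as sharply as described, and how rarely shuffled text matches them.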

There are yet other asymmetries: for example, while 63 different symbols appear in the top ten lines, only 60 appear in the bottom ten lines. And there’s the mysterious ‘-‘ shape at the start and end of line 10: and the odd-looking “ZODAIK” sequence on line 20.

One final asymmetry: the ‘+’ shape seems to function differently in the top and bottom halves – it is often preceded by ‘M’ in the top half, but never preceded by ‘M’ in the bottom half.

How does assuming the Z340 is a pure homophonic cipher explain any of these behaviours, let alone all of them?

Lines 1-3 and 11-13, revisited

I keep coming back to the 1-3 and 11-13 property as mentioned here. I think it’s important to say that Dan Olson’s conclusion (that “lines 1-3 and 11-13 contain valid ciphertext whereas lines 4-6 and 14-16 may be fake”) seems likely to be landing a little bit wide of the mark.

To me, this same property of these lines implies (a) that the homophonic versions for each letter were probably used in pure sequence here, but also (b) the homophone cycles were somehow ‘reset’ after ten lines (i.e. the homophone cycles all started again at the start of line eleven). And perhaps also that any characters repeated in the first three lines are rarer characters, rather than the homophone-friendly ETAOINSHRDLU etc.

It might even be that the Zodiac Killer kept on adding homophones as he constructed the cipher UNTIL he had three lines’ worth of essentially unique homophones: that is to say, that the three line blocks in 1-3 and 11-13 are how his system made the choice of the number of homophones, rather than as a consequence of the number of homophones he chose. Nobody has yet (to my knowledge) satisfactorily explained where he came up with his homophonic allocation for Z408: certainly, searching for this in crypto books hasn’t yielded any likely candidates.

Could it be that the Zodiac Killer worked backwards from his actual Z408 ciphertext to determine the number of homophones, rather than worked forward from the number of homophones to the ciphertext?

Update: I received the following off-line comment from David Oranchak, but thought it better to update it within the post itself…

Nick, there are a few other seemingly rare phenomena that can be observed in Z340. I’m curious what you think of them.

The first is the pivots:

Those kinds of patterns are difficult to arise by chance, so they are suspected to be some sort of feature of the encoding scheme.

Z408 is littered with repeating bigrams but Z340 seems to have fewer than would be expected via normal homophonic encipherment of a plaintext in a normal reading direction. However, the bigrams show up again if you consider a periodic operation on the cipher text:

The count of 25 repeating bigrams jumps to 37 or 41 or even higher, depending on the periodic operation applied to the cipher text. Here is a tool that illustrates the various operations:

You’ve already identified the seemingly rare phenomenon of rows that lack repeating symbols. There are 9 such rows. In 1,000,000 random shuffles of Z340, none had that many rows. In fact, the best that was found was 8 rows which occurred in only 12 of the shuffles.

Your “M+” asymmetry observation seems to fit in with the general observation that repeating bigrams are phobic of certain regions of the text. The lower left, for instance, seems to hate bigrams:

Another really strange observation is the distribution of non-repeating string lengths. For each position of Z340, measure how far you can read forward without encountering a repeating symbol. You end up with a string with unique sequences of length L. Jarlve found that for Z340, there is a peak of 26 occurrences of unique sequences of length 17 (which happens to be the width of Z340). It is really interesting that in random shuffles, this phenomenon is only observed on the order of one in a billion shuffles.

Finally, I would recommend that anyone interested in this topic should check out this thread on morf’s Zodiac forum: Especially the more recent posts on the latter pages. “Jarlve” and “smokie” in particular are doing fantastic work exploring various transcription schemes that could explain the various curious features of Z340 (in particular, the relationships between periodic bigrams and transposition schemes).
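Two of the measurements in David’s comment are easy to sketch. I am assuming here that a "repeating bigram" count is the number of times a symbol pair recurs after its first appearance, that a periodic bigram at period p pairs each symbol with the one p places later, and that the non-repeating length at a position is how many symbols you can read before one recurs. Jarlve’s and David’s exact definitions may well differ, so treat this as illustrative only:

```python
from collections import Counter

def repeating_bigrams(seq, period=1):
    """Count recurrences of (seq[i], seq[i + period]) pairs beyond the first."""
    pairs = Counter((seq[i], seq[i + period]) for i in range(len(seq) - period))
    return sum(n - 1 for n in pairs.values() if n > 1)

def unique_run_lengths(seq):
    """For each start position, the length of the longest run of symbols
    read forward before any symbol repeats."""
    lengths = []
    for start in range(len(seq)):
        seen = set()
        for symbol in seq[start:]:
            if symbol in seen:
                break
            seen.add(symbol)
        lengths.append(len(seen))
    return lengths

toy = list("abab")
print(repeating_bigrams(toy))             # 1 ('ab' appears twice)
print(repeating_bigrams(toy, period=2))   # 0
print(unique_run_lengths(list("abca")))   # [3, 3, 2, 1]

# Histogram of run lengths, i.e. how many positions give each length
print(Counter(unique_run_lengths(list("abca"))))
```

Running `repeating_bigrams` over a range of periods, and comparing the run length histogram against shuffles, would be the obvious next steps on a real transcription.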

In some ways, it’s the shortest of distances from [Ethel Voynich] to [Ethel Merman], so why not “Voynich, The Musical“? Close your eyes, imagine a Broadway stage, take out a mortgage to get yourself a semi-affordable seat, spill a drink on your leg, and you’re as good as there…


Act One, Scene One

It’s 1912. A single spotlight illuminates an old trunk in the middle of an otherwise empty wooden stage: there’s dust in the air. We hear slow, sustained violins off-stage, harbingers of the big discovery that is about to happen.

WILFRID appears stage right. He is well dressed (though a little tweedy for our modern tastes), and wears small round glasses. He looks in the prime of his life – there’s a vigour and physical excitement to him. He approaches the trunk, opens it, takes out an old book and peers inside it. As his eyes grow ever wider, the violins swell, and he sings his first number “Friends To The End”.


This never happened – I wasn’t here.
There was never a trunk (that was junk), isn’t this queer?
I conjured a castle, to hide Jesuit lies…
While the customer’s king, I’ll say anything (however unwise).

[Chorus] But you, you were always real
Even if you made me feel
Like an antiquarian schlemiel –
I couldn’t comprehend.
But I knew, I knew when I met
My ugly duckling Juliet
With your strange alphabet
We’d be friends to the end…
Friends to the end.

Act One, Scene Two

Back in London, WILFRID hesitantly shows his newly-acquired manuscript to his wife ETHEL: he thinks it’s going to make them rich. However, ETHEL cannot believe that he has wasted money on something as unbelievably stupid as a book that nobody can read. To make her feelings on the matter completely clear, she sings her angry opening number “Down the drain”.


Little naked women
Standing round or swimming
What is this you’re bringing
To our house?
You can’t read a word of it
Written by a heretic
I can’t see the benefit
To man or mouse

[Chorus] You put good money / Down the drain
Buying enciphered / Castles in Spain
Were those nymphs fogging / Your revolutionary brain?
Or has their writing sent you / Completely insane?

Act One, Scene Three

WILFRID has moved to New York, and is trying (unsuccessfully) to convince wealthy American collectors to buy his unreadable manuscript. Though his sales patter normally charms the birds down off the trees, he’s finding it difficult to find anyone with any affinity for this unusual artefact. His song “It’s No Use” documents his ongoing struggle.


There’s jazz and money in the air
The excitement of a New World at play
New rules, new wealth, new clothes, new hair
America strides into a brand new day

You, sir, with your spats and suits
Your garden parties and Egyptiana
Might I interest you in this book’s strange roots
And its hard-to-pin-down flora and fauna?

[Chorus] It’s no use
My duckling’s no swan
I’ve cooked my goose
My big chance has gone
I’ll find no willing
Who’ll pay more than a shilling
They’re too mercantile

Act One, Scene Four

It’s 1930 in New York. WILFRID is dying, having never been able to sell his “Roger Bacon” manuscript. ETHEL brings his beloved manuscript to him, so that he can see it one last time. WILFRID sings a song to both of them: “It’s Time To Say Goodbye”.


Perhaps I was wrong / To hope for the best
To follow every wastrel clue / Like a man possessed
Why can’t anybody else / See what I see?
Are they put off by mere / Indecipherability?

[Chorus] It’s time to say goodbye
To the woman I have loved
And greet the naked angels
Hovering above
I’ve seen them for years
Sitting on my shelves
Filling every page of
Quires eleven and twelve