No.

You might instead ask: “Was the author of the Voynich Manuscript a nymphomaniac lesbian from Baden Baden obsessed with clysters?”

Or how about: “Was the author of the Voynich Manuscript a medieval psychoactive drugs harvester from (the place now known as) Milton Keynes?”

Or: “Was the author of the Voynich Manuscript a Somalian Humiliatus obsessed with mis-shapen vegetables starting with the letter ‘A’, writing down the results of a six-year-long trek through the Amazon rainforest in a perversely private language?”

The answers to these are, errrm, no, no, and no (respectively).

When the Voynich Manuscript contains so many unexplained points of data (a thousand? Ten thousand?), why on earth should I or anyone else spend more than a minimal amount of time evaluating a Voynich theory that seems to attempt to join together just two of them with what can only be described as the flimsiest of thread?

What – a – waste – of – time – that – would – be.

I’ve just uploaded a draft paper to academia.edu called Fifteenth Century Cryptography Revisited. This takes a fresh look at the topic (specifically at homophonic ciphers, Simonetta, and Alberti), and takes a view quite different from David Kahn’s (now 50-year-old) interpretation.

Please take a look: I don’t yet know where it will end up (i.e. as a book chapter, a journal article, or whatever), but I thought it would be good to push the current version up, see what people think.

The abstract runs as follows:

Fifteenth Century Cryptography Revisited

In the fifteenth century, the art of secret writing was dramatically transformed. The simple ciphers typical of the preceding century were rapidly replaced by complicated cipher systems built from nulls, nomenclators, homophones and many other tricks.

Homophones – where individual plaintext letters were enciphered by one of a set of different shapes – were, according to David Kahn’s influential interpretation, added specifically to defend against frequency analysis attacks. Kahn interprets this as a sign of the emergence of cryptanalysis, possibly from Arab sources, and also of the growing mathematization and professionalism of cryptology.

However, by closely examining key ciphers and cipher-related texts of this period, this paper argues that homophones were instead added as a steganographic defence. That is, the intention was specifically to disguise linguistic weaknesses in Italian and Latin plaintexts that rendered ciphertexts vulnerable to easy decryption.

Building on this analysis, a new account of the history of fifteenth century cryptography is proposed, along with a revised model charting the flow of ideas influencing cryptographic practice during this fascinating period.

Though it runs to eighteen pages, it should be easy to pick up and read. Please let me know if there’s anything that you think needs clarification, or which you think is incorrect etc.

Between 22nd March 2005 and 6th August 2006, someone calling himself/herself “IKLP” (supposedly an acronym for “I Killed Laci Peterson”) posted a large number of comments to the (now-defunct) fratpack.com Internet forum. These comments were mocking, often rhymed (badly), and referred more than a few times to the Zodiac Killer, e.g.

Green River was a bore. Zodiac but a little whore. I am the one to adore. I be the one you should never ignore.

*sigh* So far so nothing. Yet two of these comments appeared to contain codes:

* The IKLP Short Code (10th September 2005)

28527-8240-791-94-7

* The IKLP Long Code (30th October 2005)

Fore if you break the code. Then it is you who will know.

2334-342-23-4-5456-824-00-6-19054334-06-3-454-334445-9943-
99834511-94345=9953=986-555-666-9495-945422-07862-
993233-=348842-865-999-=666-922166-49-45495-0096-
3459-=99643+852343-9945-09923+=499388*4939/0045-29454-2-37
09-003400-9345-+1195=44521-9835=99521=99544-594399094-
99543295+99659=992344-9399339-672395-99334=9604=168-
237=593-9634-678-1607-23456-4345=2005

Fore now we will see. If you are as smart as me.

Farmer’s “solution”

In 2008, Christopher Farmer (he of the now-defunct OPORD Analytical forum) posted up what he claimed were the solutions to both of these. In short, Farmer concluded that the IKLP Long Code referred to the solar clock in Cesar Chavez Park (specifically the word “DETERMINATION”), while the IKLP Short Code referred to a specific address:

City Finance and Customer Service
1010 Tenth Street, Third Floor, Suite 2100
Modesto, California, 95354

Unfortunately, Farmer’s Byzantine proofs and long-winded arguments were, as solutions go, no less voluminous than vacuous: for precision, they were right up there with picking random words from the OED or sticking pins into a Borgesian map. Truly, truly horrible.

But the right question to be asking is something far simpler: are these even real codes?

Code or fauxed?

The long string of (basically) wacko-style comments surrounding the codes would give many onlookers good reason to think they came from a person who was somewhat unhinged. But to walk away purely for that reason would be intellectually chicken: we should have the confidence in our cryptanalysis and observation skills to have a look regardless, right? So let’s try…

The short code doesn’t seem to offer much to bite on: it’s just too short. However, I did wonder whether the long code might be (if you remove all the non-digits) a two-digit homophonic cipher:

23 34 34 22 34 54 56 82 40 06 19 05 43 34 06 34 54 33 44 45 99 43

99 83 45 11 94 34 59 95 39 86 55 56 66 94 95 94 54 22 07 86 2

99 32 33 34 88 42 86 59 99 66 69 22 16 64 94 54 95 00 96

34 59 99 64 38 52 34 39 94 50 99 23 49 93 88 49 39 00 45 29 45 42 37 09 00 34 00 93 45 11 95 44 52 19 83 59 95 21 99 54 45 94 39 90 94

99 54 32 95 99 65 99 92 34 49 39 93 39 67 23 95 99 33 49 60 41 68

23 75 93 96 34 67 81 60 72 34 56 43 45 20 05

This has a fairly strong distribution, with 34 and 99 coming in at 9.1% and 7.7% of the 143 number pairs respectively (remember that E = ~12.49% and T = ~9.28% in English):

[13] – 34
[11] – 99
[ 7] – 45, 94, 95
[ 6] – 39, 54
[ 4] – 00, 23, 49, 59, 93
[ 3] – 22, 33, 43, 56, 86
[ 2] – 05, 06, 11, 19, 32, 42, 44, 52, 60, 64, 66, 67, 83, 88, 96
[ 1] – 07, 09, 16, 20, 21, 29, 37, 38, 40, 41, 50, 55, 65, 68, 69, 72, 75, 81, 82, 90, 92
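The pair counts and percentages above can be reproduced with a short Python sketch. It assumes, as the listing does, that the digits are paired within each line of the code as printed, with a lone trailing digit (the final 2 on the second line) dropped. Joining the fourth and fifth lines, as the listing does, makes no difference, because the fourth line has an even number of digits.

```python
import re
from collections import Counter

# The IKLP Long Code, one string per line as reproduced above.
LONG_CODE = [
    "2334-342-23-4-5456-824-00-6-19054334-06-3-454-334445-9943-",
    "99834511-94345=9953=986-555-666-9495-945422-07862-",
    "993233-=348842-865-999-=666-922166-49-45495-0096-",
    "3459-=99643+852343-9945-09923+=499388*4939/0045-29454-2-37",
    "09-003400-9345-+1195=44521-9835=99521=99544-594399094-",
    "99543295+99659=992344-9399339-672395-99334=9604=168-",
    "237=593-9634-678-1607-23456-4345=2005",
]

def digit_pairs(line):
    """Strip every non-digit, then cut into two-digit pairs (an odd trailing digit is dropped)."""
    digits = re.sub(r"\D", "", line)
    return [digits[i:i + 2] for i in range(0, len(digits) - 1, 2)]

pairs = [p for line in LONG_CODE for p in digit_pairs(line)]
counts = Counter(pairs)
total = len(pairs)

# Print the most frequent pairs with their share of the total.
for pair, n in counts.most_common(5):
    print(f"{pair}: {n} ({100 * n / total:.1f}%)")
```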

All of which might (weakly) argue not for an out-and-out homophonic cipher, but rather for a nomenclatura-type cipher, where some number pairs stand in for common words or (more rarely) syllables; or alternatively a simple cipher that was augmented by adding a load of nulls.

And yet at the same time, it feels to me as though this has only managed to cut close to the core of what’s going on here, but not right to its middle. But even so, it might (possibly) be a start.

What do you think?

I posted up seven homophonic challenge ciphers a few days ago, and now – though it may sound a little counter-intuitive – I’d like to try to help you solve them (bear in mind I don’t know if they can be solved, but the whole point of the challenge is to find out).

Of the seven ciphers, #1 is the longest (and hence probably the easiest). Reformatted for ten columns rather than five (it uses five cycling alphabets ABCDE, i.e. “ABCDE ABCDE” over ten columns), it reads:

121,213,310,406,516, 108,200,323,416,513,
112,208,308,409,515, 102,216,309,425,509,
114,215,309,417,507, 102,201,323,401,517,
111,200,306,408,500, 113,203,313,407,512,
103,223,313,403,511, 119,213,316,416,511,
102,204,324,418,517, 120,203,324,407,516,
105,209,312,401,504, 117,208,310,408,500,
113,203,301,425,513, 115,201,313,408,515,
115,214,308,406,501, 122,204,322,408,509,
114,209,305,412,504, 117,213,316,402,509,
100,200,310,423,513, 100,214,320,419,509,
114,209,309,419,520, 101,200,320,416,518,
120,211,313,403,509, 103,207,313,421,513,
107,209,305,407,523, 115,224,313,416,508,
102,203,306,416,514, 107,200,310,401,509,
103,212,324,

Repeated Quadgram

Commenter Jarlve (whose interesting work on the Zodiac Killer ciphers some here may already know) noted that there is a repeated quadgram here, i.e. the sequence 408 500 113 203 appears twice.

This is entirely true, and also a very sensible starting point: I’ve highlighted this quadgram in the following diagram, along with all other repeated A-alphabet tokens (i.e. 100..125), and also any tokens they touch more than once (i.e. in the B and E alphabets):

Another thing that’s interesting here is that the 102 token (that appears four times and is coloured purple in the above) appears with four different letters before it as well as four different letters after it. In classical cryptology, that’s normally taken as a strong indicator that this is a vowel: and with the high instance count (4 out of 31, i.e. 12.9%), you might reasonably predict that this is E, A, O, or perhaps I (in order of decreasing likelihood).

[Note that I haven’t looked to check what letter this actually is: having created the challenge ciphers, I’ve just left them to one side, and don’t intend to look again at them.]

Similarly, the 114 token (that appears three times and is coloured green) is always preceded by 509, and is followed by 209 on two of the three instances. (Note that the token two after it is 309 in two of the three instances as well.) Again, in classical cryptology, this kind of structured contact is normally taken as a strong indicator that the token enciphers a consonant: and with the high instance count (3 out of 31, i.e. 9.7%), you might reasonably predict that this enciphers T or possibly N, S, or H.
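The repeated quadgram and the contacts of 102 and 114 can be checked mechanically. Here is a small Python sketch over cipher #1 as transcribed from the ten-column listing above (read left to right, top to bottom); the helper names are illustrative, not part of any existing tool:

```python
import re
from collections import Counter

# Challenge cipher #1, transcribed from the ten-column listing above.
CIPHER_1 = """
121,213,310,406,516,108,200,323,416,513,
112,208,308,409,515,102,216,309,425,509,
114,215,309,417,507,102,201,323,401,517,
111,200,306,408,500,113,203,313,407,512,
103,223,313,403,511,119,213,316,416,511,
102,204,324,418,517,120,203,324,407,516,
105,209,312,401,504,117,208,310,408,500,
113,203,301,425,513,115,201,313,408,515,
115,214,308,406,501,122,204,322,408,509,
114,209,305,412,504,117,213,316,402,509,
100,200,310,423,513,100,214,320,419,509,
114,209,309,419,520,101,200,320,416,518,
120,211,313,403,509,103,207,313,421,513,
107,209,305,407,523,115,224,313,416,508,
102,203,306,416,514,107,200,310,401,509,
103,212,324,
"""
TOKENS = [int(t) for t in re.findall(r"\d+", CIPHER_1)]

def contacts(tokens, target):
    """Return the tokens immediately before and after each occurrence of target."""
    before, after = [], []
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        if i > 0:
            before.append(tokens[i - 1])
        if i + 1 < len(tokens):
            after.append(tokens[i + 1])
    return before, after

def repeated_ngrams(tokens, n):
    """Map every n-gram that occurs more than once to its number of occurrences."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: c for gram, c in grams.items() if c > 1}

print(repeated_ngrams(TOKENS, 4))
for tok in (102, 114):
    print(tok, contacts(TOKENS, tok))
```

On this transcription, 102 comes out with four distinct predecessors and four distinct successors, and 114 with 509 before it every time.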

With these two examples in mind, it strikes me that for any given plaintext language (English in the case of these challenge ciphers) you could easily build up probability tables for repetitions of the two tokens before and the two tokens after any given token: and then use those as a basis to predict (for a given ciphertext length) which plaintext letter that token is likely to encipher.
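One possible shortcut for building such tables (an inference from the cipher design, not something stated above): under strict column cycling, two instances of the same token always sit in the same column, so their neighbouring tokens at any given offset are also in matching columns, and are equal exactly when the neighbouring plaintext letters are equal. The repetition statistics can therefore be gathered directly from plain English text without enciphering anything. A sketch, with hypothetical names:

```python
from collections import defaultdict
from itertools import combinations

def contact_repeat_counts(plaintext, period=5, offsets=(-2, -1, 1, 2)):
    """For each offset and plaintext letter, return (shared, total): over all pairs of
    same-letter positions in the same column, how often the neighbours at that offset
    match (pairs whose neighbour falls off either end are skipped)."""
    text = [c for c in plaintext.upper() if c.isalpha()]
    # Same letter + same column phase => same cipher token under strict column cycling.
    groups = defaultdict(list)
    for i, ch in enumerate(text):
        groups[(ch, i % period)].append(i)
    result = {off: defaultdict(lambda: [0, 0]) for off in offsets}
    for (ch, _phase), positions in groups.items():
        for i, j in combinations(positions, 2):
            for off in offsets:
                a, b = i + off, j + off
                if 0 <= a < len(text) and 0 <= b < len(text):
                    tally = result[off][ch]
                    tally[1] += 1
                    if text[a] == text[b]:
                        tally[0] += 1
    return {off: {ch: tuple(v) for ch, v in table.items()} for off, table in result.items()}

print(contact_repeat_counts("THEXXTHEYY"))
```

Run over a large English corpus, shared/total for each letter and offset gives the kind of probability table described above; the toy input here only shows the shape of the output.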

Though this may not sound like very much, because you can do this for all five of the alphabets independently, the results kind of rake across the ciphertext, yielding a grid of probabilistic clues that some clever person might well use as a basis for working towards the plaintext in ways that wouldn’t be possible with randomly-chosen homophonic ciphers. Just sayin’. 😉

And The Point Is…

It’s entirely true that for homophonic ciphers where each individual homophone is chosen at random, the difficulty of solving a reasonably short cipher with five homophones per letter would be very high. But knowing (as here) that each column is strictly limited to a given sub-alphabet, my point is that many of the tips and tricks of classical cryptology are also available to us, albeit in slightly different forms from normal.

Yet while it’s encouraging for solvers that there is a repeated quadgram here, I don’t currently believe that cipher #1 will be (quite) solvable with pencil and paper, as if it were a Sudoku extra-extra-hard puzzle (though as always, I’d be more than delighted to be proved wrong).

However, my hunch remains that strictly cycling homophonic ciphers may well prove to be surprisingly solvable using deviousness and computer assistance, and I look forward very much to seeing how they fare. 🙂

While thinking about the Scorpion S5 unsolved cipher in the last few days, it struck me that it seemed to be a special kind of homophonic cipher, one where the homophones are used in rigid groups.

That is: whereas the Zodiac Killer’s Z408 cipher cycled (mostly but not always) between sets of homophones by their appearance, it appears that the Scorpion S5 cipher maker instead rigidly cycled between 16 sets of homophones by column. What’s interesting about both cases is that the use pattern gives solvers extra information beyond that which they would have for a homophonic cipher where each homophone instance was chosen completely at random.

Perhaps there’s already a special name for this: but (for now) what I’m calling them is “constrained homophonic ciphers”, insofar as they are homophonic ciphers but where an additional use pattern constrains the specific way that the homophones are chosen.

The question I immediately wanted to know the answer to was this: can we solve these? And what better way to find this out than by issuing a challenge!

Seven Challenge Ciphers

The seven challenge ciphers are downloadable as a single zip file here, or as seven individual CSV files here:
* #1
* #2
* #3
* #4
* #5
* #6
* #7

How The Ciphers Were Made

Unlike normal challenge ciphers, what I’m giving you here (in line with Kerckhoffs’ Principle) is complete disclosure of the cipher system and even the plaintext language.

The cipher system used here is a homophonic cipher with exactly five possible homophones for each plaintext letter BUT where the homophones are strictly selected according to the column number in which they appear in the ciphertext. Each separate CSV uses its own individual key.

The plaintext language is English: they are straightforward sentences taken from a variety of books, and without any sadistic linguistic tricks (i.e. no “SEPIA AARDVARK” or similar to confuse the issue).

The enciphered files are simple CSV (comma-separated values) text files, arranged in rows of five letters at a time, but encoded as decimal numbers. For example, the first (and the longest) challenge cipher (“test1.csv”) begins as follows:

121,213,310,406,516,
108,200,323,416,513,
112,208,308,409,515,

Here, “121,213,310,406,516,” enciphers plaintext letters #1..#5, “108,200,323,416,513,” enciphers plaintext letters #6..#10, and so forth. The first column is numbered in the range 100..125 (i.e. these belong to the 1st homophonic alphabet), the second column 200..225 (i.e. these belong to the 2nd homophonic alphabet), and so forth.
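To make the scheme concrete, here is a minimal Python sketch of the encipherment as described above. The key format (one shuffled list of homophone numbers 0..25 per column alphabet) and all function names are illustrative assumptions, not the generator actually used to make the challenge files.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def make_key(rng, columns=5):
    """Hypothetical key: per column alphabet, a shuffled list giving each plaintext
    letter (by index) its homophone number 0..25."""
    return [rng.sample(range(26), 26) for _ in range(columns)]

def encipher(plaintext, key):
    """Letter i always uses alphabet i % 5, numbered 100..125, 200..225, and so on."""
    letters = [c for c in plaintext.upper() if c in ALPHABET]
    return [100 * (i % len(key) + 1) + key[i % len(key)][ALPHABET.index(c)]
            for i, c in enumerate(letters)]

def decipher(tokens, key):
    """Invert each column alphabet separately."""
    inverse = [{num: ALPHABET[idx] for idx, num in enumerate(col)} for col in key]
    return "".join(inverse[i % len(key)][t - 100 * (i % len(key) + 1)]
                   for i, t in enumerate(tokens))

def to_csv_rows(tokens, width=5):
    """Rows of five comma-terminated numbers, in the same layout as test1.csv."""
    return ["".join(f"{t}," for t in tokens[i:i + width])
            for i in range(0, len(tokens), width)]

rng = random.Random(2017)
key = make_key(rng)
tokens = encipher("It was the best of times", key)
print("\n".join(to_csv_rows(tokens)))
```

Because each column is its own simple substitution, the same plaintext letter in the same column always yields the same number: that regularity is the extra constraint a solver can exploit.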

The start of the message and the end of the message are exactly as you would expect: there is no padding at either end, no embedded key information, just pure ciphertext.

The Rules

Treating this as a massively parallel book search using cloud databases (a) will be treated as cheating, and (b) will spoil it for other people, so please don’t do that. This challenge is purely about finding the limits of cryptanalysis, not about grandstanding with Big Data.

Hence you’ll need to also tell me (broadly) what you did in order to rise to the challenge, so that I can be sure you haven’t solved it through secondary or underhand means.

The Prize

If nobody solves any of the challenge ciphers by the end of 2017, my wallet stays shut.

However, the person (or indeed group) who has the most success decrypting any of these seven challenge ciphers by 31st December 2017 will be the “2017 Cipher Mysteries Cipher Champion”, and will also receive a shockingly generous £10 prize (sent anywhere in the world where PayPal can send money) to spend as they wish.

In the case of multiple entrants solving the same difficulty cipher independently, I’ll award the prize to the first to contact me. In all cases, please leave a comment below.

In all situations, my decision is final, absolute, arbitrary and there is no opportunity for appeal. Just so you know.

PS: any individual (or indeed covert agency) wishing to donate more money to increase the prize fund (i.e. to make a little more cryptanalytic sport of this), please feel free to email me.

Hints and Tips

I suspect that the multiplicity (i.e. the number of different symbols used divided by the length of the ciphertext) will prove to be too high and the ciphertext lengths too short for conventional homophonic decryption programmes, so I expect prospective solvers won’t be able to look to these for any great help.

Similarly, I don’t believe that numerical brute force and/or parallel processing will be sufficient here: all the same, these challenges (if solvable) will probably prove to be things that anyone anywhere can tackle (e.g. through hill-climbing and cleverly exploiting the constraints), not just the NSA, GCHQ or similar with their supercomputers.

For what it’s worth, my best guess right now is that #1 (the longest of the seven ciphertexts) will prove to be solvable… though only just. Even so, I’d be delighted to be proved wrong for any of the others.

Incidentally, I chose the length of the very shortest challenge cipher to broadly match the length of the Scorpion S1 cipher: so even in the (perhaps unlikely) case where all seven of my challenge ciphers get solved, there’ll still be an eighth challenge to direct your clever efforts at. 😉

I’ve blogged a few times about trying to crack the Scorpion Ciphers (a series of apparently homophonic ciphers sent to American crime TV host John Walsh). Most of my effort has been spent on the Scorpion S5 cipher, which (despite having 12 columns) appears to be rigidly cycling between 16 cipher alphabets.

However, it struck me a few days ago that this might also give us a way in to the Scorpion S1 cipher. This is because all the repeats there seem to be at a column distance of 0, 1, 4, 5, or 6, with the overwhelming majority of repeats at column distances 0 and 5. (The only exception is the “backwards L” glyph, which appears in two pairs: one pair at column distance 0, and the other at column distance 5.)

The Slippy S1 Five-Alphabet Hypothesis

Putting the 16-alphabet-cycle from S5 together with the mostly-0-or-5-column-distance observation from S1 yields my “Slippy S1 Five-Alphabet Hypothesis”: that Scorpion S1 was constructed from a cycle of 5 cipher alphabets, where the encipherer always reset to alphabet #1 at the beginning of a line, and usually (but not always) stepped to the next alphabet along with each new column.

So whereas a rigid 5-alphabet cycle (i.e. with no slips) would have a fixed alphabet “ownership” of 1234512345 for each ten-glyph line, I suspect that we can make a “slippy” guess for S1’s cycle ownership, to try to reconstruct where the encipherer slipped from one cycle into the next. My best current set of guesses for S1 is therefore:

1234512235
1234512344
1234412345
1234512345
1234112345
1234551245
2234512345

(Note that I suspect that the “backwards L” shape appears on two alphabets, i.e. once in alphabet #2 and once in alphabet #4, but that this is the only exception to the rule.)
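The seven guessed rows can be compared mechanically against the rigid 1234512345 cycle. This Python sketch (the function names are illustrative) lists where each guess slips and how many of the 70 glyph positions each alphabet ends up carrying:

```python
from collections import Counter

# Best current guesses for S1's alphabet ownership, one string per ten-glyph line.
SLIPPY_S1 = [
    "1234512235",
    "1234512344",
    "1234412345",
    "1234512345",
    "1234112345",
    "1234551245",
    "2234512345",
]
RIGID = "1234512345"

def slip_positions(rows, rigid=RIGID):
    """(line, column) pairs, both 1-based, where the guess differs from the rigid cycle."""
    return [(r + 1, c + 1) for r, row in enumerate(rows)
            for c, (got, want) in enumerate(zip(row, rigid)) if got != want]

def alphabet_loads(rows):
    """How many glyph positions each alphabet is assigned across the whole grid."""
    return Counter(int(ch) for row in rows for ch in row)

print(slip_positions(SLIPPY_S1))
print(sorted(alphabet_loads(SLIPPY_S1).items()))
```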

What this means is that each of the five alphabets has only 26 glyphs in it (one for each letter of the alphabet): and so we can tell that if two different shapes are numbered as being in the same alphabet, they are very probably two different letters.

Can We Solve This?

53 of S1’s 10 x 7 = 70 glyphs are unique, yielding a high multiplicity of 75.7%. By way of comparison, it would seem that normal (unstructured) homophonic ciphers are only solvable when their multiplicity is around the 20%-25% mark.

However, the question here is whether being able to group the letters into five unique alphabets (even probabilistically) reduces the number of combinations enough to make this genuinely solvable. As normal, pencil-and-paper solvers can make some pretty good guesses, e.g. the “S Λ” pair on lines #3 and #6 probably enciphers “TH”, while any repeated letter stands a good chance of being a normal high-frequency letter such as ETAOINS etc: but computers would do this much better.

My instinct is that this should be a good candidate for hill-climbing: and that the one-glyph-per-letter-per-alphabet constraint will prove reasonably effective. But effective enough? We’ll have to wait and see…

Incidentally, a good sanity check for this Scorpion S1 hypothesis would be to run some “forward simulations” (which is the kind of thing Dave Oranchak has done so much of with the Zodiac Killer Ciphers). By which I mean: if we feed a variety of 70-letter English texts into my best guess set of slippy cycles (i.e. “ITWASTHEBESTOFTIMESI” fed into 1234512235 / 1234512344 would become: “I1 T2 W3 A4 S5 T1 H2 E2 B3 E5 S1 T2 O3 F4 T5 I1 M2 E3 S4 I4”), I predict that the final average multiplicity of the texts will be close to 75%. But I might be wrong!
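A minimal Python sketch of the forward simulation’s core step, assuming the ownership strings are read line by line in ten-glyph rows and that multiplicity means distinct letter-plus-alphabet symbols divided by text length (the same definition behind the 53/70 figure). The function names are hypothetical. Run on the example above, it reproduces the tagged sequence given in the text:

```python
# Best current guesses for S1's alphabet ownership, one string per ten-glyph line.
SLIPPY_S1 = [
    "1234512235", "1234512344", "1234412345", "1234512345",
    "1234112345", "1234551245", "2234512345",
]

def slippy_tag(text, rows):
    """Attach the guessed alphabet digit to each letter, reading the rows in order."""
    letters = [c for c in text.upper() if c.isalpha()]
    ownership = "".join(rows)
    if len(letters) > len(ownership):
        raise ValueError("text is longer than the ownership grid")
    return [f"{c}{ownership[i]}" for i, c in enumerate(letters)]

def multiplicity(symbols):
    """Distinct symbols divided by total symbols."""
    return len(set(symbols)) / len(symbols)

def average_multiplicity(texts, rows=SLIPPY_S1):
    """Mean multiplicity over several (ideally 70-letter) English excerpts."""
    scores = [multiplicity(slippy_tag(t, rows)) for t in texts]
    return sum(scores) / len(scores)

tags = slippy_tag("ITWASTHEBESTOFTIMESI", ["1234512235", "1234512344"])
print(" ".join(tags))  # I1 T2 W3 A4 S5 T1 H2 E2 B3 E5 S1 T2 O3 F4 T5 I1 M2 E3 S4 I4
print(f"{multiplicity(tags):.1%}")  # 90.0%
```

The actual check would feed many 70-letter English excerpts into `average_multiplicity` and compare the mean against 75%; no such result is claimed here.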

Apart from the case of the Somerton Man, has any other police investigation ever revolved around a book left in a complete stranger’s car? Personally, I’d be surprised: this seems to be a unique feature of the whole Somerton Man narrative.

But what, then, of the obvious alternate explanation, i.e. that the Rubaiyat was in the car already? For all the persuasive bulk the dominant explanation has gained from being parroted so heavily for nearly seven decades, I think it’s time to examine this (I think major) alternative and explore its logical consequences…

Gerry Feltus’s Account

To the best of my knowledge, Gerry Feltus is the only person who has actually talked with the (still anonymous) man who handed the Rubaiyat in. So let us first look at Feltus’ account (“The Unknown Man”, p.105) of what happened at the time of the Somerton Man’s first inquest when the police search for the Rubaiyat was mentioned in the press:

Francis [note: this was Feltus’ codename for the man] immediately recalled that his brother-in-law had left a copy of that book in the glove box of his little Hillman Minx [note: not the car’s actual make] which he normally parked in Jetty Road. He could not recall him collecting it, and so it was probably there. He went to the car and looked in the glove box – yes, the book was still there. To his amazement a section had been torn out of the rear page, in the position described by past newspaper reports.

“Ronald Francis” then telephoned his brother-in-law:

Do you recall late last year when we all went for a drive in my car, just after that man was found dead on the beach at Somerton? You were sitting in the back with your wife and we all got out of the car, the book you were reading, you put in the glove box of my car, and you left it there.

To which the brother-in-law replied:

No it wasn’t mine. When I got in the back seat, the book was on the floor; I fanned through some pages and thought it was yours, so when I got out of the car I put it in the glove box for you.

A while back, I pressed Gerry Feltus for more specific details on this: though he wouldn’t say what make of car the “Hillman Minx” actually was, he said that the man told him that the book turned up “a day or two after the body was found on the beach, and during daylight hours”. Gerry added that “Francis” was now very elderly and suffering from severe memory loss. Even so, he said that “I have spoken to Francis, his family and others and I am more than satisfied with what he has told me”.

Finally: when “Francis” handed the Rubaiyat to the police, he “requested that his identity not be disclosed”, for fear that he would be perpetually hounded by the curious. Even today (2017) it seems that only Gerry Feltus knows his identity for sure: though a list of possible names would include Dr Malcolm Glen Sarre and numerous others.

Newspaper Accounts

All the same, when I was trying to put everything into a timeline a while back, I couldn’t help but notice that Gerry’s account didn’t quite match the details that appeared in the newspapers at the time:

[1] 23rd July 1949, Adelaide News, page 1:

[…] an Adelaide businessman read of the search in “The News” and recalled that in November he had found a copy of the book which had been thrown on the back seat of his car while it was parked in Jetty road, Glenelg.

[2] 25th July 1949, Adelaide Advertiser, page 3:

A new lead to the identity of the Somerton body may have been discovered on Saturday when Det.Sgt. R. L. Leane received from a city business man a torn copy of Fitzgerald’s translation of the Rubaiyat of Omar Khayyam said to have been found in his car at Glenelg about last November, a week or two before the body was found.
  The last few lines of the poem, including the words “Tamam shud” (meaning “the end”) have been torn out of the book.
  When the body was searched some time ago a scrap of paper bearing the words “Tamam shud” was found in a pocket.
  Scrawled in pencilled block letters on the back of the cover of the book are groups of letters which appear to be foreign words and some numbers.
  These, it is hoped, may be of assistance in tracing the dead man’s identity.
  The business man told Det.Sgt. Leane that he found the copy of the Rubaiyat in the rear of his car while it was parked in Jetty road Glenelg, about the time of the RAAF air pageant in November.
  He said he had known nothing about the much-publicised words “Tamam shud” until he saw a reference to them on Friday.

[3] 26th July 1949, Adelaide News, page 1:

The book had been thrown into the back seat of a motor car in Jetty road, Glenelg, shortly before the victim’s body was found on the beach at Somerton on December 1.
[…]
Although the lettering was faint, police managed to read it by using ultra-violet light. In the belief that the lettering might be a code, a copy has been sent to decoding experts at Army Headquarters, Melbourne.

Why Do These Accounts Differ?

The Parafield air pageant mentioned unequivocally in the above newspaper accounts was held on 20th November 1948, ten days or so before the Somerton Man was found dead on Somerton Beach. Yet Gerry Feltus was told by “Ronald Francis” himself that the book turned up “a day or two after the body was found on the beach”. Clearly, these two accounts can’t both be right at the same time.

I of course asked Gerry directly about this last year: by way of reply, he said “Don’t believe everything you read in the media, eg; ‘The business man told Det. Leane…. etc…’.”. Moreover, he suggested that I was beginning “to sound like [Derek] Abbott”, who had “nominated the same things as you”.

This is, of course, polite Feltusese for “with respect, you’re talking out your arse, mate”: but at the same time, all he has to back up this aspect of his account – i.e. that the book turned up after the Somerton Man was found, not ten days before – is “Ronald Francis”’s word, given half a century after the event.

Hence this is the point where I have to temporarily bid adieu to Gerry Feltus’s account, because something right at the core of it seems to be broken… and when you trace the non-fitting pieces, they all seem to me to lead back to the Rubaiyat and the car.

So… what really happened with the Rubaiyat and the car? Specifically, what would it mean if the Rubaiyat had been in the car all along?

The Rubaiyat Car Theory

If the Rubaiyat was already in the back of the “little Hillman Minx”, it would seem to be the case that:

(*) Ronald Francis had no idea what it was or why it was there
(*) Ronald Francis’ brother-in-law had no idea what it was or why it was there
(*) …and yet the Rubaiyat was connected to that car in some non-random way
(*) …or, rather, it was connected to someone who was connected to the car

Given that one of the phone numbers on its back was that of Prosper McTaggart Thomson – a person who lived a quarter of a mile away from where “Ronald Francis” lived or worked, and who (as the Daphne Page court case from five months earlier demonstrated beyond all doubt) helped people sell cars on the black market by providing fake “pegged-price” documentation – it would seem reasonable at this point to hypothesize that Prosper Thomson may well have been the person who had sold “Ronald Francis” that specific car.

There was also a very good reason why many people might well have been looking to sell their cars in November 1948: the Holden 48-215 – the first properly Australian car – was just then about to be launched. Note that the “little Hillman Minx” could not have been a Holden if it had been driven to the Parafield air pageant, as the very first Holden was not sold until the beginning of December 1948.

If “Ronald Francis” had just bought a car in (say) mid-November 1948, I can quite imagine him proudly taking his wife, his brother-in-law and his wife off to the Parafield air pageant for a nice day out.

If Prosper Thomson’s behaviour in the Daphne Page court case was anything to go by, I can also easily imagine that the person who had sold that car might have wondered if he was being swindled by the middle man. In his summing up, the judge said that “[t]he defendant [Thomson] had not paid the £400 balance, and had never intended to do so”: so who’s to say that Thomson was not above repeating that same trick, perhaps with someone from out of town?

Perhaps, then, the person whose Rubaiyat it was was not Prosper Thomson himself, but the person from whom Prosper Thomson had just bought the car in order to sell it to “Ronald Francis”.

Perhaps it was this person’s distrust of Thomson’s financial attitude that had led him to hide the Rubaiyat under the back seat of the car, with the “Tamam Shud” specifically ripped out so that he could prove that it was he who had sold the car to Thomson in the first place.

And so perhaps it was the car’s previous owner who was the Somerton Man, visiting Glenelg to track down the owner of his newly sold car, simply to make sure he hadn’t been ripped off by Prosper Thomson.

The Awkward Silence

I’ve previously written about how social the Somerton Man seemed to have been, and how that jarred with the lack of helpful response the police received. For all its physical size, Australia still had a relatively small population back then.

So perhaps the silence surrounding the Somerton Man cold case will turn out to be nothing more than that of jittery people buying and selling cars not through dealers, people whom the Price Commissioners’ pegged prices had effectively turned into white-collar criminals – for how many professionals were so well-off in post-war Australia that they could afford to be principled about losing £400 or more in the sale of their shiny American car?

Incidentally, it has been reported that on the back of the Rubaiyat were written two phone numbers: one of which was the (now-famous) phone number for the nurse Jo Thomson (which her soon-to-be-husband Prosper Thomson was also using for small ads in the newspapers), while the other was allegedly for a local bank.

These are the two things people selling black market cars need: the number of the middle man who was laundering the transaction, and the number of a bank to make sure cheques clear (remember that a dud cheque to pay for a car was ultimately what triggered the Daphne Page court case).

But the other thing such people need is an absence: an absence of discussion about the transaction. And if “Ronald Francis” had only just bought his car on the black market through Prosper Thomson (thanks to Price Commission pegging, only about 10% of car sales back then went through official car dealer channels), he would surely have had a very specific reason not to want the details of his sale explored and made public.

And so I wonder whether this was the real reason why Ronald Francis didn’t want his name revealed: because if the police were to understand the web of dealings that had brought the Somerton Man to Glenelg, that would inevitably make it clear that the two men were the participants in a black market car sale, one which – though widely practised – was still a Price Commission offence with stiff penalties.

Along those same lines, I also wonder whether it was Ronald Francis himself who erased the pencil writing from the Rubaiyat’s back cover, to try to cover at least some of the tracks that might lead police in his direction. Of course, we now know that SAPOL’s photographers were able to use ultra-violet photography to (mostly) reconstruct the letters: but this may well not have been known to him at the time.

Please note that I’m not saying this is the only plausible explanation for everything. However, insofar as it tackles (and indeed resolves) a large number of the trickiest aspects of the case, it’s at least worth considering, right?

A Final Note

To be clear, when I ran this whole Rubaiyat Car suggestion past Gerry Feltus (admittedly in an earlier iteration), he dismissed it out of hand (though without any actual evidence to back up his position):

“I will not go into the possibility that the man purchased his car from Prosper. It is an absolutely rubbish suggestion that has no credibility. Poor old Prosper. He must have been the only ‘black market’ racketeer in Adelaide. From my knowledge of the climate during that relevant period he was a ‘nothing’.”

Well, Gerry was absolutely right insofar as, in 1948, Prosper was a small-time black marketeer, a mere minnow in the Melbourne-dominated black market car pool: but all the same, he was a minnow that lived extremely close by.

I suspect the real problem here is that if the mainstream story is wrong – that is, if Ronald Francis’ car had not long before (like so many others at the time) been bought at a premium on the black market, and if Francis had told white[-collar] lies to try to cover up his part in an illegal transaction once he realized what had happened – then people have been concealing their true involvement with what happened for nearly 70 years, not because of murder but because the price control legislation made criminals of nearly everyone selling their car.

And so it might well be that Gerry Feltus (and indeed just about everyone else) has been viewing the Somerton Man as entirely the wrong kind of mystery: not a police cold case, but a Price Commission cold case. How boringly middle class!

During and immediately after World War II, governments everywhere looked with dismay at their non-functioning factories, empty warehouses, and depleted male workforce. Even though the normal economic response to such shortages would be for prices to go up, it was politically vital under the circumstances to prevent profiteering, exploitation, and inflationary pressure from disrupting domestic marketplaces yet further.

In the Commonwealth, legislation was brought in during 1939 to control the prices of many key goods, commodities and supplies, administered by what was known as the Commonwealth Prices Branch. In Australia, this was implemented by appointing a Deputy Price Commissioner for each state, who was tasked with assessing what the correct level for specific prices should be. These commissioners were also given the power to investigate and enforce those “pegged” prices (quite independently of the police): the price controls continued until the 1950s.

(Archive material on price control in South Australia is indexed here. For what it’s worth, I’m most interested in D5480.)

Black Markets

While this legislation did (broadly) have the desired effect, the mismatches it introduced between the price and the value of things opened up numerous opportunities for short-term black markets to form. One well-known black market was for cars:

15th June 1948, Barrier Miner, page 8:

HINT OF FALL IN USED CAR PRICES
Melbourne.- If control were lifted, prices of used cars would fall and the black market would disappear, men in the trade said today.
  Popular American cars would settle to slightly below the former black market price and expensive English cars to below the pegged price, they said.
  The pegged price for a 1938 Ford has been £235, and the black market price £450. Buicks, Oldsmobiles, Chevrolets, and Pontiacs might sell for 75 per cent more than the pegged price.
  There was no shortage of English cars, so a 1937 Alvis, now £697, could go down to about £495. The classic English cars of the late 20’s and early 30’s, pegged at about £300, would probably sell at less than £100.
  Every car would then find its level. Drivers who had kept their cars in good condition would be able to sell them in direct relation to their values.
  Men in the trade said honest secondhand car dealers had almost been forced out of business during the war. Records showed that 90 per cent of all used car sales were on a friend-to-friend basis and they never passed through the trade.

But because you could be fined or go to prison if you bought or sold a car for significantly more than its pegged price, to sell your (say) fancy American car on the black market you would need two separate things: (1) a buyer willing to pay more than the pegged price, and also (2) someone who could supply nice clean paperwork to make the sale appear legitimate if the State Deputy Price Commissioner just happened to come knocking at your door.

And yet because back then cars were both aspirational and hugely expensive (in fact, they cost as much as a small house), so much money was at stake here that it was absolutely inevitable the black market in cars would not only exist, but, well, prosper.

So this is the point where Daphne Page and Prosper Thomson enter the room: specifically, Judge Haslam’s courtroom… I offer the remainder of the post without comment, simply because the judge was able to read the situation quite clearly, even if he didn’t much like what he saw:

Daphne Page vs Prosper Thomson

21st July 1948, Adelaide Advertiser, page 5:

Sequel To Alleged Loan. — Claiming £400, alleged to be the amount of a loan not repaid, Daphne Page, married of South terrace, Adelaide, sued Prosper McTaggart Thomson, hire car proprietor, of Moseley street, Glenelg.
  Plaintiff alleged that the sum had been lent to defendant on or about November 27 last year so that he could purchase a new car and then go to Melbourne to sell another car.
  Defendant appeared to answer the claim.
  In evidence, plaintiff said that before she lent defendant the money she asked for an assurance that she would get it back promptly. She had not obtained a receipt from defendant. After several attempts had been made later to have the loan repaid by Thomson, he had said that the man to whom he had sold the car in Melbourne had paid him by a cheque which had not been met by the bank concerned. When she had proposed taking action against defendant he had said that if she took out a summons she would be “a sorry woman”. He had threatened to report her for “blackmailing.”
  In reply to Mr. R. H. Ward, for defendant, the witness denied that anything had ever been said about £900 being paid for the car. She had never told Thomson that she wanted that sum for it. The pegged price of the car was £442.
  Part-heard and adjourned until today.
  Miss R. P. Mitchell for plaintiff.

22nd July 1948, Adelaide Advertiser, page 5:

BEFORE JUDGE HASLAM:—
Alleged Loan.— The hearing was further adjourned until today of a case in which Daphne Page, married, of South terrace, Adelaide, sued Prosper McTaggart Thomson, hire car proprietor, of Moseley street, Glenelg, for £400, alleged to have been a loan by her to him which he had not repaid.
  Page alleged that the loan had been made on or about November 27 last year so that he could purchase a new car and then go to Melbourne to sell another car.
Thomson said that in answer to an advertisement Page had approached him on October 39 with a car to sell. She wanted £900 for it. On November 11 she accepted £850 as the price for the car and said that the RAA had told her that the pegged price was £442.
  He drew a cheque for £450 and gave it to Page, who told him she had made out a receipt for £442, the pegged price. Early in December he went to Melbourne to sell a car for another man. On his return to Adelaide he found many messages from Page requesting that he would telephone her. He did not do so, but about a week later met her and told her that he could not pay her the £400 “black market balance” on the car because he had had a cheque returned from a bank.
  Page had said she wanted the money urgently, as she had bought a business. Witness “put her off.”
  Later, just before a summons was delivered to him, Page had telephoned and asked when he intended to pay the £400. She had spoken affably, but when he told her that he had had advice that he was not required to pay more than the pegged price of the car and did not intend to do so, she had said she would summons him and “make out that the money was a loan.” She had said that she would bring forward “all her family as witnesses.” He hung up the telephone receiver. He had never borrowed money from Page.
  Thomson was cross-examined at length by Miss R. F. Mitchell, for Page. Mr. R. H. Ward for Thomson.

23rd July 1948, Adelaide Advertiser, page 5:

BEFORE JUDGE HASLAM:
Claim Over Car Transaction.
  Judgment was reserved yesterday in a case in which Daphne Page, married, of South terrace, Adelaide, sued Prosper McTaggart Thomson, hire car proprietor, of Moseley street, Glenelg, for £400, alleged to have been a loan by her to him which he had not repaid.
  It was alleged by Mrs. Page that the loan had been made on or about November 27 last year so that Thomson could purchase a new car and then go to Melbourne to sell another car.
  Thomson denied that he had ever borrowed money from Mrs. Page. He alleged that she had asked £900 for a car, the pegged price of which was £442, and had later agreed to accept £850 for it. After the transaction he had given her a cheque for £450 on account. Mrs. Page had made out a receipt for £442. When she had pressed him, later, for the remaining £400 of the sale, he had told her that, acting upon advice, he did not intend to pay her more than he had. She had then told him that she would summons him and make out that the money at issue was a loan.
  Mr. [sic] R. P. Mitchell for plaintiff: Mr. R. H. Ward for defendant.

7th August 1948, Adelaide Advertiser, page 8:

NEW Olds sedan taxi, radio equipped, available weddings, country trips, race meetings, &c.; careful ex-A.I.F. driver. lowest rates. Phone X3239.

17th August 1948, Adelaide News, page 4:

WON CASE BUT NO COSTS ALLOWED

  While he gave judgment for defendant in a £400 loan claim in Adelaide Local Court today in a case in which black-marketing of a motor car was mentioned, Judge Haslam refused costs because of defendant’s conduct in the transaction.
  Mrs. Daphne Page, of South terrace, City, sued Prosper McTaggart Thomson, hire car proprietor, of Moseley street, Glenelg, for £400 alleged to be the amount of a loan not repaid.
  His Honor said if it were not that the Crown would be faced with evidence of plaintiff in the case, he would send the papers to the Attorney-General’s Department with a suggestion that action be taken against defendant for the part he claimed to have taken in an illegal transaction.

“Direct conflict”

  His Honor said there was a direct conflict between an account which alleged a simple contract loan of £400, made without security and not in writing, and one which set up that the £400 represented the unpaid balance of a black-market transaction.
  Evidence was that in November last Mrs. Page had agreed to sell a Packard car for £442, but accepted a cheque for £450, defendant explaining the extra would cover the wireless in the car. Plaintiff gave a receipt for £442, the pegged price.
  Plaintiff claimed that in November she lent £400 cash to defendant with which to buy another car in Melbourne. Defendant’s account was that Mrs. Page said her lowest price for her car was £900 and that she afterwards accepted his offer of £850. He said he would give her £450 next day and would want a receipt for the fixed price of £442.
When he gave her the cheque, plaintiff said she did not want a cheque for £450 when the pegged price was £442. He told her not to worry as the unexpired registration and insurance would cover the £8 difference.

Borrowing denied

  Defendant said in evidence he did not pay the £400 balance and never intended to. He was advised of a new car being ready for delivery in November, but denied having borrowed £400 or any amount from Mrs. Page.
  His Honor said there was little support for Mrs. Page’s account as to the terms on which her car was sold. He was of opinion plaintiff had not shown on the balance of probabilities that any amount was lent to defendant.
  Miss R. F. Mitchell appeared for plaintiff, and Mr. R. H. Ward for defendant.

18th August 1948, Adelaide Advertiser, page 5:

Black Market Sale Alleged
BEFORE JUDGE HASLAM:
  In a case arising from the sale of a motor car, in which his Honor yesterday gave Judgment for the purchaser, he refused him costs because of his conduct in the transaction.
  The evidence, he said, had produced a direct conflict between an account alleging that a simple contract loan of £400 had been made without writing or security, and one which set up that the money represented the unpaid balance of a black market deal.
  The plaintiff, Daphne Page, married woman, of South terrace, Adelaide, claimed £400 from Prosper McTaggart Thomson, hire car proprietor, of Moseley street, Glenelg, alleging the sum to be the amount of a loan not repaid.
  It was alleged by the plaintiff that the money had been lent to the defendant on or about November 27 last year, so that he could purchase a new car, and then go to Melbourne to sell another car.
  His Honor said he was of opinion that the plaintiff had not shown upon the balance of probabilities that any sum had been lent to the defendant. Were it not for the fact that the Crown would necessarily be faced with the evidence given by plaintiff in the case, he would send the papers relating to the proceedings on to the Attorney-General’s Department, with a suggestion that action should be taken against the defendant for the part he had claimed to have taken in an illegal transaction.
  There was little to support the plaintiff’s account regarding the terms upon which the car had been sold by her to the defendant, his Honor said. According to her, the price had not been specifically agreed upon, but left to be ascertained by reference to the pegged price, which was £442.
  The defendant’s account, his Honor continued, was that the plaintiff, after having first told him that £900 was the lowest price she would take for the car, had later accepted his offer of £850 for it. He had paid her £450 by cheque, telling her that he would have to borrow the remaining £400 from a finance company, and adding that he would want a receipt for the pegged price, and the registration to transfer the car into his name. The plaintiff had given him a receipt for £442. The defendant had not paid the £400 balance, and had never intended to do so.
  Miss R. F. Mitchell for plaintiff: Mr. R. H. Ward for defendant.

It would be fair to say that the title of George Edmunds’ hefty book “Anson’s Gold and the Secret to Captain Kidd’s Charts” somewhat undersells its scope. Edmunds claims – as does his former research partner Ron Justron in ‘Great Lost Treasure’, perhaps unsurprisingly – to have solved just about every treasure-related story going, including Ubilla’s treasure, Kidd’s (supposed) maps, The Loot of Lima, The Bosun Bird Treasure, Oak Island, Rennes-le-Chateau, Shugborough Hall, etc etc.

Even though Edmunds pulls his horse up in front of Becher’s Brook (i.e. Justron’s final assertions regarding Tintin and the Secret of the Unicorn, *sigh*), the two theorists’ oeuvres are otherwise difficult to slide a fag paper between, no matter how hard you sand it down. Perhaps experts at telling the People’s Front of Judea apart from the Judean People’s Front would find this easy: I struggled in many places.

Putting the issue of Ron Justron to one side, what is Edmunds’ actual argument that manages to take up his book’s whopping 585 pages?

Part 1 – Identifying Killorain

Edmunds starts by taking the Ubilla treasure story completely at face value: he then trawls through a large number of similar-sounding buried treasure stories, before identifying (or, rather, offering an identification for) the character Killorain.

To do this, he uses what he calls “Story DNA”: by tracing the fragments of narrative shared, copied and re-used between different buried-treasure stories, Edmunds tries to deduce the relationships between those stories, and to reach out towards the Ur-story buried beneath.

Even though there’s a half-germ of a research idea in what he’s attempting here, at no point in his (actually quite large) book does this ever translate into a research methodology (or even an approach to complex reasoning) that anyone could follow, reproduce or use, on this subject or on anything else.

For sure, Part 1 is the clearest of all his sections: but at the end of it all, it’s still clear as mud to me why Edmunds thinks there can only be a single way of interpreting all the slabs of text he has copied over from numerous different sources to yield his particular conclusion. Yes, I can see how Killorain might be the person Edmunds thinks he is: but it’s a weak, sprawling, unfocused argument that carries him there, and it’s just not written in a way that acknowledges other possibilities or helps readers to eliminate those other possibilities.

Edmunds writes with enthusiasm (and not a little bombast at times): but it would need a significantly sharper knife than his “Story DNA” to pierce these historical veils. Has he managed to identify the treasure Ur-story’s paternity here? No, not really, sorry. 🙁

Part 2 – Identifying the Band of Pirates

Here Edmunds again tries to use Story DNA to strip down the ‘Bosun Bird’, the Loot of Lima, Cocos Island Treasure, Mururoa Atoll Treasure (the same one that excited Ron Justron so much), and the Palmyra Island Treasure stories into their overlapping DNA fragments to identify the band of pirates behind the single (supposed) pirate treasure event from which all these stories were derived.

However, his argument here is terrifically speculative (and noticeably fuzzier and weaker than Part 1’s): and right at the end, Edmunds expands his scope yet further – he now also wants his argument to encompass “Masonic DNA”. By this he means things which sound as though they link to Masonic practices or Masonic history, if you (again) strip them down to their fragmentary parts.

Unfortunately, this latter half makes his argument sound exactly like the kind of paranoid Masonic delusions that have plagued just about every piece of writing on treasure maps for the last century. To the best of my knowledge, there is no historical evidence whatsoever that links Speculative Freemasonry to anything remotely like a genuine conspiracy involving treasure: everything written on the subject has been little more than a giant house of cards (sans Frank Underwood, of course) that a single committed sneeze would blow to the floor.

Hence this for me is where Edmunds’ book “jumps the shark”, i.e. the point where the reader’s sympathies towards the kind of thing Edmunds was attempting (however imperfectly) in Part 1 quickly drop to zero. “Story DNA” was already only as strong as the execution (and this itself was noticeably lacking): but his “Masonic DNA” is just wrong-headed, and on many different levels.

Part 3 – H. T. Wilkins Joins The Party

Here, Edmunds recaps some of his previous book on Captain Kidd’s treasure maps (“Kidd: The Search For His Treasure”), but links his conclusions with Juan Fernandez Island, Oak Island, Plum Island, and a convoluted account of how he believes prolific author Wilkins was the mastermind behind it all.

Errmmm… really? Really truly honestly? Wilkins-as-Svengali is the conceit that enables both Edmunds and Ron Justron to make anything they want to be true sound true (i.e. where Wilkins can only have genuinely copied document X from an original source) or anything that doesn’t fit their chosen narrative sound false (i.e. Wilkins must have cleverly concocted document Y to leave a trail of clues that only the Wisest of the Wise can recognize and see past).

This is, of course, hyper-selective wishful thinking (as opposed to anything that might approach critical evaluation, or indeed critical thought). What makes this even clearer is Geoff Bath’s very interesting series of books, for which Geoff managed to uncover a whole lot of Wilkins’ correspondence. In my opinion, Bath offered up a picture of Wilkins that was radically different from (and, I believe, a lot more accurate and evidentially grounded than) the one in either of Edmunds’ books.

Yes, Wilkins surely did personally create many of the maps that appear in his books, complete with Alle Ye Olde-Fashionned Nonne-Sense Texte He Couldde Comme Uppen Wyth: but it beggars belief that Wilkins was such a genius that he caused everything to fall into place for Edmunds, by leaving a faint trail of breadcrumb clues to The Real Treasure that only someone who just happened to cross-reference all his different books might possibly notice.

Part 4 – Latcham and Guayacan

This is where Edmunds looks (somewhat cursorily, it has to be said) at the Guayacan treasure story written about by Richard Latcham (and yes, I do have a copy of the original book in Spanish).

I’m sorry, though: as a piece of supposed history, this story really sucks. And the extra letter (supposedly by Captain Cornelius Patrick Webb of the Unicorn) is enough to put anyone right off their soup.

To start to explain away the problems with this, Edmunds (or rather Ron Justron’s Latin teacher acquaintance) translates the Cornelius Webb letter back into Latin (from Wilkins’ supposed mistranslation) and then back into English: Edmunds then talks about star codes, alchemy, celestial navigation, and yet more Masonic DNA. All of which is then brought together in the kind of numeric over-wrangling typically employed by conspiracy nutters to prove whatever they wanted to prove in the first place. Not that I’m saying that Edmunds is one of those: but the problem here is that his argument doesn’t make it easy to tell the two apart.

Perhaps others will find themselves convinced by this, but it left me as stone cold as Stone Cold Steve Austin. In Antarctica. Eating cold soup.

Part 5 – Rennes-le-Chateau

In which Edmunds recaps Pierre Plantard’s Rennes-le-Chateau story: he concludes that it is nonsense, but based on a genuine document connected to Lord Anson. Which is like asking the reader to put their brain into neutral before turning the page. *sigh*

Part 6 – Anson’s Monument

By this point I was finding it extremely difficult to find the will to turn the pages. Good luck if you want to try summarizing this.

Part 7 – Mathematical-Sounding Stuff

This part covers the Golden Ratio, spirals, hidden geometry, and all the other gee-whizz crop circle stuff they don’t teach you on a Maths degree. If it had any redeeming features, I didn’t manage to pick up on them: by now, the nausea was really quite overwhelming.

Incidentally, a short section on Spanish Treasure Codes reproduces some drawings from a 65-page 2004 book called “The Spanish Code to Treasure” by Lou Layton (now deceased): however, it’s extraordinarily hard to tell whether these are genuine or just wishful thinking.

Part 8 – Was This A Templar Treasure?

Errmmm… no, it wasn’t. Next!

Part 9 – “Well, That About Wraps It Up For God”

Fans of Hitchhiker’s Guide To The Galaxy will probably recognize the above as the title of one of Oolon Colluphid’s books. These were all characterized by foolish self-referential logic that purported to use the existence of God to prove His non-existence, e.g.:

“I refuse to prove that I exist,” says God, “for proof denies faith, and without faith I am nothing.”

“But,” says Man, “the Babel fish is a dead giveaway, isn’t it? It could not have evolved by chance. It proves that You exist, and so therefore, by Your own arguments, You don’t. QED”

Suffice it to say that, to my mind, this final part of Edmunds’ book – that applies Story DNA, Masonic DNA, star codes, numerology and abstruse numerical calculations to the Shugborough Hall Shepherds’ Monument to supposedly yield the precise longitude and latitude of a buried pirate treasure – reminds me strongly of Oolon Colluphid. And not in a flattering way.

But feel free to read “Anson’s Gold” for yourself and make up your own mind: for what do I know?

As I wrote before, I think we have four foundational challenges to tackle before we can get ourselves into a position where we can understand Voynichese properly, regardless of what Voynichese actually is:

* Task #1: Transcribing Voynichese into a reliable raw transcription e.g. EVA qokeedy
* Task #2: Parsing the raw transcription to determine the fundamental units (its tokens) e.g. [qo][k][ee][dy]
* Task #3: Clustering the pages / folios into groups that behave differently e.g. Currier A vs Currier B
* Task #4: Normalizing the clusters i.e. understanding how to map text in one cluster onto text in another cluster

This post relates to Task #2, parsing Voynichese.

Parsing Voynichese

Many recent Voynichese researchers seem to have forgotten (or, rather, perhaps never even knew) that the point of the EVA transcription alphabet wasn’t to define the actual / only / perfect alphabet for Voynichese. Rather, it was designed to break the deadlock that had occurred: circa 1995, just about every Voynich researcher had a different idea about how Voynichese should be parsed.

Twenty years on, and we still haven’t got any consensus (let alone proof) about even a single one of the many parsing issues:
* Is EVA qo two characters or one?
* Is EVA ee two characters or one?
* Is EVA ii two characters or one?
* Is EVA iin three characters or two or one?
* Is EVA aiin four characters or three or two or one?
…and so forth.
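To make the parsing problem concrete, here is a minimal Python sketch (my own illustration, not an established Voynich tool) that splits an EVA word into tokens by greedy longest-match against a token inventory. Both inventories (`STROKE_LEVEL` and `CLUSTERED`) are hypothetical: each simply encodes one possible set of answers to the questions above, and greedy longest-match is itself an assumption about how a parser ought to resolve overlapping candidates.

```python
# Two hypothetical token inventories: neither is claimed to be "correct",
# they just encode different answers to the qo / ee / iin / aiin questions.
# ('ch' and 'sh' are kept as units purely for illustration.)
STROKE_LEVEL = {"q", "o", "k", "e", "d", "y", "a", "i", "n", "ch", "sh"}
CLUSTERED = STROKE_LEVEL | {"qo", "ee", "dy", "iin", "aiin"}


def parse(word: str, inventory: set[str]) -> list[str]:
    """Greedy longest-match parse of an EVA word into tokens from `inventory`."""
    longest = max(len(token) for token in inventory)
    tokens = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate first, then fall back to shorter ones.
        for size in range(min(longest, len(word) - pos), 0, -1):
            piece = word[pos:pos + size]
            if piece in inventory:
                tokens.append(piece)
                pos += size
                break
        else:
            raise ValueError(f"cannot parse {word!r} at position {pos}")
    return tokens


for w in ["qokeedy", "daiin"]:
    print(w, parse(w, STROKE_LEVEL), parse(w, CLUSTERED))
```

With the clustered inventory, qokeedy comes out as [qo][k][ee][dy], while the stroke-level inventory yields seven single glyphs: the same EVA text, two quite different token streams, and hence two quite different sets of downstream statistics.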

And so the big point of EVA was to try to provide a parse-neutral stroke transcription that everyone could work on and agree on even if they happened to disagree about just about everything else. (Which, as it happens, they tend to do.)

The Wrong Kind Of Success

What happened next was that, when it came to getting people to speak a common ‘research language’, EVA succeeded wildly. It even became the de facto standard when writing up papers on the subject: few technical Voynich Manuscript articles have been published since that don’t mention (for example) “daiin daiin” or “qotedy qotedy”.

However, the long-hoped-for debate about trying to settle the numerous parsing-related questions simply never happened, leaving Voynichese even more talked about than before but just as unresolved as ever. And so I think it is fair to say that EVA achieved quite the wrong kind of success.

By which I mean: the right kind of success would be where we could say anything definitive (however small) about the way that Voynichese works. And just about the smallest proof would be something tangible about what groups of letters constitute a functional token.

For example, it would be easy to assert that EVA ‘qo’ acts as a functional token, and that all the instances of (for example) ‘qa’ are very likely copying mistakes or transcription mistakes. (Admittedly, a good few o/a instances are ambiguous to the point that you just can’t reasonably decide based on the scans we have). To my eyes, this qo-is-a-token proposition seems extremely likely. But nobody has ever proved it: in fact, it almost seems that nobody has got round to trying to prove anything that ‘simple’ (or, rather, ‘simple-sounding’).

Proof And Puddings

What almost nobody seems to want to say is that it is extremely difficult to construct a really sound statistical argument for even something as basic as this. The old saying goes that “the proof of the pudding is in the eating” (though the word ‘proof’ here is actually a linguistic fossil, meaning ‘test’): but in statistics, the normal case is that most attempts at proof quickly make a right pudding out of it.

For a reasonably-sized community of often-vocal researchers, it is surely a sad admission that we haven’t yet put together a proper statistical testing framework for questions about parsing. Perhaps what we all need to do with Voynichese is to construct a template for statistical tests of basic – and when I say ‘basic’ I really do mean unbelievably basic – propositions. What would this look like?

For example: for the qo-is-a-token proposition, the null hypothesis could be that q and o are weakly dependent (and hence the differences are deliberate and not due to copying errors), while the alternative hypothesis could be that q and o are strongly dependent (and hence the differences are instead due to copying errors): but what is the p-value in this case? Incidentally:

* For A pages, the counts are: (qo 1063) (qk 14) (qe 7) (q 5) (qch 1) (qp 1) (qckh 1), i.e. 29/1092 = 2.66% non-qo cases.
* For B pages, the counts are: (qo 4049) (qe 55) (qckh 8) (qcth 8) (q 8) (qa 6) (qch 3) (qk 3) (qt 2) (qcph 2) (ql 1) (qp 1) (qf 1), i.e. 98/4147 = 2.36% non-qo cases.
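As a quick arithmetic check, the non-qo percentages above can be recomputed directly from the listed counts. This is only a tally sketch: the dictionary and function names are mine, and the counts are exactly those quoted above.

```python
# Counts of EVA q followed by each continuation, as listed above.
A_COUNTS = {"qo": 1063, "qk": 14, "qe": 7, "q": 5, "qch": 1, "qp": 1, "qckh": 1}
B_COUNTS = {"qo": 4049, "qe": 55, "qckh": 8, "qcth": 8, "q": 8, "qa": 6,
            "qch": 3, "qk": 3, "qt": 2, "qcph": 2, "ql": 1, "qp": 1, "qf": 1}


def non_qo_share(counts: dict[str, int]) -> tuple[int, int, float]:
    """Return (non-qo count, total q count, non-qo fraction)."""
    total = sum(counts.values())
    non_qo = total - counts["qo"]
    return non_qo, total, non_qo / total


for label, counts in [("A", A_COUNTS), ("B", B_COUNTS)]:
    k, n, frac = non_qo_share(counts)
    print(f"{label}: {k}/{n} = {frac:.2%} non-qo")
```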

But in order to calculate the p-value here, we would need to be able to estimate the Voynich Manuscript’s copying error rate…
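One way to put a number on this (a sketch under stated assumptions, and a simplification of the hypotheses above) is to treat every non-qo instance as a copying error, and every q as an independent trial with the same error rate. The count of non-qo cases is then binomial, and the p-value is the chance of seeing at least that many deviations at an assumed error rate. The rates tried below are placeholders, precisely because the true rate is the unknown quantity.

```python
import math


def binomial_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)


def upper_tail_p_value(k: int, n: int, error_rate: float) -> float:
    """P(X >= k): chance of at least k non-qo cases if each were a copying error."""
    return sum(binomial_pmf(i, n, error_rate) for i in range(k, n + 1))


# Observed non-qo counts from the post; the error rates are assumed, not measured.
OBSERVED = [("A", 29, 1092), ("B", 98, 4147)]
for label, k, n in OBSERVED:
    for rate in (0.01, 0.02, 0.03):
        p_value = upper_tail_p_value(k, n, rate)
        print(f"{label} pages, assumed error rate {rate:.0%}: p = {p_value:.3g}")
```

A small p-value says that copying errors alone, at that assumed rate, struggle to account for the observed deviations; a large one says they could. Either way, the answer swings entirely on the error rate fed in.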

Voynichese Copying Error Rate

In the past, I’ve estimated Voynichese error rates (whether in the original copying or in the transcription to EVA) at between 1% and 2% (i.e. a mistake every 50-100 glyphs). This was based on a number of different metrics, such as the qo-to-q[^o] ratio, the ain-to-oin ratio, the aiin-to-oiin ratio, the air-to-oir ratio, e.g.:

A pages:
* (aiin 1238) (oiin 110) i.e. 8.2% (I suspect that Takeshi Takahashi may have systematically over-reported these, but that’s a matter for another blog post).
* (ain 241) (oin 5) i.e. 2.0% error rate if o is incorrect there
* (air 114) (oir 3) i.e. 2.6% error rate

B pages:
* (aiin 2304) (oiin 69) i.e. 2.9% error rate
* (ain 1403) (oin 18) i.e. 1.3% error rate
* (air 376) (oir 6) i.e. 1.6% error rate
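Assuming each percentage above is computed as minority / (majority + minority), i.e. the share of all instances that take the suspected-error form, the figures can be recomputed as follows (the data layout and function name are my own).

```python
# (majority form, majority count, minority form, minority count), from the lists above.
PAIRS = {
    "A": [("aiin", 1238, "oiin", 110), ("ain", 241, "oin", 5), ("air", 114, "oir", 3)],
    "B": [("aiin", 2304, "oiin", 69), ("ain", 1403, "oin", 18), ("air", 376, "oir", 6)],
}


def minority_share(majority: int, minority: int) -> float:
    """Fraction of all instances taking the minority (suspected error) form."""
    return minority / (majority + minority)


for section, rows in PAIRS.items():
    for major, a, minor, b in rows:
        print(f"{section}: {minor}/({major}+{minor}) = {minority_share(a, b):.1%}")
```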

It’s a fact of life that ciphertexts get miscopied (even printed ciphers suffer from this, as Tony Gaffney has reported in the past), so it seems unlikely that the Voynich Manuscript’s text would have a copying error rate as low as 0.1% (i.e. a mistake every 1000 glyphs). At the same time, an error rate as high as 5% (i.e. a mistake every 20 glyphs) would arguably seem too high. But if the answer is somewhere in the middle, where is it? And is it different for Hand 1 and Hand 2 etc?

More generally, is there any better way for us to estimate Voynichese’s error rate? Why isn’t this something that researchers are actively debating? How can we make progress with this?

(Structure + Errors) or (Natural Variation)?

This is arguably the core of a big debate that nobody is (yet) having. Is it the case that (a) Voynichese is actually strongly structured but most of the deviations we see are copying and/or transcription errors, or that (b) Voynichese is weakly structured, with the bulk of the deviations arising from other, more natural and “language-like” processes? I think this cuts far deeper to the real issue than the typical is-it-a-language-or-a-cipher superficial bun-fight that normally passes for debate.

Incidentally, a big problem with entropy studies (and indeed with statistical studies in general) is that they tend to over-report the exceptions to the rule: for something like qo, it is easy to look at the instances of qa and conclude that these are ‘obviously’ strongly-meaningful alternatives to the linguistically-conventional qo. But from the strongly-structured point of view, they look well-nigh indistinguishable from copying errors. How can we test these two ideas?

Perhaps we might consider a statistical study that uses this kind of p-value analysis to assess the likeliest level of copying error? Or alternatively, we might consider whether linguistic hypotheses necessarily imply a lower practical bound for the error rate (and whether we can calculate this lower bound). Something to think about, anyway.
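As a sketch of what such a study might look like, the code below inverts the binomial test: it scans a grid of candidate error rates and keeps those that neither one-sided test rejects at the 5% level (an exact, Clopper-Pearson-style interval). It assumes the strongly-structured reading (every deviation is an error) and independent errors, which are exactly the assumptions under dispute, so treat its output as illustrative only. The input is the B-page non-qo count quoted above; the grid step and upper limit are arbitrary choices of mine.

```python
import math


def binomial_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)


def tail_p_values(k: int, n: int, p: float) -> tuple[float, float]:
    """Return (P(X <= k), P(X >= k)) for X ~ Binomial(n, p)."""
    below = sum(binomial_pmf(i, n, p) for i in range(k))  # P(X <= k - 1)
    return below + binomial_pmf(k, n, p), 1.0 - below


def consistent_error_rates(k: int, n: int, alpha: float = 0.05,
                           step: float = 0.0005, max_rate: float = 0.1) -> list[float]:
    """Grid of error rates that neither one-sided test rejects at level alpha / 2."""
    kept = []
    for j in range(1, int(round(max_rate / step)) + 1):
        p = j * step
        lower, upper = tail_p_values(k, n, p)
        if lower > alpha / 2 and upper > alpha / 2:
            kept.append(p)
    return kept


# B-page non-qo count from the post, read as if every deviation were an error.
k, n = 98, 4147
rates = consistent_error_rates(k, n)
print(f"point estimate {k / n:.2%}; "
      f"rates not rejected: {min(rates):.2%} to {max(rates):.2%}")
```

For these counts the surviving rates cluster around the point estimate k/n, and neither 0.1% nor 5% survives: but that only tells us what the error-only model implies, not whether that model is the right one.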

All in all, EVA has been a huge support for us all, but I do suspect that more recently it may have closed some people’s eyes to the difficulties both with the process of transcription and with the nature of a document that (there is very strong evidence indeed) was itself copied. Alfred Korzybski famously wrote, “A map is not the territory it represents”: similarly, we must not let possession of a transcription give us false confidence that we fully understand the processes by which the original shapes ended up on the page.