A new day breaks here in the suburbs, bringing with it birdsong, A-road traffic noise, and yet another Voynich theory to bang my head against.

On paper, Professor Stephen Bax certainly has the combination of big brain, linguistic experience and personal ambition that you’d think would be needed to crack open the Voynich Manuscript’s crab-like shell. But… then again, so did poor old Professor William Romaine Newbold; and his Voynich non-decryption ended up enraging Charles Singer so much (justifiably, it has to be said) that he was still angry thirty years later.

All the same, Bax believes that he has tentatively identified a number of words in the Voynich Manuscript, and has posted a 62-page PDF on his website describing his findings. His initial press release has been picked up by BBC News, the Bedfordshire on Sunday, and the irrepressible Daily Grail amongst many others. He has a lecture arranged for 25th February 2014 in Luton (if you happen to be nearby and interested), and is even planning a small Voynich conference in London in June 2014 to try to get other academics involved in his Voynich research programme.

Yet as Rene Zandbergen likes to point out, the most difficult thing about Voynich research is developing chains of reasoning while avoiding big mistakes. And while I hate to be the one to unplug the sound system just as the party is really getting started, I’m quite certain that every single one of Stephen Bax’s conclusions to date has been built upon a long sequence of easily demonstrable mistakes.

In fact, even though he is trying to use a sensible-sounding methodology to elicit his results, I can’t think of a single piece of Voynich Manuscript evidence or secondary historical evidence he uses that I’d agree is a sound starting point: and I’m not convinced that any of his conclusions could be right either. I’ll go through a whole load of points: you’ll see what I mean soon enough.

1. “Initial Words On Herbal Pages Should Be Names”. Errrm…

As Bax rightly points out, you might reasonably expect the unique-looking first word on each of the Voynich Manuscript’s herbal pages to be the name of the plant depicted on that page, because that is indeed how many medieval herbals were laid out. This is not a new observation or idea: Leonell Strong assumed this as part of his Voynichese decryption in the 1940s (he thought the plaintext was written in English, but enciphered using a curious repeating offset into a local substitution alphabet).

But there’s an immediate problem: almost all the Voynich’s Herbal A pages start with one of the four gallows letters: EVA ‘p’ (53 times), EVA ‘t’ (24 times), EVA ‘k’ (21 times) or EVA ‘f’ (10 times). Which for simple substitution ciphers, broadly as John Tiltman pointed out roughly 50 years ago, would mean that the name of pretty much every plant in the Herbal section must start with one of three or four letters. Which would be nonsensical. (Leonell Strong was fine with this, because he thought the cipher scrambled all that stuff up a little: basically, he didn’t think it was a straightforward language.)
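To put rough numbers on that, here is a minimal Python sketch that uses only the four counts quoted above. The variable names are mine, and the shares are taken over gallows-initial pages only, because the total number of Herbal A pages isn’t given here.

```python
# Counts of Herbal A pages by initial gallows glyph, as quoted in the text above.
gallows_initial_counts = {"p": 53, "t": 24, "k": 21, "f": 10}

total = sum(gallows_initial_counts.values())

# Share of the gallows-initial pages that begin with each glyph.
shares = {glyph: n / total for glyph, n in gallows_initial_counts.items()}

# Print the glyphs from most to least common.
for glyph, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    count = gallows_initial_counts[glyph]
    print(f"EVA '{glyph}': {count:3d} pages ({share:.1%})")
```

Under a simple substitution reading, each of those shares would be the proportion of plant names starting with a single plaintext letter, with EVA ‘p’ alone accounting for just under half of them.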

Yet Bax persists, and asserts that all of these gallows glyphs simultaneously map to plaintext C or K (in order to keep his ‘oror’ mapping intact, see [3] below), and as a result almost all of the plant names he considers start with the letter C – Centaurea, Cotton, Kaur, Crocus, etc. I’m sorry, but this whole notion is directly contradicted by the immediate statistical evidence. This isn’t something to build on, it’s something to abandon and leave far behind while you find some genuinely useful historical evidence to work with.

2. Bax’s proposed Voynichese alphabet has three letter R’s

This too flies in the face of supposed common sense. The Voynich Manuscript has a limited and compact alphabet, with roughly 18-22 characters occurring with particular frequency: and yet Bax concludes from his multi-language linguistic analysis that three of these (EVA r, EVA m, and EVA n) encipher the letter ‘R’. Come on: this is surely close to as unsystematic a system as could be constructed, a giant Red Flag of Non-Believability being waved in front of his train of reasoning.

3. The Voynichese word “oror” = the Hebrew word “arar”, meaning ‘juniper’

f15v not only has “oror” on it, it also has “oror or” and “or or oro r” immediately above each other on the first two lines. Did Bax not notice this when he picked this out? This is terribly selective and unconvincing. Moreover, “arar” itself is twice as common in the Voynich Manuscript as “oror”: while Bax himself points out good reasons why it shouldn’t be “oror”.

So… why does he persist with “oror” == juniper? “oror” appears throughout the Voynich Manuscript, while “or” appears extraordinarily frequently. This just seems a hopeful (and unsystematic) stab in the dark in exactly the wrong kind of way.

4. Bax thinks that EVA “kydain” = ‘centaur’ – but has he not noticed “dain” everywhere?

Now this is just ridiculous. One of the genuine mysteries of the Voynich Manuscript is the repeated presence of what look extraordinarily like medieval page references (EVA “aiin”): and here’s one apparently embedded in a word right at the top of f2r. So is there any real chance this also happens to encipher “kentaur” in the way he thinks? No, none whatsoever, I think.

5. “doary” = Taurus. Oh, really?

The reason people have in the past suspected the label by the “Pleiades”-like group on f68r might be “Taurus” was because of the late-medieval “-9” style Tironian nota at its end, preceded by a letter that looks like “r”. But both those correspondences remain a bit of a stretch, and so this seems basically unworkable in the way he hopes.

6. Reading EVA “keerodal” as “coriander”.

Ask people who have been working with the Voynich Manuscript’s “Voynichese” language for a few years and they’ll probably tell you (as I’m saying now) that this word is almost certainly a copying error by the Voynich Manuscript’s scribe. It is extremely rare to see “eer” (while “ar” and “or” are both extremely common), so I’m confident that this should instead have been written “karodal”, which closely matches how the Voynich Manuscript’s “labelese” often parses out in pairs, i.e. “k.ar.od.al”. Hence I have practically zero faith that this word could be a natural language version of “coriander” in the way Bax suspects – he has misparsed and miscategorised it.

7. Relying on Edith Sherwood’s hopeful plant identifications.

Oh, come on. Edith tries hard to do her thing, but remember that we’ve had real herbal authorities (such as the fantastic Karen Reeds) look closely at the Voynich’s herbal drawings, and they haven’t seen even 10% of what Edith Sherwood thinks she has seen.

So, in summary: of the nine words Bax claims (in his Appendix 1) to have identified, I disagree with the evidence, reasoning, and linguistic rationale for every single one. I am also sure that his letter assignments are fatally flawed. Contrary to the title of his paper, I honestly don’t believe that through his efforts he has yet identified a single “plausible” word in the Voynich Manuscript.

For me, this isn’t even a matter for Ockham’s blessed Razor: to be even remotely workable, a hypothesis needs to have a single example of evidence that chimes with it in a way that can actually be seen to work. And on the above showing of evidence, what he has presented so far is not yet a workable hypothesis in any obvious way, sorry.

Over the past week or so, I’ve spent some time patiently going over Fallacara & Occhinegro’s book which tries to connect the Castel del Monte with the Voynich Manuscript. The two guys are clearly intelligent, hard-working architecture historians who have spent several years trying not only to understand the Castel del Monte’s physical construction, but also to reconstruct how it was built and the purposes for which it was designed. But they have additionally posited a connection between the building’s design and numerous key design features found within the Voynich Manuscript, and have made lots of follow-on claims yada-yada-yada.

Hence what I’ll do here is look at their architecture bit first, and then move on to the Voynich layer perched atop their architectonic stuff. OK? Let’s go.

A River Ran Through It

The Castel del Monte in Puglia has been a UNESCO World Heritage Site since 1996, and over the years its unusual physical configuration has attracted numerous fringe theories claiming to explain its many odd features. To what degree have these two authors succeeded in looking past the façade of history and documentation to the actual building underneath?

Actually, I think their architectural research project has been a great success. What emerges from the (admittedly carefully chosen, but nonetheless strongly relevant) fragments of evidence presented in their book is that, contrary to its modern appearance, the site originally was very probably home to a natural spring thought at the time to have health-giving properties. A river even ran close by in previous centuries, as evidenced by the way the area is represented in the Tabula Peutingeriana.

On this location, Frederick II had a curious octagonal edifice built, one too small to be a proper castle but also not really functionally suitable for being a hunting lodge. Fallacara & Occhinegro have picked up on suggestions made by previous architectural historians as well as on numerous physical and archival clues, and have pieced together a reading of the Castel del Monte as a hamam – a restorative Turkish bath complex of the type that at that time was just starting to become fashionable in Europe.

This all aligns with what we find in Pietro da Eboli’s bath-praising poem De Balneis Puteolanis, which has been mentioned on Cipher Mysteries a fair few times, and which arguably helped to start the whole balneological ‘craze’. So up to this point, I don’t see anything at all wrong with the two authors’ reading of the Castel del Monte.

But are they justified in also reading the evidence as of an alchemical obsession by Frederick II? Their evidence in this regard seems to be no more than some circular-shaped stains on the floor, from which they somehow infer alchemical activity on the site. This seems decidedly thin: and I’m fairly certain that the idea of alchemy as promoting eternal life is something that came in many centuries later – in Frederick II’s Europe, chrysopoeia (‘gold-making’) was alchemists’ almost total focus.

This whole idea extends further to spagyria (herb-based alchemy, or herbal medicine made using alchemical-style processes), which as both a term and a practice dates to Paracelsus (much later in the 16th century). I therefore don’t see a way to accept their argument that Frederick II would have designed a building focused on spagyric alchemy with the purpose of retardatio senectutis, because that would simply be anachronistic.

Finally, the authors try to make some play about the 8-sided structure, but I personally see the likelihood of there having been some kind of Platonic or numerological basis for this as basically zero. So-called “sacred geometry” is one of those secret history things that sounds nice in an airport novel, but in almost every case disappears when you look for it in the cold light of day. The Castel del Monte has a nice little design, sure, but… anything beyond that is just too much hand-waving for me to bear.

So, in summary, I like the chain of inference that leads to the Castel del Monte’s being a hamam, at the forefront of the whole balneological fever: but extending this claim to include alchemical or numerological significance seems speculative at best, if not just plain wrong-headed.

All That, And The Voynich Manuscript Too?

Well… no, not really. Given that I don’t accept the link they claim between the Castel del Monte and alchemy or spagyria of any sort, the evidence they present in their book attempting to link the Castel to the Voynich Manuscript is a thin, unnourishing soup indeed.

For example, the image from the book’s cover tries to conflate the (apparently) hexagonal-bodied, round-turreted magic circle page in the Voynich Manuscript with the Castel del Monte’s (very definitely) octagonal-bodied, octagonal-turreted design. Personally, this looks to me no different to other super-selective Voynich theories: really, you have to do better than one partially suggestive image match to back up a claim of a systematic “philological” match between these two very different things.

And similarly for the plants: a palmful of comparisons with carefully selected individual drawings plucked from a broad set of medieval herbals really isn’t methodologically good enough. The bigger problem with comparing the Voynich Manuscript with medieval herbals is that quite a few of its drawings are apparently drawn from life, a practice which happened before and after the Middle Ages (if after, then from say 1425 or so), but not really during them.

The authors are also aware that it is a long way back from the (early 15th century) radiocarbon dating to the (early 13th century) court of Frederick II (the Castel’s Decretio Regis dates from 1240), and so conclude, unsurprisingly, that it must have been copied by a later dumb copyist etc etc. There are indeed a number of codicological features that suggest that the Voynich Manuscript was in some way a copy.

But there are many problems with a 13th century dating for its original content, which is why nobody has seriously re-proposed Roger Bacon as its author for several decades now. Never mind the 15th century stuff I keep going on about, the technology of the Sagittarius archer’s hunting crossbow points to “the first half of the 14th century”: while Erwin Panofsky’s opinion was famously reported thus: “as he came to the female figures (in conjunction with the colors used in the manuscript) he came to the conclusion that it could not be earlier than the 15th century”. The hair-styles and clothes (such as they are) are all thought to be 15th century (or possibly later) – which is an inexact method of dating, sure, but it really should be good for the nearest century.

I also don’t buy into their ideas about “proto-toilets”: having read numerous earnest-sounding books on the secret history of toilets over the last decade (I kid you not, and recommend Lawrence Wright’s (1960) “Clean and Decent”), I really don’t think 13th century engineers were even remotely close to getting that tricky jelly nailed to the garderobe wall. Yes, they did have limited water engineering and hypocausts: but my own reading is that toilets only became a plumbing possibility once Vitruvius had been revived in the 15th century. So that suggestion doesn’t work for me either.

Hence I think it’s going to take a lot of saving hypotheses (mainly around embellishing copyists, rather than time-travelling Gallifreyans) to pull a 13th century dating back from the cliff-edge sheer drop its feet are pedalling rapidly over, Wile E. Coyote-style. And while that’s still possible, it’s not very likely on this showing.

I don’t know, really. Fallacara and Occhinegro were very kind to send me a copy of their book, and I do wish them luck with their ongoing research into the Castel del Monte, which offers a reasonably solid hamam-based angle on a nice and genuinely mysterious piece of Puglian tourist history. But I can’t even remotely endorse the 13th century Voynich story they want to tell (which will probably come as no great surprise to them): unfortunately, it mars what is otherwise a perfectly nice (if fairly specific) piece of architectural / balneological history.

I think the simple truth – or as close as we can get to it without going excessively TL;DR – is that the Castel del Monte was on the leading edge of European nobility’s obsession with thermal baths, while the Voynich Manuscript was far closer to its trailing edge. Fifty or a hundred years yet further on, baths were thought (wrongly) to be the cause of syphilis and all kinds of other STDs: and so the whole craze abruptly stopped, with baths (and books about baths, which flourished in the 15th century) falling rapidly into disrepair. Perhaps the last century’s craze for unsupportable Voynich theories will abruptly stop some time in the future too? Well… I can dream, can’t I?

I was deeply saddened this week to find out that Mark Perakh died last year, on 7th May 2013 in Escondido, California. He wrote with such vitality I never even stopped to consider his age: but he was in fact 88.

Perakh’s was a life of three professorial acts: first in Russia, then in Israel, and then finally in America. It seems that Perakh was goaded most frequently into action by a drive to resist that which he considered false knowledge – for him, dissenting sincerely meant fighting.

In recent decades, the things that goaded him to greatest action were the grand pseudoscience and pseudohistory constructions of fundamentalist Christian literalism: specifically, the Bible Codes (don’t get me started on that, or I’ll be typing all night) and literal Creationism. His book “Unintelligent Design” surely forms as good a sustained counterargument as needs to be written to the pro-creationist arguments of William Dembski et al.

Back in the world of cipher mysteries, for a short while Perakh brought his mathematical and statistical heavy guns to bear on the Voynich Manuscript’s confounding ‘Voynichese’ text: and his exemplary 1999 paper “APPLICATION OF THE LETTER SERIAL CORRELATION TEST TO THE VOYNICH MANUSCRIPT” is something I often suggest that researchers take a look at.

Unfortunately, since 2011 all the copies of it outside the Wayback Machine seem to have withered on the virtual vine: so I thought I’d take this opportunity to praise the man and resurrect his paper here on Cipher Mysteries, for anyone with an interest in statistical studies of the Voynich Manuscript.

So, here’s part 1 (his experimental tests and raw data) and part 2 (his conclusions): highly recommended stuff!

Incidentally, until just now I’d forgotten that Mark Perakh also ran his LSC (Letter Serial Correlation) tests on Gordon Rugg’s generated Voynichese-like text: and that it produced results that were close to those returned by the artificial gibberish text mentioned in Perakh’s paper, and quite unlike those yielded by Voynich A or B texts (which are very close to those characteristic of proper languages). In an online comment from 2004, Perakh expressed disappointment that Rugg had felt the need to gild his experimental lily for publication in Scientific American.

In case you’ve arrived late to the linguistics party, abjad is a term used to describe a writing style for a language (primarily) made up from consonants, where the reader is required to fill in the unwritten vowelled gaps for himself/herself. Perhaps the best-known example of this is the modern Arabic script, from the first four letters of whose alphabet the term “abjad” comes – in fact, it’s the Arabic word for “alphabet”.

So… might Voynichese be written in an abjad writing style?

Freelance systems analyst Joachim Dathe thinks so: inspired initially by the apparent similarity between the Voynich Manuscript’s (occasionally ornate) script and Arabic calligraphy, for the last few years he has been promoting and refining his theory that Voynichese is nothing more than Arabic written in an apparently unique (and rather idiosyncratic) abjad stylee.

Yet at the same time, Dathe also believes that the Voynich’s Arabic plaintext can only be extracted with difficulty, because in his particular Arabic reading of it:-
* Punctuation is absent
* Sentence structure isn’t at all obvious
* Word boundaries are often inexact or missing
* Spaces are often inserted inside words
* “Words often appear […arranged or ordered…] in a way which is not compliant with the Arabic language”
His overall conclusion: “Obviously, the texts were dictated to a writer who did not master Arabic scripts.”

For example, Dathe and his translator collaborator admit that their transliteration of the start of f1r yields a fairly jumbled (if not actually random) set of Arabic words, and offer the following interpretative translation of it (though naturally only one of many possible):-

A dervish continues to Elate, believing that he is forgotten, and when I am surrounded by his presence, I am in Eden. I am a naught in his life. When despaired of Iman Taha (the faith of The Prophet Peace be upon him), he was purified by an illusion, this is what my faith has inspired me yesterday. I see it distantly in the image of my mother. Do we blame he who offered his life? If you deny him you pierce my eyes, and if you embrace him your excuse will be realized.

Now, claiming a Voynichese abjad decryption that proves unrelated to the drawings and imagery (in Dathe’s case, of “religious content from Sufism”) isn’t unique: John Stojko’s (in)famous vowel-free proto-Ukrainian Voynich decryption of f18r – “What slanted Oko is doing now? Perhaps Ora’s people you are snatching. I was, I am fighting and told the truth. Oko you are fighting mischievously (evil manner). Ask this. Are you asking religion for your clan?” – springs to mind.

Of course, this comparison is hardly breaking news: Elmar Vogt noted much the same similarity in 2012, though going on to compare both sets of mangled-sounding plaintexts with Vogon poetry was perhaps a teensy bit harsh. Still, I do find it hard to disagree with Elmar’s sentiment that Dathe’s “approach is flagrantly naïve”: if there is a real, tangible difference between the way Stojko and Dathe both approached Voynichese, I certainly can’t see it. And if one is wrong for that reason, then so surely is the other.

(Remember: the long-established template for bad Voynich theories is (a) to conjure up a simple-sounding explanation, and then (b) to wrap that up in a long series of what are known as “saving hypotheses” – additional weasel-like meta-explanations that serve to explain away conflicts between that wonky core explanation and an inevitably long succession of inconvenient historical truths. Voynich theorists like to think of themselves as following in the giant decrypting footsteps of Young, Champollion, Ventris et al: but none of that august list put forward theories that needed extensive sets of saving hypotheses to explain away contingent problems.)

In many ways, though, simply grabbing hold of a given abjad script (whether Arabic or vowel-less proto-Ukrainian, if such a thing ever genuinely existed) as a starting point for decrypting the Voynich is without much doubt a poor way to proceed. The proper first question is instead this: what is the linguistic evidence that Voynichese is a script that has no vowels?

Linguists have long exercised their cunning (if you’ll excuse the reordered juxtaposition) by running text corpora through consonant-vowel analysis programmes: basically, these fit models such as hidden Markov models (HMMs) to look for a small set of letters (the vowels) that constantly recur in alternation with the rest, without leaving consonants adrift in blocks (known as CVCV structure).

Reddy and Knight reported:-

[Jacques] Guy (1991) applies the vowel-consonant separation algorithm of (Sukhotin, 1962) on two pages of the Biological section, and finds that four characters (O, A, C, G) separate out as vowels. However, the separation is not very strong, and several words do not contain these characters.
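For anyone who hasn’t met it, Sukhotin’s algorithm is simple enough to sketch in a few lines of Python. What follows is the usual textbook formulation as I understand it, not necessarily the exact procedure Guy ran: the function name and the toy English word list are my own, and only adjacencies inside words are counted.

```python
from collections import defaultdict

def sukhotin_vowels(words):
    """Return the letters Sukhotin's algorithm classifies as vowels, in order found."""
    # Symmetric adjacency counts between different letters (the diagonal stays zero).
    adj = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Every letter starts out as a consonant, scored by its row sum.
    sums = {c: sum(adj[c].values()) for c in letters}
    vowels = []
    while True:
        consonants = [c for c in letters if c not in vowels]
        if not consonants:
            break
        # Pick the highest-scoring consonant (ties broken alphabetically).
        best = max(consonants, key=lambda c: (sums[c], c))
        if sums[best] <= 0:
            break
        vowels.append(best)
        # Letters that often sit next to the new vowel look more consonant-like.
        for c in consonants:
            if c != best:
                sums[c] -= 2 * adj[c][best]
    return vowels

print(sukhotin_vowels(["banana", "salami", "tomato"]))
```

On that toy list the letters picked out are ‘a’, ‘o’ and ‘i’. The algorithm assumes only that vowels and consonants tend to alternate, which is why a weak separation (as Guy reported) is itself informative.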

At the same time, when they ran their own 2-state bigram HMM programme on Voynichese, the only feature they noted was the strong binding between the final letter of words (typically EVA ‘y’) and the space following it: which model they thought similar to Arabic script. So… it is Arabic, then?

Well… no. What this actually means is that a 2-state bigram HMM is woefully inadequate for analysing EVA-transcribed text. Essentially, EVA is a stroke transcription rather than a glyph transcription (hence many composite shapes are transcribed in two or three strokes): and so should never be used as the “raw” input to a statistical analysis programme. So they wasted their time using a 2-state bigram HMM: not even close. (Even if they hadn’t used EVA, I would argue that a 2-state bigram HMM would be thoroughly unsatisfactory for numerous other reasons, most of them connected with the behaviour of the EVA letters ‘a’, ‘e’, ‘i’, and ‘o’.)
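To illustrate the stroke-versus-glyph point, here is a hedged Python sketch of a greedy tokeniser that merges some multi-stroke EVA sequences into single glyph tokens before any statistics are run. The list of groups is purely an illustrative assumption of mine (other researchers would parse the strokes differently), and `tokenize_eva` is a hypothetical helper name.

```python
# Assumed, illustrative multi-stroke groups to treat as single glyphs.
ASSUMED_GLYPH_GROUPS = ["aiin", "ain", "ckh", "cth", "cph", "cfh", "ch", "sh"]

def tokenize_eva(word, groups=ASSUMED_GLYPH_GROUPS):
    """Split an EVA word into glyph tokens, preferring the longest known group."""
    ordered = sorted(groups, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(word):
        for group in ordered:
            if word.startswith(group, i):
                tokens.append(group)
                i += len(group)
                break
        else:
            # No group matches at this position: keep the single stroke as a token.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize_eva("chckhhy"))
print(tokenize_eva("ykolaiin"))
```

A bigram model run over tokens like these sees a different (and smaller) set of transitions than one run over raw strokes, which is why the choice of transcription matters before any HMM gets fitted.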

In fact, arguably the fundamental statistical paradox about Voynichese as a script is that while it is riddled (quite literally, I suppose) with multiple overlapping internal structures, analysts have had very little luck building up Markov models to describe its behaviour; all of which is really quite the opposite of how you’d expect a well-formed language’s script to present. Even Jorge Stolfi’s long-standing “crust-mantle-core” model falls well short of being properly explanatory about the text. So, if Kevin Knight wants something Voynichian for his 2014 summer interns to get their teeth into, surely building up properly substantial Markov models for Currier A and Currier B (oh, and labelese too) would be an excellent starting point. Sort that out and we should all be sharing turkey and pepperoni pizza by Thanksgiving. 🙂

Jacques Guy applied Sukhotin’s algorithm to a glyph transcription, and so stood a better chance of getting sensible results than Reddy and Knight: yet I think the patterns in the text tell us a very much more complicated historical story than is captured by either of these two analytical tracks.

On the one hand, I think it is plain as day that we (the Voynich Manuscript’s ‘audience’, so to speak) are supposed to ‘read’ Voynichese in part as if it were a CVCV structured (non-abjad) thing. Look at the Pisces labels: these not only have a strong CVCV structure, but 25 out of the 30 also begin with the letter ‘o’ (presumably followed by a consonant, usually a ‘t’ or ‘k’ gallows character):-

otalal / otaral / otalar / otalam / dolaram / okaram / oteosal / salols / okaldal / ykolaiin / sar.am / oty / oky.ody / oty.or / okaly / otody / otald / otal.dar / okody / opys.am / chckhhy / otaly / otal.rar / otal.dy / okeoly / okydy / okees / otalalg / okasy / otar
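The 25-out-of-30 figure is easy to check against the label list above. This small Python sketch copies the labels exactly as transcribed there (the variable names are mine) and counts how many begin with ‘o’, and how many of those continue with a ‘t’ or ‘k’ gallows.

```python
pisces_labels = (
    "otalal / otaral / otalar / otalam / dolaram / okaram / oteosal / salols / "
    "okaldal / ykolaiin / sar.am / oty / oky.ody / oty.or / okaly / otody / "
    "otald / otal.dar / okody / opys.am / chckhhy / otaly / otal.rar / otal.dy / "
    "okeoly / okydy / okees / otalalg / okasy / otar"
).split(" / ")

# Labels beginning with EVA 'o'.
o_initial = [w for w in pisces_labels if w.startswith("o")]

# Of those, labels whose second character is a 't' or 'k' gallows.
o_then_t_or_k = [w for w in o_initial if len(w) > 1 and w[1] in "tk"]

# Everything else, for inspection.
others = [w for w in pisces_labels if not w.startswith("o")]

print(len(pisces_labels), len(o_initial), len(o_then_t_or_k))
print(others)
```

This prints 30, 25 and 24: the one ‘o’-initial label not followed by ‘t’ or ‘k’ is “opys.am”, and the five exceptions are “dolaram”, “salols”, “ykolaiin”, “sar.am” and “chckhhy”.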

There is also the heavy repetition of ‘or’, ‘ar’, ‘ol’ and ‘al’ throughout the text to consider, especially in phrases such as “or oro ror”. Once you visually ‘tune in’ to this kind of pairing, I think it becomes hard not to see the text as largely CVCV structured.

On the other hand, I think it is very nearly as plain that there’s something terribly wrong with this CVCV model of Voynichese. The simplest objection is that if it is correct, then only ‘o’ and ‘a’ seem to participate in CVCV structured words, making Voynichese a vowelled language with only two genuinely combinable vowels. Which would be a nonsense, right?

So if you think the Voynichese script is directly expressing an actual natural language, you’re stuck halfway between two extrema, because it’s neither consonanty enough to be an abjad (unvowelled) script, nor vowelly enough to be a proper abugida (vowelled) script. It’s a paradox, right?

Hence I personally think the only sensible conclusion is that Voynichese is a script that is neither an abjad nor an abugida, but is instead a covertext designed to resemble a plausible-looking language script (albeit one with too few vowels to register solidly as either category). The cryptographic truth falls between these either-or categorical boundaries erected by linguists, and in a much more subtle and devious way than linguists’ tools are able to handle comfortably. Good, isn’t it?

Indeed, “There are more things in heaven and earth, Horatio / Than are dreamt of in your philosophy.”

A quick update on yesterday’s Willen Styn post.

Debra Fasano very kindly took a second look at the form I received, and her sharp eyes picked up everything I missed. In her words:-

The Port Albany was a cargo vessel and didn’t normally carry passengers so I think he was more likely a fireman/trimmer onboard the ship. The document was filled out when the ship arrived in Fremantle and the “place of abode (abroad)” would be Penarth in Wales.

There are not many non-immigrant ship arrivals which are indexed so for cargo ships like this you would need to go to State records in real life. NSW is the only State that I know of which is indexing and digitising the manifests of all ships great and small.

The month by month is pretty much complete to 1900 but after that it gets a bit patchy, however they are all online at Ancestry. On that page there is also a link to the shipping arrivals index into Sydney and as many ships went to all ports from WA to Queensland, I checked the 1919 voyages into NSW and a fireman listed as W. Styne (or whatever!) aged 34 from Holland does turn up in 1919; someone obviously had his age wrong.

The August arrival is from New York via Adelaide (and Fremantle where the form was filled out), and the September arrival into Sydney is from Bowen and Townsville so they certainly got around.

It is quite possible that he didn’t set foot on Australian soil.

I also had an independent email follow-up from “Cymroz”, who correctly pointed out the existence of “Lord St in Penarth, near Cardiff, where his ship came from”. Thanks for that too! I think that this all hammers a sufficiently large number of nails into that thread’s coffin. Still, I’d rather know for sure it’s not him than not know at all.

One last thing: a few weeks ago, I drew up a list of all the partially open leads I could see in the Somerton Man case that I thought stood any chance of yielding anything genuinely productive. By far the best of these was trying to better understand the story behind the “Jestyn” signature: but without any “Mr Styn” to pursue in the archives, I’m now very nearly out of ideas.

Might a quite different Mr Styn / Stijn have been a patient at Royal North Shore Hospital in 1942/1943/1944? As I recall, there was a single newspaper report which said that the nurse had given a copy of the Rubaiyat to a patient: as always with journalists, that could very well have been misheard, miscopied, misreported or invented, but right now I can see very few archival avenues left to check.

Unfortunately, according to this page, it seems as though RNSH patient records are archived only back as far as 1963. Still, it might well be worth contacting the Assistant Medical Records Manager: archives can have all kinds of odd secondary records (admission books, etc).

A splendid “Do Not Bend” document envelope arrived here a few minutes ago (courtesy of the lovely people at the National Archives of Australia), containing the Form of Application for Registration #24041 for a certain ‘Willen Styn’ I mentioned a few days ago.

Alas, cutting straight to the chase, he’s not our Unknown Man: though he had grey eyes and was of medium build, he was only 5′ 7½” tall and had – definitively enough – a quite different left thumb-print (assuming the fingerprint chart on p.207 of Gerry Feltus’ “The Unknown Man” is correct 🙂 ).

According to the form dated 17th July 1919, this Dutchman was born in Amsterdam in 1894; signed his name “W. Stijn” (which presumably Aliens Registration Officer Hewitt miscopied or misheard as “Styn”); had arrived on the ship Port Albany from Cardiff; was working as a fireman; and lived at “15 Lord St, Penarith” (which doesn’t seem to exist, so I suspect should actually have been ’15 Lawson St, Penrith’), not too far from Penrith’s present-day Museum of Fire (one hour west of Sydney).

From all the other apparent typos on this single page form, I’d also guess he will turn out to be “Willem Stijn”. But regardless, he’s not our (unknown) man, I just thought you’d like to know. Oh well! 🙁

Incidentally, there seem to be good archival records of NSW firemen 1884-1955, so there may be more about firefighter Stijn in the Personnel record books in Western Sydney Records Centre in (dare I say it again) Kingswood. Let’s just hope it doesn’t come to that, eh? :-p

I’ve got a lot of time for Dominic Selwood: his 1999 (non-fiction) book Knights of the Cloister: Templars and Hospitallers in Central-Southern Occitania, c.1100-c.1300 painted a detailed, evidence-based picture of the Knights Templar across a properly historical and social background. It is not, as he points out, “light relaxing reading”: but it remains a fine counterpoint to the more militaristic / political / conspiratorial accounts of the two Orders, and is well worth looking at.

Just so you know: back in 2001, I went to a lecture of his at the (now long-slumbering) Canonbury Masonic Research Institute, and later asked him by email about Templar artificial alphabets that were used for signing their proto-‘cheques’. Unfortunately, he replied that “All my notes from my research were thrown away by accident by the staff where I used to work“ (though he may still have some images on slides).

At that time, he had just completed a PhD at Oxford (and occasionally played in a band called The Binmen!) and was then starting the work as a barrister that would occupy him for the next five years: and so I was intrigued to discover a few weeks ago that he had just published a novel called The Sword of Moses (the Kindle version is currently only 62p, which is a steal).

[Image: The Sword of Moses cover]

Oddly, this also seems to have necessitated drinking some Johnny Depp-stylee potion, as can be seen from the dramatic physical transformation he has undergone:-

[Image: Dominic Selwood as was]

[Image: Dominic Selwood as is, or perhaps his evil twin]

Either that or the novel was written by his evil twin, it’s hard to tell. 😉

Anyway, if you even remotely know how books in the historical-artefact/modern-threat/sassy-hero genre run, you’ll be at home immediately (think Ark of the Covenant, international mercenaries, bombs, assault weapons, etc): and the main character (Dr Ava Curzon) is a kind of passive-aggressive ‘Jane Bond’ / Lara Croft ex-spook-now-sassy-archaeologist hybrid, probably with half an eye towards Angelina Jolie in the film version (as per normal). And if you can find a genuinely empathetic or believable character anywhere in the mix, you’re far more observant than I am: but that’s hardly much of a criticism, as it is industry standard fare for the genre.

The most obviously notable feature, though, is the sheer scale of the book. It’s not just the chunky page count (792 pages!): it is very much as though Selwood has collided two or three already biggish novels, and welded the wreckage together into a fatter, lumpier composite. Ethiopian churches, Iraq, Knights Templar (who, it seems, are still going strong, bless ’em), Masons, MI6, Russian gangsters, Israeli spies, London/Kent Neo-Nazis, necromancy, John Dee, etc all play their respective parts (though the Voynich Manuscript only gets a cameo, it has to be said): and even dear old Aleister Crowley gets more than a nod.

Really, this all comes across to me as though Selwood’s Writing Ambition was in a gladiatorial fight to the death against Editorial Control, where only the former was wielding a sword. By which I mean that even though he writes pretty well, whenever his story’s fire starts to flicker a little, he anxiously hurls yet more geopolitics and history onto its flames: but that rich burning smell ultimately comes across as one of insecurity, not of confidence.

For me, though, the most interesting feature of “The Sword of Moses” was the history – the book clearly sits atop the heaped spoils of Selwood’s lifetime’s connoisseurship of alt.history strands. And what transforms the whole enterprise into something epic is, I think, a thing that emerges from the text only indirectly: his personal micro-crusade against junk history.

Honestly, he seems to be saying, why do novelists invest so much time filling their genre books with historical nonsense, when the real deal is even more excruciatingly complex and intriguing, if you just bother to get your stupid superficial noses out of Wikipedia? And so he goes out of his way to get the history properly right, again and again: mightily impressive, densely entertaining, heavily intertextual stuff.

And so when it comes to the idea that forms the historical backbone of his novel, it’s his idea of a proper shocker: that the Old Testament has polytheism and even ritual sacrifice embedded in it (which is indeed entirely true). But… but… but… this is also where it all goes a bit Pete Tong.

The horrible, dull truth is that exposing the ritualistic layers codicologically embedded in plain sight within The Bible (and having a super-evil necromancer to bring them to some kind of twisted life) just isn’t much of a surprise any more. The Dan Brown sincere flattery crowd (as in “imitation is…”) of novelists have kind of strip-mined the genre: and for all their dodgy historical faults, in the list of their crimes against readers Bad History comes a distant third behind Empty Characterization and Mile-Wide Plot Faults.

So… while I like Dominic and have terrific admiration for his historical sensibilities and indeed writing ambition, I finished his book feeling that he set out on his novel-writing quest to solve the wrong problem. Having myself read far more books in this genre over recent years than is properly healthy, I’d agree that he really isn’t tilting at windmills – that Bad History is an endemic problem in fiction in general. But he’ll have to work somewhat harder with his next novel to help readers care whether Dr Ava Curzon lives or dies, because frankly I never quite managed that piece of reading magic, sorry. 🙁

A few days ago, I had a nice email from two Swedish engineers called Henrik (Henke) Sundberg and David Thelin: surprisingly, they claimed that they had worked out the details of the Zodiac Killer’s 32-character “map cipher” (also known as “Z32”).

The first thing I did was to put up a new page describing the Z32 cipher, something I’ve been meaning to do for a few years: as normal, I tried to cover the raw factuality and basic observations rather than out-and-out theories and speculation.

The short version is that the letter-shapes in the Z32 cipher look nearly exactly the same as those in the (famously solved) Z408 cipher, which makes it seem very much as though it too is a homophonic cipher, though with different letter assignments (deciphering it using the Z408 key doesn’t seem to yield anything sensible). Unfortunately, 32 characters (made up of 29 different shapes, i.e. only three appear more than once) wouldn’t normally be anywhere near enough for a homophonic cryptogram to be cracked, unless you had some significant additional information to work with. (Hint: a cipher key would be a good start. 🙂 )

However, in this case there was some other extra information: a roadmap of the San Francisco Bay Area with a “Zodiac Killer” shape centred on Mount Diablo, and a note saying “The Map coupled with this code will tell you where the bomb is set. You have untill next Fall to dig it up“. A second “little list” letter (posted a month later) gave a further clue: “PS. The Mt. Diablo code concerns Radians 4#inches along the radians“.

Sundberg and Thelin’s theory (described in this PDF file) is that it’s in fact a very scientific cipher, as much a stegotext as a cryptogram.

[Image: the Z32 cipher]

From this, they extract the phrases “C3H3”, “Octane”, and “North of West”, while “HCEL(Zodiac)PW(triangle)” reminds them of how the molecule HClO3 looks, centred around the Zodiac symbol. From which they deduce that they need to look 1 inch (i.e. 6.4 miles) along a vector due West from magnetic North.

Guys, guys… I’m really sorry, but I think you’ve got it wrong. Nobody in their right (or indeed wrong) mind would concoct a chain of reasoning based around a vague resemblance to a particular molecule in order to encode a unit vector. Even dear old Jessica Lee wouldn’t do that, much as she likes chemistry and ciphers.

Look: the Zodiac Killer wasn’t some evil scientific genius, he was a sick, unhappy man with a grudge against the SFPD (probably a surrogate for his sick unhappy relationship with his abusive, distant father) on a gun-powered external power trip, a (literally) vain attempt to right the perceived wrongs in his personal life. I don’t even think he knew properly what a “radian” is, because he doesn’t use the term correctly in his note.

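A pragmatic starting point for the d’Agapeyeff cipher is to sequentially replace its digit pairs with letters (the table below also gives each pair’s instance count in the cryptogram, with the first digit as the row and the second digit as the column), i.e.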

** .1 .2 .3 .4 .5
6. _0 17 12 16 11 --> A B C D E
7. _1 _9 _0 14 17 --> F G H I J
8. 20 17 15 11 17 --> K L M N O
9. 12 _3 _2 _1 _0 --> P Q R S T
0. _0 _0 _0 _1 _0 --> U V W X Y
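
The replacement table above can be sketched in a few lines of Python. This is only an illustration: the names `ROWS`, `COLS`, `pair_to_letter` and `pairs_to_letters` are made up for this example, and it assumes that the first digit of each pair selects the row (6, 7, 8, 9, 0) and the second digit selects the column (1 to 5).

```python
ROWS = "67890"  # first digit of each pair, in the table's row order
COLS = "12345"  # second digit of each pair, in the table's column order


def pair_to_letter(pair):
    """Map a two-digit pair such as '81' to its stand-in letter (A..Y)."""
    row = ROWS.index(pair[0])
    col = COLS.index(pair[1])
    return chr(ord("A") + 5 * row + col)


def pairs_to_letters(digits):
    """Convert a digit string into stand-in letters, ignoring whitespace.

    Assumes an even number of digits (i.e. only whole pairs).
    """
    digits = "".join(digits.split())
    return "".join(pair_to_letter(digits[i:i + 2])
                   for i in range(0, len(digits), 2))


print(pair_to_letter("92"))  # the 'Q' pair
print(pair_to_letter("04"))  # the 'X' pair
```

These stand-in letters are just labels for the 25 possible digit pairs, not guesses at the plaintext.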

If you then “re-flow” those letters into a 14×14 grid, many of its oddities are to be found in the final right hand column:-

[ 0] J B L O P B P D K D P I O N
[ 1] D I I L N M K C K K I I L B
[ 2] D J M L N P J I E M J J J R
[ 3] C E E K C K J O J J D B L Q
[ 4] O I C L J I M K E K N O D O
[ 5] D O O C L G B M B K K G K D
[ 6] C J L K D M C L O K C C C X
[ 7] I K P P N C O N E D O E B S
[ 8] B B O P O P I P G J D E J F
[ 9] E M B D I K L N B L D P K R
[10] E B D N N P M O I P K E G I
[11] M M O L M D B G B E B M J Q
[12] G C L L G G M L O N J L K M
[13] G N B L M J K D J I O K B Q

The ‘X’ (’04’) right at the end of row #6 is highly suspicious: at least one person before me has suspected that this might somehow be a padding ‘X’ appended to the end of the (pre-transposition-stage) plaintext to bring it up to a 14×14 multiple.

However, I think that the three ‘Q’ (’92’) symbols in the same rightmost column are even more suspicious: this symbol occurs exactly three times in the cryptogram, and only ever in this column. I think these are even more likely than the ‘X’ to be the final three letters of the plaintext, appended to pad it up to 14×14 = 196 characters in length.

In fact, I’m now almost certain that the correct starting point for cryptanalysis should be the diagonal transposition of the 14×14 grid, which transformation would flip all these oddities across onto the bottom (final) row of the transposed grid, leaving (presumably) a 14-column transposition cipher to solve:-

[ 0'] J D D C O D C I B E E M G G
[ 1'] B I J E I O J K B M B M C N
[ 2'] L I M E C O L P O B D O L B
[ 3'] O L L K L C K P P D N L L L
[ 4'] P N N C J L D N O I N M G M
[ 5'] B M P K I G M C P K P D G J
[ 6'] P K J J M B C O I L M B M K
[ 7'] D C I O K M L N P N O G L D
[ 8'] K K E J E B O E G B I B O J
[ 9'] D K M J K K K D J L P E N I
[10'] P I J D N K C O D D K B J O
[11'] I I J B O G C E E P E M L K
[12'] O L J L D K C B J K G J K B
[13'] N B R Q O D X S F R I Q M Q
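
For anyone wanting to reproduce the transposed grid, here is a minimal Python sketch. The `GRID` rows are copied from the first 14×14 grid above; the names `GRID` and `transpose` are just illustrative.

```python
# The 14x14 grid of stand-in letters, one string per row (rows [0]..[13]).
GRID = [
    "JBLOPBPDKDPION",
    "DIILNMKCKKIILB",
    "DJMLNPJIEMJJJR",
    "CEEKCKJOJJDBLQ",
    "OICLJIMKEKNODO",
    "DOOCLGBMBKKGKD",
    "CJLKDMCLOKCCCX",
    "IKPPNCONEDOEBS",
    "BBOPOPIPGJDEJF",
    "EMBDIKLNBLDPKR",
    "EBDNNPMOIPKEGI",
    "MMOLMDBGBEBMJQ",
    "GCLLGGMLONJLKM",
    "GNBLMJKDJIOKBQ",
]


def transpose(rows):
    """Flip a square grid across its main diagonal (columns become rows)."""
    return ["".join(column) for column in zip(*rows)]


transposed = transpose(GRID)
for number, row in enumerate(transposed):
    print(f"[{number:2d}'] {' '.join(row)}")
```

The old rightmost column (with its X and its three Qs) ends up as the final row, `transposed[-1]`.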

Note the two tripled letters (“LLL” on row #3′, and “KKK” on row #9′): LLL is on a row with 6 L’s (so it’s hardly surprising that it ended up as a tripled letter post-transposition), while KKK is on a row with 4 K’s. Here are the overall letter instance counts for the cryptogram:-

.K .B .J .L .O .D .M .I .C .P .E .N .G .Q .R .F .S .X
20 17 17 17 17 16 15 14 12 12 11 11 .9 .3 .2 .1 .1 .1

It’s interesting to compare this set with the letter frequency table of the text mini-corpus taken from d’Agapeyeff’s “Codes and Ciphers” (which I also generated recently). If you normalize that to 196 characters, here’s what you would expect to see in the cryptogram:-

.E .T .A .I .O .S .N .R .H .D .L .C .U .M .F .P .W .G .Y .B .V .K
25 18 15 14 14 14 13 12 11 .8 .7 .6 .5 .4 .4 .4 .4 .4 .3 .3 .2 .1

From this, it looks as though K probably –> E, while B/J/L/O/D seem likely to go to T/A/I/O/S. What I’m thinking here is that if this is right, all we need to solve it is to generate a moderate number of best-guess substitution values and feed those into a transposition cipher solver, i.e.:-
(a) guess that K –> E
(b) generate the 5! = 120 permutations of B/J/L/O/D –> T/A/I/O/S
(c) assign plausible values to the remainder of the used letters (in matching descending frequency order)
(d) feed the 120 versions of the transposed 14×14 grid into a reliable columnar transposition solver
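
Steps (a) to (c) above can be sketched in Python as follows. This is a rough illustration under stated assumptions: the two frequency orderings are copied from the tables above, the names `candidate_keys` and `apply_key` are hypothetical, and step (d), the transposition solver itself, is not shown.

```python
from itertools import permutations

# Stand-in letters in descending order of their counts in the cryptogram.
CIPHER_BY_FREQ = "KBJLODMICPENGQRFSX"
# Plaintext letters in descending order of expected counts (mini-corpus).
PLAIN_BY_FREQ = "ETAIOSNRHDLCUMFPWGYBVK"


def candidate_keys():
    """Build the 5! = 120 best-guess substitution keys (as dicts)."""
    rest_cipher = CIPHER_BY_FREQ[6:]
    rest_plain = PLAIN_BY_FREQ[6:6 + len(rest_cipher)]
    keys = []
    for perm in permutations("TAIOS"):
        key = {"K": "E"}                          # (a) guess K -> E
        key.update(zip("BJLOD", perm))            # (b) one permutation
        key.update(zip(rest_cipher, rest_plain))  # (c) frequency order
        keys.append(key)
    return keys


def apply_key(key, text):
    """Substitute each stand-in letter, leaving unknown symbols as-is."""
    return "".join(key.get(ch, ch) for ch in text)


keys = candidate_keys()
print(len(keys))
```

Each key would then be applied to the transposed grid before handing the result to whichever columnar transposition solver you trust.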

My prediction is that even though this will still be wrong, getting the 6 most popular letters right (i.e. 20 + 17 + 17 + 17 + 17 + 16 = 104 characters, ~53% of the cryptogram) and possibly some of the others (by chance) will allow the transposition solver to get us close enough to the answer that we can tell from its output what the correct transposition order is. Does that sound reasonable?

PS: if the final row is partly artificial, it may be a good idea not to feed that into the transposition solver, i.e. only try to solve a 14×13. Incidentally, a very good freeware cipher solver Windows application is CryptoCrack, but more about that another day… 🙂

I’ve been trying to break the d’Agapeyeff challenge cipher this week, a process that I (along with several other cipher commentators, although opinions differ etc etc) strongly suspect will involve solving a 14×14 transposition cipher and a substitution cipher simultaneously.

A plausible-sounding way to try to do this would be to model the distribution of digraph frequency counts in English texts, and then for a given transposition compare an ordered table of its digraph frequency counts against that model. However, when I tried this with some test text (taken from d’Agapeyeff’s book), the English digraph frequency values given on the Internet weren’t even close.
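
One way to sketch the comparison described above is to reduce each text to its ordered digraph frequencies (ignoring which letters are involved, so that an unknown simple substitution makes no difference) and then measure how far apart two such profiles are. The function names and the choice of a sum-of-squares distance are assumptions made for this illustration, not a tested method.

```python
from collections import Counter


def sorted_digraph_profile(text, size=50):
    """Return the `size` largest digraph frequencies, in descending order.

    Only the letters A-Z are kept; frequencies are fractions of all digraphs.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty text
    freqs = sorted((n / total for n in counts.values()), reverse=True)[:size]
    return freqs + [0.0] * (size - len(freqs))  # pad short profiles


def profile_distance(candidate, model):
    """Sum of squared differences between two ordered profiles."""
    return sum((c - m) ** 2 for c, m in zip(candidate, model))


model = sorted_digraph_profile("THE THEN", size=5)
same_shape = sorted_digraph_profile("ABC ABCD", size=5)  # substituted copy
print(profile_distance(same_shape, model))
```

A lower distance would suggest that a candidate transposition has digraph statistics shaped more like those of the model text.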

I initially looked at getting a corpus of British English text to generate a proper digraph frequency table: but that proved to be difficult and expensive, with the bother of licences and licence fees to deal with. But then I thought… why not use d’Agapeyeff’s book “Codes and Ciphers” itself as the corpus? Sure, it’s on a much smaller scale, but it would surely be more statistically representative of the cryptogram’s plaintext than the complete works of Shakespeare (which are often included in English corpora, presumably on the principle of what-the-heck-let’s-throw-it-all-in-can-it-really-hurt?).

Even though the book’s text looked nice and clean to my eye, OCR’ing it turned out to be completely unsatisfactory: and so I was delighted to find a page put up by regular Cipher Mysteries commenter Menno Knul containing a lot of text from “Codes and Ciphers” (thanks Menno!). After a bit of tweaking (fixing some typos, removing foreign language quotes, removing confusing cipher / code passages, etc), I then ended up with a reasonably workable d’Agapeyeff mini-corpus to plug into a trivial C digraph-counting programme.

So, here are d’Agapeyeff’s top 50 digraphs from the text of “Codes and Ciphers” (but with spaces, punctuation and numbers removed), together with their frequency percentages in descending order. I’ll be using this table before very long to try to break his cipher, fingers crossed it’ll do the trick!

TH,3.23744%
HE,2.80072%
IN,2.03171%
ER,1.89246%
AN,1.50321%
ES,1.41460%
RE,1.39245%
ON,1.21523%
NT,1.20257%
ED,1.20257%
ST,1.19624%
EN,1.14561%
SE,1.10447%
EA,1.08548%
TE,1.05383%
TI,1.04750%
ET,1.02851%
ND,1.01269%
IS,0.99370%
OF,0.98104%
TO,0.95889%
OR,0.94940%
AT,0.92725%
AS,0.92725%
IT,0.87978%
HI,0.83547%
LE,0.82598%
NG,0.81648%
AL,0.81648%
HA,0.80699%
AR,0.80699%
SA,0.73104%
SI,0.71838%
VE,0.70255%
RI,0.69623%
CO,0.69306%
SO,0.68673%
ME,0.67724%
EC,0.67407%
DE,0.66774%
RA,0.60129%
RS,0.59496%
RO,0.59179%
DI,0.59179%
TT,0.58546%
OU,0.58546%
TA,0.58230%
BE,0.57597%
US,0.54432%
IC,0.52850%
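
A table like the one above can be produced by a very small program. The original was written in C; what follows is a Python sketch of the same idea, where the function name `digraph_percentages` is made up for this example and digraphs are assumed to run straight across word boundaries once everything except the letters A-Z has been stripped out.

```python
from collections import Counter


def digraph_percentages(text, top=50):
    """Return the `top` most common digraphs with their percentages.

    Everything except the letters A-Z is removed before counting, so
    digraphs can span what used to be word boundaries.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values())
    return [(digraph, 100.0 * n / total)
            for digraph, n in counts.most_common(top)]


for digraph, percent in digraph_percentages("The then, the 2 thens."):
    print(f"{digraph},{percent:.5f}%")
```

Running this over the whole mini-corpus text rather than the toy sentence would give output in the same `TH,3.23744%` format as the table.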