This, you may be a little surprised to read, is a story about a “two-piece bustle dress of bronze silk with striped rust velvet accents and lace cuffs”, with original Ophelia-motif buttons. Maryland-based curator and antique dress collector Sara Rivers-Cofield bought it for a Benjamin from an antique mall around Christmas 2013: but it turned out – marvel of marvels – to have an odd-looking ciphertext concealed in a secret inside pocket.

In early 2014, German crypto-blogger Klaus Schmeh threw this puzzle to his readers to solve, not unlike a juicy bone to a pack of wolves. However, their voracious code-breaking teeth – normally reliable enough for enciphered postcards and the like – seemed not to gain any grip on this silk dress cipher, even when he revisited it a few days ago.

So… what is going on here? Why can’t we just shatter its cryptographic shell, like the brittle antique walnut it ought by all rights to be? And what might be the cipher’s secret history?

First, The Dress

It’s made of nice quality silk (and has been looked after well over its 130-odd year lifetime), so would have been a pricey item. The buttonholes are hand-stitched (and nicely finished), yet much of the other stitching was done by machine.

This alone would date the item to after 1850 or so (when Isaac Singer’s sewing machines began to be sold in any quantity). However, Sara Rivers-Cofield dates it (on purely stylistic grounds) to “the mid-1880s”, which I find particularly interesting, for reasons I’ll explain later.

All we know about its original owner, apart from a penchant for hidden ciphers, is her surname (“Bennett”) and her dress size. We might reasonably speculate (from the cost and quality of her silk two-piece) that she was somewhere between well-to-do and very well off; and perhaps from a larger city in Maryland (such as Baltimore) where silk would be more de rigueur; and possibly she wasn’t much beyond her mid-20s (because life expectancy wasn’t that good back then).

Who Might She Be?

It doesn’t take much web searching to come up with a plausible-sounding candidate: Margaret J. Bennett, “a dowager grand dame of Baltimore society” (according to the Baltimore Sun) who died childless in 1900, leaving $150,000 to endow a local trust to provide accommodation for homeless women.

Among Baltimore architectural historians, she is also remembered for the Bennett House at 17 West Mulberry Street: there, the land was purchased by F.W. Bennett (who was the head of his own Auction House in town), while the house was erected in 1880.

Anyway, if anyone here has access to American newspaper archives (though I have had in the past, I don’t at the moment), I’d be very interested to know if they have anything on Margaret J. Bennett. I didn’t manage to find any family archives or photographs online, but hopefully you cunning people can do much better.

Of course, there may well be many other Mrs Bennetts who also match the same basic profile: but I think Margaret J. is too good a catch not to have at least a quick look. 🙂

Now, The Silk Dress Cipher Itself

What Sara Rivers-Cofield (and her mother) found hidden inside the silk dress’s secret inner pocket were two balled-up sheets of paper (she called them “The Bustle Code”):

Within a few seconds of looking at these, it was clear to me that what we have here is a genuine cipher mystery: that is, something where the cryptography and the history are so tangled that each obscures the other.

Curiously, the writing on the sheets is very structured: each line consists of between two and seven words, and all bar three of these have the number of words written in just below the first word. So even when text wraps round, it appears that we can treat that whole (wrapped) line as a single unit.

Also oddly, the writing is constrained well within the margins of the paper, to the point that there almost seems to be an invisible right-hand margin beyond which the writer did not (or could not) go. It therefore seems as though these sheets might be a copy of a document that was originally written on much narrower pieces of paper, but where the original formatting was retained.

Another point that’s worth making is that the idea of using word lists for telegraphy (and indeed cryptography) is to keep the words dissimilar to each other, to prevent messages getting scrambled. Yet here we appear to have words very similar to each other (such as “leafage” and “leakage”), along with words that seem to have been misheard or misspelt (“Rugina” for “Regina”, “Calgarry” for “Calgary”, etc).

To me, this suggests that part of the process involved somebody reading words out loud to someone writing them down. Hence I’ve attempted to correct parts of my transcription to try to bring some semblance of uniformity to it. (But feel free to disagree, I don’t mind).

Interestingly, if you lay out all the words in columns (having unwrapped the word wrapping), a number of striking patterns emerge…

The Column Patterns

Where the codetext’s words repeat, they do so in one of three groups: within the first column (e.g. “Calgarry”), within the second column (e.g. “Noun”), or within the remainder (e.g. “event”). In the following image, I’ve highlighted in different colours where words starting with the same letter repeat from column three onwards:

Moreover, the words in the first column are dominated by American and Canadian place names: although (just to be difficult) “egypt” and “malay” both appear elsewhere in the lines.

The third column is overwhelmingly dominated by l-words (legacy, loamy, etc): generally, words in the third to seventh columns start with a very limited range of letters, one quite unlike normal language initial letter distributions.

Indeed, this strongly suggests to me that the four instances of “Noun” in the second column are all nulls, because if you shift the remainder of those words across by one column, “laubul” / “leakage” / “loamy” / “legacy” all slide from column #4 back into the l-initial-heavy column #3.

It seems almost impossible at this point not to draw the conclusion that these words are drawn from lists of arbitrary words, arranged by first letter: and that without access to those same lists, we stand no real chance of making progress.

All the same, a commenter on Sara Rivers-Cofield’s blog (John McVey, who collects historical telegraph codes, and who famously – “famously” around here anyway – helped decode a 1948 Israeli telegram recently) proposed that what was in play might be not so much a telegraphic code as a telegraphic cipher.

These (though rare) included long lists of words to yield numerical equivalents, which could then be used to index into different lists (or sometimes the same list, but three words onwards). Here’s a link to an 1870 telegraphic cypher from McVey’s blog.

However, from the highly-structured nature of the word usage and repetitions here, I think we can rule out any kind of formal telegraphic code, i.e. this is not in any way a “flat” words-in-words-out code substitution.

Rather, I think that we are looking at something similar to the semi-improvised (yet complex) rum-runner codes that Elizebeth Friedman won acclaim for breaking in the 1920s and 1930s: strongly reliant on code lists, yet also highly specialized around the precise nature of the contents of the communication, and using amateur code-making cunning.

That is, the first two columns seem to be encoding a quite different type of content to the other columns: the l-list words seem to be signalling the start of the second half’s contents.

Were Other People Involved?

I’ve already suggested that the words on the two sheets were copied from smaller (or at least narrower) pieces of paper, and that as part of this someone may well have read words out for someone else to copy down (because spelling mistakes and/or mishearing mistakes seem to have crept in).

However, someone (very possibly a third person) has also apparently checked these, ticking each numbered line off with a rough green pencil. There are also underlinings under some words (such as “Lental”), not unlike a schoolteacher marking corrections on an exercise book.

Yet once you start to get secret writing with as many as three people involved, the chances of this being an individual’s private code would seem to be sharply reduced – that is, I think we can rule out the possibility that this was the delusional product of a “lone gunman”. Moreover, there must surely have been a good-sized pie involved to warrant the effort of buying (or, perhaps more likely given the idiosyncratic nature of the words, assembling) code books: by which I mean there was enough benefit to be divided into at least three slices and still be worth everyone’s while.

What I’m trying to get at here is that, from the number of people involved, the tangledness of the code books, and the curious rigid codetext structure, this seems to have been an amateur code system constructed to enable some kind of organized behaviour.

Betting springs obviously to mind here: and possibly horse-racing, given that “dobbin” and “onager” appear in the codewords. But there’s another possibility…

Numbers and policies?

With its Puritan historical backdrop, America has long had an ambivalent attitude towards both gambling and alcohol: the history of casinos, inter-state gambling, and even Prohibition all attest strongly to this.

By the 1880s, the kind of state or local lotteries that had flourished at the start of that same century had almost all been shut down, victims of corruption and scandals. The one that remained (the Louisiana Lottery) was arguably even more corrupt than the others, but remained afloat thanks to the number of politicians benefiting from it: in modern political argot, it was (for a while, at least) “too big to fail”.

What stepped into the place of the state lotteries were illegal local lotteries, better known as the “numbers game”, or the numbers racket. Initially, these were unofficial lotteries run from private residences: but later (after the 1920s, I believe), they began to instead use numbers printed in newspapers that were believed to be random (such as the last three digits of various economic indicators, such as the total amount of money taken at a given racetrack), because of – surprise, surprise – the same kinds of corruption and rigging that had plagued the early official state lotteries.

Though the numbers racket became known as the scourge of Harlem in the first half of the twentieth century (there’s a very good book on this, “Playing the Numbers: Gambling in Harlem between the Wars”), modern state lotteries and interstate sports betting all but killed it off, though a few numbers joints do still exist (“You’re too late to play!”).

Back in the second half of the 19th century, ‘policy shops’ (where the question “do you want to buy a policy?” drew a parallel between insurance and gambling) started to flourish, eventually becoming a central feature of the American urban landscape. With more and more state lotteries being shut down as the century progressed, numbers were arguably the face of small-stake betting: in terms of accessibility, they were the equivalent of scratch cards, available nearly everywhere.

For a long time, though, information was king: if you were organized enough to get access to the numbers before the policy shop did, you could (theoretically) beat the odds. Winning numbers were even smuggled out by carrier pigeon: yet policy shops (who liked to take bets right up until the last possible moment) were suspicious of “pigeon numbers”, and would often not pay out if they caught so much as a sniff of subterfuge. It’s not as if you could complain to the police, right?

At the same time, a whole hoodoo culture grew up around numbers, where superstitious players were sold incense sticks, bath crystals, and books linking elements in your dreams to numbers. First published in 1889, one well-known one was “Aunt Sally’s Policy Player’s Dream Book”:

This contained lists linking dream-items to suggestions of matching number sequences to back, with two numbers being a “saddle”, three numbers a “gig”, and four numbers a “horse”: on the book’s cover, Aunt Sally is shown holding up “the washerwoman’s gig” (i.e. 4.11.44). There’s much more about this on Cat Yronwode’s excellent Aunt Sally page.

Might it be that these two Silk Dress Cipher sheets are somehow numbers betting slips that have been encoded? Could it be that each line somehow encodes a name (say, the first two columns), the size of the bet, and a set of numbers to bet on? There were certainly illegal lotteries and policy shops in Baltimore, so this is far from impossible.

Right now, I don’t know: but I’d be very interested to know of any books that cover the history of “policy shops” in the 19th century. Perhaps the clues will turn out to be somewhere under The Baltimore Sun…

As I see it, there are four foundational tasks that need to be done to wrangle Voynichese into a properly usable form:

* Task #1: Transcribing Voynichese text into a reliable computer-readable raw transcription e.g. EVA qokeedy
* Task #2: Parsing the raw transcription to determine Voynichese’s fundamental units (its tokens) e.g. [qo][k][ee][dy]
* Task #3: Clustering the pages / folios into groups where the text shares distinct features e.g. Currier A vs Currier B
* Task #4: Normalizing the clusters e.g. how A tokens / patterns map to B tokens / patterns, etc
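
To make Task #2 a little more concrete, here is a minimal Python sketch of a greedy longest-match parser. Please note that the token inventory (`TOKENS`) and the function name `parse_eva` are my own illustrative assumptions: nobody has yet agreed what Voynichese’s tokens actually are, which is the whole point of the task.

```python
# Hypothetical token inventory: purely illustrative, not an agreed parse of Voynichese.
TOKENS = ["qo", "ee", "dy", "ch", "sh", "k", "o", "e", "d", "y"]

def parse_eva(word, tokens=TOKENS):
    """Split an EVA word into candidate tokens using greedy longest-match."""
    ordered = sorted(tokens, key=len, reverse=True)  # try longer tokens first
    out, i = [], 0
    while i < len(word):
        for t in ordered:
            if word.startswith(t, i):
                out.append(t)
                i += len(t)
                break
        else:
            # No known token starts here: keep the single stroke as-is.
            out.append(word[i])
            i += 1
    return out

print(parse_eva("qokeedy"))  # ['qo', 'k', 'ee', 'dy']
```

Swapping in a different `TOKENS` list gives a different parse of the same raw transcription, which is exactly why the raw stroke transcription and the parsing step are better kept separate.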

I plan to tackle these four areas in separate posts, to try to build up a substantive conversation on each topic in turn.

Takahashi’s EVA transcription

Rene Zandbergen points out that, of all the different “EVA” transcriptions that appear interleaved in the EVA interlinear file, “the only one that was really done in EVA was the one from Takeshi. He did not use the fully extended EVA, which was probably not yet available at that time. All other transcriptions have been translated from Currier, FSG etc to EVA.”

This is very true, and is the main reason why Takeshi Takahashi’s transcription is the one most researchers tend to use. Yet aside from not using extended EVA, there are a fair few idiosyncratic things Takeshi did that reduce its reliability, e.g. as Torsten Timm points out, “Takahashi reads sometimes ikh where other transcriptions read ckh”.

So the first thing to note is that the EVA interlinear transcription file’s interlinearity arguably doesn’t actually help us much at all. In fact, until such time as multiple genuinely EVA transcriptions get put in there, its interlinearity is more of an archaeological historical burden than something that gives researchers any kind of noticeable statistical gain.

What this suggests to me is that, given the high quality of the scans we now have, we really should be able to collectively determine a single ‘omega’ stroke transcription: and even where any ambiguity remains (see below), we really ought to be able to capture that ambiguity within the EVA 2.0 transcription itself.

EVA, Voyn-101, and NEVA

The Voyn-101 transcription used a glyph-based Voynichese transcription alphabet derived by the late Glen Claston, who invested an enormous amount of his time to produce a far more all-encompassing transcription style than EVA did. GC was convinced that many (apparently incidental) differences in the ways letter shapes were put on the page might encipher different meanings or tokens in the plaintext, and so ought to be captured in a transcription.

So in many ways we already have a better transcription, even if it is one very much tied to the glyph-based frame of reference that GC was convinced Voynichese used (he firmly believed in Leonell Strong’s attempted decryption).

Yet some aspects of Voynichese writing slipped through the holes in GC’s otherwise finely-meshed net, e.g. the scribal flourishes on word-final EVA n shapes, a feature that I flagged in Curse back in 2006. And I would be unsurprised if the same were to hold true for word-final -ir shapes.

All the same, GC’s work on v101 could very well be a better starting point for EVA 2.0 than Takeshi’s EVA. Philip Neal writes: “if people are interested in collaborating on a next generation transcription scheme, I think v101/NEVA could fairly easily be transformed into a fully stroke-based transcription which could serve as the starting point.”

EVA, spaces, and spatiality

For Philip Neal, one key aspect of Voynichese that EVA neglects is measurements of “the space above and below the characters – text above, blank space above etc.”

To which Rene adds that “for every character (or stroke) its coordinates need to be recorded separately”, for the reason that “we have a lot of data to do ‘language’ statistics, but no numerical data to do ‘hand’ statistics. This would, however, be solved by […having] the locations of all symbols recorded plus, of course their sizes. Where possible also slant angles.”

The issue of what constitutes a space (EVA .) or a half-space (EVA ,) has also not been properly defined. To get around this, Rene suggests that we should physically measure all spaces in our transcription and then use a software filter to transform that (perhaps relative to the size of the glyphs around it) into a space (or indeed half-space) as we think fit.

To which I’d point out that there are also many places where spaces and/or half-spaces seem suspect for other reasons. For example, it would not surprise me if spaces around many free-standing ‘or’ groups (such as the famous “space transposition” sequence “or or oro r”) are not actually spaces at all. So it could well be that there would be context-dependent space-recognition algorithms / filters that we might very well want to use.
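
As a rough illustration of the kind of software filter Rene describes, here is a minimal Python sketch that turns measured gaps into EVA spaces or half-spaces. Everything numerical here is an assumption on my part: the threshold ratios (`half`, `full`), the pixel measurements in the example, and the function names are all hypothetical, and a context-dependent filter would need rather more than this.

```python
def classify_gap(gap_px, glyph_width_px, half=0.3, full=0.6):
    """Map a measured gap to '' (no space), ',' (half-space) or '.' (space).

    The gap is judged relative to a typical glyph width; the default
    threshold ratios are hypothetical, not measured from the manuscript.
    """
    ratio = gap_px / glyph_width_px
    if ratio >= full:
        return "."
    if ratio >= half:
        return ","
    return ""

def render(groups, gaps, glyph_width_px, **thresholds):
    """Join glyph groups, inserting an EVA space mark for the gap after each."""
    parts = []
    for group, gap in zip(groups, gaps + [0]):  # nothing follows the last group
        parts.append(group)
        parts.append(classify_gap(gap, glyph_width_px, **thresholds))
    return "".join(parts)

# Made-up pixel gaps between the groups of "or or oro r", for illustration only.
groups, gaps = ["or", "or", "oro", "r"], [14, 5, 8]
print(render(groups, gaps, glyph_width_px=20))            # or.ororo,r
print(render(groups, gaps, glyph_width_px=20, full=0.8))  # or,ororo,r
```

The design point is that the transcription would store only the measurements: whether a given gap counts as a space then becomes a filter setting that each researcher can change and rerun.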

Though this at first sounds like a great deal of work to be contemplating, Rene is undaunted. To make it work, he thinks that “[a] number of basics should be agreed, including the use of a consistent ‘coordinate system’. Again, there is a solution by Jason Davies [i.e.], but I think that it should be based on the latest series of scans at the Beinecke (they are much flatter). My proposal would be to base it on the pixel coordinates.”

For me, even though a lot of this would be nice things to have (and I will be very interested to see Philip’s analysis of tall gallows, long-tailed characters and space between lines), the #1 frustration about EVA is still the inconsistencies and problems of the raw transcription itself.

Though it would be good to find a way of redesigning EVA 2.0 to take these into account, perhaps it would be better to find a way to stage delivery of these features (hopefully via OCR!), just so we don’t end up designing something so complicated that it never actually gets done. 🙁

EVA and Neal Keys

One interesting (if arguably somewhat disconcerting) feature of Voynichese was pointed out by Philip Neal some years ago. He noted that where Voynichese words end in a gallows character, they almost always appear on the top line of a page (sometimes the top line of a paragraph). Moreover, these had a strong preference for being single-leg gallows (EVA p and EVA f); and also for appearing in nearby pairs with a short, often anomalous-looking stretch of text between them. And they also tend to occur about 2/3rds of the way across the line in which they fall.

Rather than call these “top-line-preferring-single-leg-gallows-preferring-2/3rd-along-the-top-line-preferring-anomalous-text-fragments”, I called these “Neal Keys”. This term is something which other researchers (particularly linguists) have ever since taken objection to, because it superficially sounds as though it is presupposing that this is a cryptographic mechanism. From my point of view, those same researchers didn’t object too loudly when cryptologist Prescott Currier called his Voynichese text clusters “languages”: so perhaps on balance we’re even, OK?

I only mention this because I think that EVA 2.0 ought to include a way of flagging likely Neal Keys, so that researchers can filter them in or out when they carry out their analyses.

EVA and ambiguity

As I discussed previously, one problem with EVA is that it doesn’t admit to any uncertainty: by which I mean that once a Voynichese word has been transcribed into EVA, it is (almost always) then assumed to be 100% correct by all the people and programmes that subsequently read it. Yet we now have good enough scans to be able to tell that this is simply not true, insofar as there are a good number of words that do not conform to EVA’s model for Voynichese text, and for which just about any transcription attempt will probably be unsatisfactory.

For example, the word at the start of the fourth line on f2r:

Here, the first part could possibly be “sh” or “sho”, while the second part could possibly be “aiidy” or “aiily”: in both cases, however, any transcriber attempting to reduce it to EVA would be far from certain.

Currently, the most honest way to transcribe this in EVA would be “sh*,aii*y” (where ‘*’ indicates “don’t know / illegible”). But this is an option that isn’t taken as often as it should.

I suspect that in cases like this, EVA should be extended to try to capture the uncertainty. One possible way would be to include a percentage value that an alternate reading is correct. In this example, the EVA transcription could be “sh!{40%=o},aiid{40%=*}y”, where “!{40%=o}” would mean “the most likely reading is that there is no character there (i.e. ‘!’), but there is a 40% chance that the character should be ‘o'”.

For those cases where two or more EVA characters are involved (e.g. where there is ambiguity between EVA ch and EVA ee), the EVA string would instead look like “ee{30%=ch}”. And on those occasions where there is a choice between a single letter and a letter pair, this could be transcribed as “!e{30%=ch}”.

For me, the point about transcribing with ambiguity is that it allows people doing modelling experiments to filter out words that are ambiguous (i.e. by including a [discard words containing any ambiguous glyphs] check box). Whatever’s going on in those words, it would almost always be better to ignore them rather than to include them.
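
Here is a minimal Python sketch of how a program might read the “{NN%=alt}” notation proposed above and then discard ambiguous words. I’m assuming that each alternate reading replaces the same number of preceding characters as it has itself (which matches all four examples above, but isn’t stated as a rule), that ‘.’ separates words, and that the sample line is one I made up purely for illustration.

```python
import re

# Matches an annotation such as {40%=o} or {30%=ch}.
ALT = re.compile(r"\{(\d+)%=([^}]+)\}")

def parse_ambiguous(text):
    """Return (primary_reading, alternates) for an annotated EVA string.

    Each alternate is (start, end, alt, percent), where [start:end] is the
    slice of the primary reading that the alternate would replace.
    Assumption: an alternate spans as many preceding characters as its length.
    """
    primary, alternates, pos = "", [], 0
    for m in ALT.finditer(text):
        primary += text[pos:m.start()]
        percent, alt = int(m.group(1)), m.group(2)
        alternates.append((len(primary) - len(alt), len(primary), alt, percent))
        pos = m.end()
    primary += text[pos:]
    return primary, alternates

def drop_ambiguous_words(line):
    """Keep only the words with no alternate readings and no '*' unknowns."""
    return [w for w in line.split(".") if "{" not in w and "*" not in w]

print(parse_ambiguous("sh!{40%=o},aiid{40%=*}y"))
# ('sh!,aiidy', [(2, 3, 'o', 40), (7, 8, '*', 40)])

# Made-up line: two clean words around the ambiguous f2r word.
print(drop_ambiguous_words("daiin.sh!{40%=o},aiid{40%=*}y.chodain"))
# ['daiin', 'chodain']
```

The second function is the “check box” in code form: a modelling experiment would simply run over the filtered list rather than over every word in the line.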

EVA and Metadata

Rene points out that the metadata “were added to the interlinear file, but this is indeed independent from EVA. It is part of the file format, and could equally be used in files using Currier, v101 etc.” So we shouldn’t confuse the usefulness of EVA with its metadata.

In many ways, though, what we would really like to have in the EVA metadata is some really definitive clustering information: though the pages currently have A and B, there are (without any real doubt) numerous more finely-grained clusters, that have yet to be determined in a completely rigorous and transparent (open-sourced) way. However, that is Task #3, which I hope to return to shortly.

In some ways, the kind of useful clustering I’m describing here is a kind of high-level “final transcription” feature, i.e. of how the transcription might well look much further down the line. So perhaps any talk of transcription

How to deliver EVA 2.0?

Rene Zandbergen is in no doubt that EVA 2.0 should not be in an interlinear file, but in a shared online database. There is indeed a lot to be said for having a cloud database containing a definitive transcription that we all share, extend, mutually review, and write programmes to access (say, via RESTful commands).

It would be particularly good if the accessors to it included a large number of basic filtering options: by page, folio, quire, recto/verso, Currier language, [not] first words, [not] last words, [not] first lines, [not] labels, [not] key-like texts, [not] Neal Keys, regexps, and so forth – a bit like on steroids. 🙂

It would also be sensible if this included open-source (and peer-reviewed) code for calculating statistics – raw instance counts, post-parse statistics, per-section percentages, 1st and 2nd order entropy calculations, etc.
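
For instance, the entropy calculations could be as small as the following Python sketch. Two assumptions to flag: I’m taking “2nd order entropy” to mean the conditional entropy of a character given the character before it (definitions do vary), and I’m treating each character of the input string as one symbol, which for raw EVA means one stroke rather than one parsed token.

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy, in bits, of a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values())

def first_order_entropy(text):
    """Entropy of the single-character distribution."""
    return entropy(Counter(text))

def second_order_entropy(text):
    """Conditional entropy H(next character | current character).

    Computed as H(character pairs) minus H(first character of each pair).
    """
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(pairs) - entropy(firsts)

print(first_order_entropy("abab"))   # 1.0
print(second_order_entropy("abab"))  # 0.0
```

In the toy string “abab”, each character is a coin-flip taken on its own (1 bit), yet is completely determined by its predecessor (0 bits): a gap of this general kind between the two figures is what people usually point to when they call Voynichese unusually predictable.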

Many of these I built into my JavaScript Voynichese state machine from 2003: there, I wrote a simple script to convert the interlinear file into JavaScript (developers now would typically use JSON or I-JSON).

However, this brings into play the questions of boundaries (how far should this database go?), collaboration (who should make this database), methodology (what language or platform should it use?), and also of resources (who should pay for it?).

One of the strongest reasons for EVA’s success was its simplicity: and given the long (and complex) shopping list we appear to have, it’s very hard to see how EVA 2.0 will be able to compete with that. But perhaps we collectively have no choice now.

In the Voynich research world, several transcriptions of the Voynich Manuscript’s baffling text have been made. Arguably the most influential of these is EVA: this originally stood for “European Voynich Alphabet”, but was later de-Europeanized into “Extensible Voynich Alphabet”.

The Good Things About EVA

EVA has two key aspects that make it particularly well-adapted to Voynich research. Firstly, the vast majority of Voynichese words transcribed into EVA are pronounceable (e.g. daiin, qochedy, chodain, etc): this makes them easy to remember and to work with. Secondly, it is a stroke-based transcription: even though there are countless ways in which the individual strokes could possibly be joined together into glyphs (e.g. ch, ee, ii, iin) or parsed into possible tokens (e.g. qo, ol, dy), EVA does not try to make that distinction – it is “parse-neutral”.

Thanks to these two aspects, EVA has become the central means by which Voynich researchers trying to understand its textual mysteries converse. In those terms, it is a hugely successful design.

The Not-So-Good Things About EVA

In retrospect, some features of EVA’s design are quite clunky:
* Using ‘s’ to code both for the freestanding ‘s’-shaped glyph and for the left-hand half of ‘sh’
* Having two ways of coding ligatures (either with round brackets or with upper-case letters)
* Having so many extended characters, many of which are for shapes that appear exactly once

There are other EVA design limitations that prevent various types of stroke from being captured:
* Having only limited ways of encoding the various ‘sh’ “plumes” (this particularly annoyed Glen Claston)
* Having no way of encoding the various ‘s’ flourishes (this also annoyed Glen)
* Having no way of encoding various different ‘-v’ flourishes (this continues to annoy me)

You also run into various annoying inconsistencies when you try to use the interlinear transcription:
* Some transcribers use extended characters for weirdoes, while others use no extended characters at all
* Directional tags such as R (radial) and C (circular) aren’t always used consistently
* Currier language (A / B) isn’t recorded for all pages
* Not all transcribers use the ‘,’ (half-space) character
* What one transcriber considers a space or half-space, another leaves out completely

These issues have led some researchers to either make their own transcriptions (such as Glen Claston’s v101 transcription), or to propose modifications to EVA (such as Philip Neal’s little-known ‘NEVA’, which is a kind of hybrid, diacriticalised EVA, mapped backwards from Glen Claston’s transcription).

However, there are arguably even bigger problems to contend with.

The Problem With EVA

The first big problem with EVA is that in lots of cases, Voynichese just doesn’t want to play ball with EVA’s nice neat transcription model. If we look at the following word (it’s right at the start of the fourth line on f2r), you should immediately see the problem:

The various EVA transcribers tried gamely to encode this (they tried “chaindy”, “*aiidy”, and “shaiidy”), but the only thing you can be certain of is that they’re probably all wrong. Because of the number of difficult cases such as this, EVA should perhaps have included a mechanism to let you flag an entire word as unreliable, so that people trying to draw inferences from EVA could filter it out before it messes up their stats.

(There’s a good chance that this particular word was miscopied or emended: you’d need to do a proper codicological analysis to figure out what was going on here, which is a complex and difficult activity that’s not high up on anyone’s list of things to do.)

The second big problem with EVA is that of low quality. This is (I believe) because almost all of the EVA transcriptions were done from the Beinecke’s ancient (read: horrible, nasty, monochrome) CopyFlo printouts, i.e. long before the Beinecke released even the first digital image scan of the Voynich Manuscript’s pages. Though many CopyFlo pages are nice and clean, there are still plenty of places where you can’t easily tell ‘o’ from ‘a’, ‘o’ from ‘y’, ‘ee’ from ‘ch’, ‘r’ from ‘s’, ‘q’ from ‘l’, or even ‘ch’ from ‘sh’.

And so there are often wide discrepancies between the various transcriptions. For example, looking at the second line of page f24r:

…this was transcribed as:

qotaiin.char.odai!n.okaiikhal.oky-{plant} --[Takahashi]
qotaiin.eear.odaiin.okai*!!al.oky-{plant} --[Currier, updated by Voynich mailing list members]
qotaiin.char.odai!n.okaickhal.oky-{plant} --[First Study Group]

In this specific instance, the Currier transcription is clearly the least accurate of the three: and even though the First Study Group transcription seems closer than Takeshi Takahashi’s transcription here, the latter is frequently more reliable elsewhere.
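
Because the three readings quoted above are padded to the same length (the ‘!’ characters act as “nothing here” placeholders), their discrepancies can be listed mechanically. Here is a minimal Python sketch that does this: the function name `disagreements` is my own, and the assumption that every reading of a line is already aligned column-for-column is one the code checks only by length.

```python
readings = {
    "Takahashi":         "qotaiin.char.odai!n.okaiikhal.oky-{plant}",
    "Currier":           "qotaiin.eear.odaiin.okai*!!al.oky-{plant}",
    "First Study Group": "qotaiin.char.odai!n.okaickhal.oky-{plant}",
}

def disagreements(readings):
    """List (position, {transcriber: character}) wherever the readings differ.

    Assumes the readings are already aligned, i.e. padded to equal length.
    """
    if len({len(r) for r in readings.values()}) != 1:
        raise ValueError("readings must be padded to the same length")
    names = list(readings)
    diffs = []
    for i, chars in enumerate(zip(*readings.values())):
        if len(set(chars)) > 1:
            diffs.append((i, dict(zip(names, chars))))
    return diffs

for position, chars in disagreements(readings):
    print(position, chars)
```

For this one line, the transcribers disagree at six of the forty-one character positions (8, 9, 17, 24, 25 and 26), and at position 24 all three readings differ from each other.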

The third big problem with EVA is that Voynich researchers (typically newer ones) often treat it as if it is final (it isn’t); or as if it is a perfect representation of Voynichese (it isn’t).

The EVA transcription is often unable to reflect what is on the page, and even though the transcribers have done their best to map between the two as best they can, in many instances there is no answer that is definitively correct.

The fourth big problem with EVA is that it is in need of an overhaul, because there is a huge appetite for running statistical experiments on a transcription, and the way it has ended up is often not a good fit for that.

It might be better now to produce not an interlinear EVA transcription (i.e. with different people’s transcriptions interleaved), but a single collective transcription BUT where words or letters that don’t quite fit the EVA paradigm are also tagged as ambiguous (e.g. places where the glyph has ended up in limbo halfway between ‘a’ and ‘o’).

What Is The Point Of EVA?

It seems to me that the biggest problem of all is this: that almost everyone has forgotten that the whole point of EVA wasn’t to close down discussion about transcription, but rather to enable people to work collaboratively even though just about every Voynich researcher has a different idea about how the individual shapes should be grouped and interpreted.

Somewhere along the line, people have stopped caring about the unresolved issue of how to parse Voynichese (e.g. to determine whether ‘ee’ is one letter or two), and just got on with doing experiments using EVA but without understanding its limitations and/or scope.

EVA was socially constructive, in that it allowed people with wildly different opinions about how Voynichese works to discuss things with each other in a shared language. However, it also inadvertently helped promote an inclusive accommodation whereby people stopped thinking about trying to resolve difficult issues (such as working out the correct way to parse the transcription).

But until we can find a way to resolve such utterly foundational issues, experiments on the EVA transcription will continue to give misleading and confounded results. The big paradox is therefore that while the EVA transcription has helped people discuss Voynichese, it hasn’t yet managed to help people advance knowledge about how Voynichese actually works beyond a very superficial level. *sigh*

For far too long, Voynich researchers have (in my opinion) tried to use statistical analysis as a thousand-ton wrecking ball, i.e. to knock down the whole Voynich edifice in a single giant swing. Find the perfect statistical experiment, runs the train of thought, and all Voynichese’s skittles will clatter down. Strrrrike!

But… even a tiny amount of reflection should be enough to show that this isn’t going to work: the intricacies and contingencies of Voynichese shout out loud that there will be no single key to unlock this door. Right now, the tests that get run give results that are – at best – like peering through multiple layers of net curtains. We do see vague silhouettes, but nothing genuinely useful appears.

Whether you think Voynichese is a language, a cipher system, or even a generated text doesn’t really matter. We all face the same initial problem: how to make Voynichese tractable, by which I mean how to flatten it (i.e. regularize it) to the point where the kind of tests people run do stand a good chance of returning results that are genuinely revealing.

A staging point model

How instead, then, should we approach Voynichese?

The answer is perhaps embarrassingly obvious and straightforward: we should collectively design and implement statistical experiments that help us move towards a series of staging posts.

Each of the models on the right (parsing model, clustering model, and inter-cluster maps) should be driven by clear-headed statistical analysis, and would help us iterate towards the staging points on the left (parsed transcription, clustered parsed transcription, final transcription).

What I’m specifically asserting here is that researchers who perform statistical experiments on the raw stroke transcription in the mistaken belief that this is as good as a final transcription are simply wasting their time: there are too many confounding curtains in the way to ever see clearly.

The Curse, statistically

A decade ago, I first talked about “The Curse of the Voynich”: my book’s title was a way of expressing the idea that there was something about the way the Voynich Manuscript was constructed that makes fools of people who try to solve it.

Interestingly, it might well be that the diagram above explains what the Curse actually is: that all the while people treat the raw (unparsed, unclustered, unnormalized) transcription as if it were the final (parsed, clustered, normalized) transcription, their statistical experiments will continue to be confounded in multiple ways, and will show them nothing useful.

I’ve just had a particularly interesting email exchange with Paul Relkin concerning the Feynman Challenge Ciphers, which he has generously allowed me to share here. The context is that the first Feynman Challenge cipher’s plaintext was from the very start of Geoffrey Chaucer’s Canterbury Tales, i.e. the first twelve lines of the General Prologue:


Paul writes:

The Prologue

I’d like to share with you a possible clue I’ve discovered to the sources of the 2nd and 3rd Feynman Ciphers. My findings relate to the identification of a specific published transcription of the Canterbury Tales that is the probable source of the 1st Feynman Cipher.

As you are probably aware, the Canterbury Tales have been transcribed and reprinted innumerable times. Among the many different published editions of the Canterbury Tales, there are several idiosyncratic spellings associated with particular transcriptions. Although individual lines are spelled the same way in many different editions, I found that the 12 lines of the Feynman Cipher taken together are unique enough to match only one published transcription, like a “word fingerprint”.

To find the edition that the Feynman Cipher is based on, I extensively searched for editions of the General Prologue that were published before or during World War II and compared the word spellings to the Feynman Cipher.
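The comparison described here can be sketched in a few lines of code: split the cipher’s plaintext and each candidate edition into words, then count the positions where the spellings differ. The function names are hypothetical, the sketch assumes punctuation has already been stripped, and the three “editions” below are invented placeholder variants of a single line, not real readings from any published edition.

```python
def spelling_mismatches(target, candidate):
    """Return (position, target_word, candidate_word) for every differing word."""
    t_words = target.lower().split()
    c_words = candidate.lower().split()
    diffs = [(i, t, c) for i, (t, c) in enumerate(zip(t_words, c_words)) if t != c]
    # A length difference means the texts cannot be matched word for word.
    if len(t_words) != len(c_words):
        diffs.append((min(len(t_words), len(c_words)), "<length>", "<length>"))
    return diffs

def rank_editions(target, editions):
    """Sort edition names by how many word spellings disagree with the target."""
    return sorted((len(spelling_mismatches(target, text)), name)
                  for name, text in editions.items())

# Placeholder data: made-up spelling variants, NOT taken from real editions.
editions = {
    "Edition A": "whan that aprill with his shoures soote",
    "Edition B": "whan that aprille with his shoures soote",
    "Edition C": "whan that aprille with hise shoures sote",
}
target = "WHAN THAT APRILL WITH HIS SHOURES SOOTE"

ranked = rank_editions(target, editions)
print(ranked)
```

The point of the “word fingerprint” is that while any single line may match many editions, the number of editions with zero mismatches shrinks quickly as more lines are compared together.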

First, I discovered what may be a typo in the 1st Feynman Cipher. The word “brefth” does not appear in any published edition of the General Prologue I have been able to identify. The most likely correct spelling is “breeth”.

Second, I found that the only version of the General Prologue that matches the Feynman Cipher is Fred Norris Robinson’s 1st edition of Chaucer’s Complete Works. In the introduction to his book, Robinson actually discusses several of the uniquely spelled words that later found their way into the 1st Feynman Cipher and explains why he rejected the popular spellings and chose less common ones.

Possible Sources

Having identified Robinson’s transcription as the probable source of the 1st Feynman Cipher, I discovered that there are only a few different editions of this transcription that were published between 1933 and 1938 that could have been used by the author of the Feynman Ciphers:

In 1933, Houghton Mifflin published this book in at least three editions:

The Complete Works of Geoffrey Chaucer (black):

The Complete Works of Geoffrey Chaucer, Student’s Cambridge Edition (red):

The Poetical Works of Chaucer, Cambridge Edition (white):

In 1936, Houghton Mifflin published small books containing parts of Robinson’s Canterbury Tales with an introduction written by Max John Herzberg. The title of the book that contains the quote used in the cipher is “The Prologue, the Knight’s Tale, and the Nun’s Priest’s Tale”:

In 1938, Houghton Mifflin included Robinson’s Canterbury Tales in a two volume collection of British poetry by Paul Robert Lieder called “British Poetry and Prose” (Volume 1):

Interestingly, Robinson’s 2nd edition of Chaucer’s Complete Works in 1957 no longer matches the spellings in the cipher!

It’s specifically here where I think we may find clues to the 2nd and 3rd ciphers. It seems plausible to me that “British Poetry and Prose” contains other literary works that were the basis for the 2nd and 3rd Feynman Ciphers. For example, several of its poems have 6 letter words that repeat twice, consistent with “CJUMVRCJUMVR” in the 2nd Feynman Cipher.
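A search for this kind of pattern is easy to automate. The sketch below looks for any run of six letters that is immediately followed by an identical run, and can be pointed at either a ciphertext or a candidate plaintext with everything but letters removed. The helper names are hypothetical, and the approach assumes the cipher keeps adjacent plaintext letters adjacent, which is only a guess in the case of the 2nd Feynman Cipher. The phrase “Sister, sister!” is a made-up test string, not a quotation from “British Poetry and Prose”.

```python
def letters_only(text):
    """Uppercase the text and drop everything except letters."""
    return "".join(ch for ch in text.upper() if ch.isalpha())

def doubled_runs(text, length=6):
    """Return (index, chunk) wherever a chunk of `length` letters is immediately repeated."""
    hits = []
    for i in range(len(text) - 2 * length + 1):
        chunk = text[i:i + length]
        if chunk == text[i + length:i + 2 * length]:
            hits.append((i, chunk))
    return hits

cipher_hits = doubled_runs("XXCJUMVRCJUMVRYY")              # padded with dummy letters
plain_hits = doubled_runs(letters_only("Sister, sister!"))  # invented example phrase

print(cipher_hits, plain_hits)
```

Note that a hit in a candidate plaintext need not line up with word boundaries, so any matches would still have to be checked by eye.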

Robinson’s 1933 book of Chaucer’s Complete Works could also be the source of the 2nd and 3rd ciphers. The 1933 book is part of a series of books called “The Cambridge Poets” and the 1936 book is part of a series called “The Riverside Literature Series”. The other books in the series are also potentially worth looking at.

Los Alamos?

My research suggests that several copies of these books have the original owner’s name and other notes written in them. If we were able to locate the copy that was used at Los Alamos, it might reveal the name of the scientist who created the ciphers. There may be other writings within it that would give further clues about the ciphers.

I discovered that the Mesa Public Library in Los Alamos has a copy of Robinson’s 1933 book. The Mesa Public Library originated during World War II in the Big House where Feynman lived, so I wondered whether the library book could be the copy that was used to create the cipher.

So, I recently arranged to borrow that book through interlibrary loan. Since I live on the East Coast, I had to try 5 different libraries before I found one that would let me request that particular book. It then took two tries because they accidentally requested the book from the Mesa Public Library in Arizona instead of the one in New Mexico. I finally received the book I requested. Unfortunately, the book plate indicates that it was donated to the library in the 1970s. This makes it unlikely (albeit not impossible) that this was the specific copy used in the period around World War II to create the 1st Feynman Cipher.

I hope you find this information interesting and that it brings us a step closer to solving the 2nd and 3rd Feynman Ciphers.

Chaucer and Cryptography?

(((NickP: I responded here, pointing out:)))

Incidentally, there are two interesting links between Geoffrey Chaucer and cryptography. The first (which you may well have heard of) is that he included six blocks of ciphertext in his Treatise on the “Equatorie” (basically a kind of astrolabe). But the second is that a very major work on Chaucer (finally published in 1940) was written by John Matthews Manly and Edith Rickert, both well-known code-breakers. (I’ve covered them a few times on CM, mainly because of Manly’s links to the Voynich Manuscript.)

However, Rickert died in 1938, Manly died in 1940 and Los Alamos only really started in 1943, so we can rule out a direct transmission from either of them to Feynman. All the same, I do consider it entirely possible that one/both of them was/were the ultimate source of the three cryptograms. Just so you know!

(((To which Paul replied:)))

Concerning your excellent point about Rickert and Manly, there was another colorful link between a Chaucer scholar and Los Alamos that I found while I was researching editions of the Canterbury Tales. John Strong Perry Tatlock was a famous Chaucer expert who transcribed Chaucer’s Complete Works. His daughter, Jean Frances Tatlock, had a romantic relationship with J. Robert Oppenheimer between 1936 and 1939. They continued to have an affair during Oppenheimer’s marriage. Their relationship was used as evidence against Oppenheimer during his security clearance hearings because Tatlock was a member of the Communist Party. As you know, Oppenheimer and Feynman had more than a passing acquaintance – as for Tatlock and Feynman, who knows?

Just a short note to say that I’ve today decided to stop selling physical copies of “The Curse of the Voynich”. I first published it at the end of 2006 (the front page says “v1.0: Emma Vine (Broceto)“, if you want to try decrypting that), and it’s now time for me to leave it to the book collectors and move on. 🙂

Thanks very much to everyone reading this who bought a copy along the way – this helped recoup me some of the money I lost during the six months I worked part-time while I did the research for it. And for those who bought their copy direct from Compelling Press, I really hope you enjoyed your anagrammatic dedication – finding nice anagrams of people’s names was always something I enjoyed doing.

Incidentally, second-hand copies of “Curse” are on sale through, though at prices ranging from £47 to £2500 (!): I expect the lowest price will rise to around £200 before very long, so anyone here who already has a copy is arguably now a little bit better off. Which is nice (if you’re an accountant). 🙂

Finally: for anyone who would like a copy of “Curse” in the future, please note that I plan to make an ebook version available before long (hopefully later this year). I’ll do my best, but don’t hold your breath waiting for it in the ultra-short term, because publication rights for pictures and quotations always take longer to clear than you’d like. *sigh*.

What I have long tried to do with this blog is to genuinely advance our collective knowledge about unbroken historical ciphers, not by speculating loosely or wildly (as seems to be the norm these days) but instead by trying to reason under conditions of uncertainty. That is: I try to use each post as an opportunity to think logically about multiple types of historical evidence that often coincide or overlap yet are individually hard to work with – ciphers, cryptograms, drawings, treasure maps, stories, legends, claims, beliefs, mysteries.

The world of cipher mysteries, then, is a world both of uncertain evidence and also of uncertain history built on top of that uncertain evidence – perpetually thin ice to be skating on, to be sure.

A skills void?

It is entirely true that all historical evidence is inherently uncertain: people lie, groups have agendas, listeners misunderstand, language misleads, copyists misread, propagandists appropriate, historians overselect, forgers fake, etc. All the same, seeing past/through the textual uncertainties these kinds of behaviours can leave embedded in evidence is the bread and butter of modern historians, who are now trained to be adept both in close reading and critical thinking.

However, what I am arguing here is that though History-as-text – i.e. history viewed as primarily an exercise in textual literature analysis – managed to win the historical high ground, it did so at the cost of supplanting almost all non-textual historical disciplines. To my eyes, the slow grinding deaths of codicology, palaeography and even dear old iconography (now more visible in Dan Brown film adaptations than in bibliographies) along with what I think is the increasing marginalization of Art History far from the historical mainstream have collectively left a huge gap at the heart of the subject.

This isn’t merely a focus void, it’s also centrally a skills void – the main missing skill being the ability to reason under conditions where the evidence’s textual dimension is missing or sharply limited.

In short, I would argue that because historians are now trained to deal primarily with textual uncertainties, the ability to reason effectively with other less compliant types of evidence is a skill few now seem to have to any significant degree. In my opinion, this aspect of text-centrism is a key structural weakness of history as now taught.

In my experience, almost nothing exposes this weakness more than the writing done on the subject of historical cipher mysteries. There it is absolutely the norm to see otherwise clever people make fools of themselves, and moreover in thousands of different ways: surely in few other subject domains has so much ink been spilled to so little effect. In Rene Zandbergen’s opinion, probably the most difficult thing about Voynich research is avoiding big mistakes: sadly, few seem able to achieve this.

“The Journal of Uncertain History”

Yet a key problem I face is that when it comes to presenting or publishing, the kind of fascinating historical mysteries I research are plainly a bad fit for the current academic landscape. This is because what I’m trying to develop and exercise there is a kind of multi-disciplinary / cross-disciplinary analytical historical skill (specifically: historical reasoning under uncertainty) that has quite different aims and success criteria from mainstream historical reasoning.

On the one hand, this “Uncertain History” is very much like Intellectual History, in that it is a meta-historical approach that freely crosses domain boundaries while relying heavily on the careful application of logic in order to make progress. And yet I would argue that Intellectual History as currently practised is heavily reliant on the universality of text and classical logic to build its chains of reasoning. In that sense, Intellectual History is a close cousin to the text-walled world of MBA courses, where all statements in case studies are deemed to be both true and given in good faith.

By way of contrast, Uncertain History turns its face primarily to those historical conundrums and mysteries where text falls short, where good faith can very often be lacking, and where strict Aristotelian logic can prove more of a hindrance than a help (here I’m thinking specifically about the Law of the Excluded Middle).

And so I propose launching a new open-source historical journal (Creative Commons BY-NC Licence), with the provisional name of “The Journal of Uncertain History“, and with the aim of providing a home for Uncertain History research of all types.

To be considered for the JoUH, papers should (also provisionally) be tackling research areas where:

* the historical evidence itself is problematic and/or uncertain;
* there is a problematic interplay between the types of evidence;
* to make genuine progress, non-trivial reasoning is required, not just for thinking but also for explanation;
* historical speculations made within the paper are both proposed and tested; and
* future tests (preferably empirical) and/or research leads are proposed.

I welcome all your comments, thoughts, and suggestions for possible submissions, authors, collaborators and/or editors; and especially reasons why existing journals X, Y and Z would all be better homes for this kind of research than the JoUH. 🙂

Back in 2006, I reasoned (in The Curse of the Voynich) that if the nine-rosette page’s circular city with a castle at the top…

…represented Milan (one of only three cities renowned for their circular shape), then the presence of swallowtail merlons on the drawing implied it must have been drawn after 1450, when the rebuilding of the old Porta Giovia castle (that was wrecked during the Ambrosian Republic) by Francesco Sforza as [what is now known as] the Castello Sforzesco began.

Ten Years Later, A Challenge

However, Mark Knowles recently challenged me on this: how was I so sure that the older castle on the site didn’t also have swallowtail merlons?

While writing Curse, for the history of Milan I mainly relied on the collection of essays and drawings in Vergilio Vercelloni’s excellent “Atlante Storico di Milano, Città di Lombardia”, such as these two pictures from Milano fantastica, in “Historia Evangelica et actos apostolorum cum alijs illorum temporum eventibus cum figuris crebioribus delineatis”, circa 1380:

…and this old favourite (which Boucheron notes [p.199] is a copy probably made between 1456 and 1472 of an original made in the 1420s)…

On the surface, it seemed from these as though I had done enough. But coming back to it, might I have been too hasty? I decided to fetch down my copies of Evelyn Welch’s “Art and Authority in Renaissance Milan” and Patrick Boucheron’s “Le Pouvoir de Bâtir” from the book overflow in the attic and have another look…

Revisiting Milan’s Merlons

What did I find? Well: firstly, tucked away in a corner of a drawing by Galvano Fiamma (in the 1330s) of a view of Milan (reproduced as Plate IIa at the back of Boucheron’s book), the city walls appear to have some swallowtail merlons (look just inside the two outermost towers and you should see them):

And in a corner of a drawing by Anovelo da Imbonate depicting and celebrating the 1395 investiture of Gian Galeazzo Visconti (reproduced in Welch p.24), I noticed a tiny detail that I hadn’t picked up on before… yet more swallowtail merlons:

Then, when I looked at other miniatures by the same Anovelo da Imbonate, I found two other (admittedly stylized) depictions of Milan by him that also unmistakeably have swallowtail merlons:

So it would seem that Milan’s city walls may well have had swallowtail merlons prior to 1450. The problem is that the city walls aren’t the same as the Porta Giovia castle walls (built from 1358, according to Corio): and I don’t think we know enough to say whether or not the castle itself had swallowtail merlons. It’s debatable whether the drawing of the 1395 investiture (which took place in the Porta Giovia castle) depicts the castle itself having swallowtail merlons: I just don’t know.

But the short version of the long answer is that because the Porta Giovia castle was only built from 1358-1372 (or thereabouts), we can’t rely on texts written before then (such as Galvano Fiamma’s). And there seems quite good reason to suspect (the Massajo drawing notwithstanding) that the Porta Giovia castle may well have had swallowtail merlons when it was used for the Visconti investiture in 1395. But I don’t know for certain, sorry. 🙁

There are texts that might give us an answer: for example, the (1437) “De Laudibus Mediolanensium urbis panegyricus” by Pier Candido Decembrio (mentioned in Boucheron p.74), or Bernardino Corio’s “Storia di Milano”. There are plenty of documents Boucheron cites in footnotes (pp.202-205), including “Lavori ai castelli di Bellinzona nel periodo visconteo”, Bolletino della Svizzera italiana, XXV, 1903, pp.101-104 (which I’ll leave for another day). But it’s obviously quite a lot of work. 🙁

Finally, I should perhaps add that a few details by Anovelo da Imbonate have an intriguingly Voynichian feel:

Though there were plenty of other miniature artists active in the Visconti court in Milan in the decades up to 1447, parallels between their art and the Voynich Manuscript’s drawings haven’t been explored much to date. Perhaps this is a deficiency in our collective Art Historical view that should be rectified. 🙂

Whether we like it or not, history as practised nowadays is a tower built upon textuality, upon the implicit evidentiality striped within and through texts. Even archaeology (of all but the obscenely distant past) and Art History rely heavily on texts for their reconstructions.

Alternative, explicitly visual approaches to history have lost the battle to control the locus of meaning. The mid-twentieth century Warburg/Saxl/Panofsky dream that highly evolved iconography/iconology might be able to surgically extract the inner semantic life of symbols from their drab syntactical carapaces now seems hopelessly over-optimistic, fit only for the Hollywood cartoons of Dan Brown novels. Sorry, but Text won.

What, then, are contemporary historians to make of the Voynich Manuscript, a barque adrift in a wine-dark sea of textlessness? In VoynichLand, we have letters, letters everywhere, and not a jot for them to read: and without close reading’s robotic exoskeleton to work with, where could such a text-centric generation of scholars begin?

Well, given that the Voynich Manuscript’s text-like writing has so far yielded nothing of obvious substance to linguists or cryptologists (apart from long lists of things that they are sure it is not), historians are only comfortably left with a single door leading to the disco floor…

“Step #1. Start with the pictures.”

Yes, they could indeed start with the pictures: the Voynich’s beguiling, misleading, and crisply non-religious images. These contain plants that are real, distorted, imaginary, and/or impossible; strange circular diagrams; oddly-posed nymphs arranged in tubes and pools; and curious map-like diagrams. They famously lead everywhere and nowhere simultaneously, like a bad mirror-room fight-scene in 1960s Avengers TV episodes.

Without the comforting crutch of referentiality to lean on, we can’t tell whether a given picture happens to parallel one of the plants in Ulisse Aldrovandi’s famous (so-called) “alchemical herbals” (which unfortunately seem to be neither alchemical nor particularly herbal); or whether we’re just imagining that it echoes a specific plant in this week’s interesting Arabic book of wonders; or whether its roots were drawn from a dried sample but its body was imagined; or whether a different one of the remaining three hundred and eighty post-rationalizations that have been made for that page happens to hold true.

But on the bright side, it’s not as if we’re talking about a set of drawings that has previously made fools of just about everyone who has tried to form a sensible opinion about them, right? [*hollow laugh*]

So, “start with the pictures” it is. But what should we do then? Again, there seems little choice:

“Step #2. Find a telling detail.”

In my opinion, here’s where it all starts to go wrong: where the road leads only to a cliff-edge, and one that has a sizeable drop below it into the sea.

The elephant-in-the-room question here is this: if looking for telling details is such a good idea, why is it that more than a century’s worth of looking for telling details has revealed practically nothing?

Is it because everyone who has ever looked at the Voynich Manuscript has been stupid, or inexperienced, or foolish, or delusional, or crazy, or marginal, or naive? Because that’s essentially what would need to be true for your own contribution to bring a new bottle to the party, if all you’re going to do yourself is look for telling details.

The thing that almost nobody seems to grasp is that we collectively have already applied an extraordinary number of eyeballs to this issue.

Even though the Voynich’s imagery has been seen and ‘closely read’ for over a century by all manner of people, to date this has – in terms of finding the single telling detail that can place even part of it within an illustrative or semantic tradition – achieved nothing, zilch, nada.

Incidentally, this leads (I think) to one of only two basic constructional models: (a) the drawings in the Voynich Manuscript are from a self-contained culture whose internal frame of reference sits quite apart from anything we’re used to looking at [a suggestion which I’m certain the palaeography refutes completely]; or (b) the process of making the drawings for the Voynich Manuscript somehow consciously stripped out their referentiality.

But I’m not imagining for a moment that what I’m pointing out will stop anyone else from reinventing this same square wheel: all I’m saying is that this is how people approach the Voynich Manuscript, and why they then get themselves into a one-way tangle.

“Step #3. Draw a big conclusion.”

Finally, this is the point in the chain of the argument where the cart rolls properly over the cliff: though it’s a long way down, at least gravity’s accelerative force means anybody in it won’t have very long to wait before the sea comes up to meet them (relatively speaking).

How is it that anyone can comfortably draw a step #3 macro-conclusion from the itty-bitty (and horrendously uncertain) detail they latched onto in step #2? As proofs go, this step is completely contingent on at least three different things:
(a) on perfect identification of the detail itself,
(b) on perfect correlation with essentially the same thing but in an external tradition, and
(c) on the logical presumption that this is necessarily the only feasible explanation for the correlation.

Each of these three would be extremely difficult to prove on its own, never mind when all three are required to be true at the same time for their sum to be true.
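As a rough back-of-the-envelope illustration of why the conjunction is so fragile: if the three conditions were independent (an assumption made purely to keep the arithmetic simple), the chance that all of them hold is the product of the individual chances. The figure of 0.7 below is an invented number, not an estimate of anything.

```python
import math

def chance_all_hold(*probabilities):
    """Probability that every condition holds, assuming they are independent."""
    return math.prod(probabilities)

# Even a fairly generous (and purely illustrative) 70% for each of (a), (b) and (c)...
combined = chance_all_hold(0.7, 0.7, 0.7)

# ...leaves the overall argument at roughly one in three.
print(round(combined, 3))
```

Put more realistic (i.e. much lower) numbers in for each step and the combined figure collapses very quickly indeed.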

In my experience, when people put forward a Voynich manuscript macro-conclusion based on local correlation with some micro-detail they have noticed, they almost always haven’t noticed how weakly supported their overall argument is. Not only that, but why is it – given the image-rich source their external tradition normally is – that they can typically only point to a single image in it that supports their claimed correlation? That is fairly bankrupt, intellectually speaking.

How can we fix this issue?

This is a really hard problem. Art History tends to furnish historians with the illusion that they can use its conceptual tricks and technical ‘flow’ to tackle the Voynich Manuscript one single isolated detail at a time, but this isn’t really true in any useful sense.

A picture is a connected constellation of techniques, formed not only of ways of expressing things, but also of ways of seeing things. And so it’s a mystery why there should be such an otherness to the Voynich Manuscript’s drawings that deconstructing any part of it leaves us with next to nothing in our hands.

Part of this problem is easy to spot, insofar as there are plenty of places where we still can’t tell content from decoration from elaboration from emendation. Even a cursory look at pages such as the nine-rosette page or f116v should elicit the conclusion that they are made up of multiple layers, i.e. multiple codicological contributions.

For me, until someone uses tricks such as DNA analysis and Raman imaging to properly analyze the manuscript’s codicological layers, internal construction, and/or the original bifolio order of each of the sections, too many people will continue trying to read not “the unreadable”, but “the not-yet readable”: all of which will continue to lead to all manner of foolish reasoning and conclusions, as it has done for many decades.

I really want you to understand that this isn’t because people are inherently foolish: rather, it’s because they almost all want to kid themselves that they can draw a solid macro-conclusion from an isolated and uncertain micro-similarity. And all the while that this continues to be the collective research norm, I have little doubt that we’re going to get nowhere.

Alexandra Marraccini’s presentation

You can see the slides and the draft article accompanying Alexandra Marraccini’s recent talk here (courtesy of

The core of Marraccini’s argument seems to reduce to this: that if one or more of the circular castle roundels in the Voynich Manuscript’s nine-rosette foldout is in fact the same flattened city that appears in BL Sloane MS 4016 f.8v and/or Vat.Chig. F.VII 158 f.12r and/or BNF Lat 6823 f.13r (the first two of which also have a little dragon in one herbal root), then we might be able to place the Voynich Manuscript in one branch of the Tractatus de Herbis tradition (all of which derive from Firenze Biblioteca Dipartimentale di Botanica MS 106).

Even though this is arguably a reasonable starting point for future investigation, I’m not yet seeing a lot of methodological ‘air’ between what she’s doing and the mass of detail-driven Voynich single-image theories Marraccini would doubtless wish to distance herself from. The structural weaknesses of their arguments are still – to a very large degree – her argument’s weaknesses too.

Going forward, this amounts to a theoretical lacuna which I think she might do well to address: that there is no obvious historical / analytical methodology to apply here that satisfactorily bridges the gap between micro-similarities and macro-conclusion in the absence of accompanying texts. OK, pointing to an absence is perhaps a bit more of a problematique than most historians these days are comfortable with, but I’m only the messenger here, sorry.

Anyway, there’s a nice transcription of the Q&A session she gave after her presentation (courtesy of VViews) here, which I’m sure many Voynich researchers will find interesting.

Oddly, though, the questions from an audience Voynichero with my 2006 book “The Curse of the Voynich” in mind were almost exactly the opposite of what I would myself have asked (had I been there). The single most important question is: why is your argument structurally any better than all the other similar arguments that have been put forward?

So, what is missing here?

The answer to this certainly isn’t working hypotheses about the Voynich Manuscript, because there’s no obvious shortage of those. Even the suggestion that there might be some stemmatic relation (however vague and ill-defined) between the drawings in the Voynich Manuscript and BL Sloane MS 4016 has been floating around for some years.

Instead, what I think is missing is a whole set of evidential basics: for example, physical data and associated reasoning that tell us with almost no doubt which paints were original (answer: not many of them) and which were added later; or (perhaps more importantly) what the original bifolio nesting order was.

With these to work with, we could reject many, many incorrect hypotheses: and we might – with just a little bit of luck – possibly be able to use one or two as fixed points to pivot the whole discourse round, like an Archimedean Lever.

The alternative, sadly, is a long sequence of more badly-structured arguments, Groundhog Day-stylee. Even if my ice-carving technique has got stupendously good, it would be nice to have a change, right?

Here’s a thing that struck me the other day about the Anthon Transcript that I thought I ought to mention.

The way that the story has been passed down to us makes it far from easy to reconcile the “Caractors” page…

…with the “singular scrawl” shown to Professor Charles Anthon in 1828:

“It consisted of all kinds of crooked characters disposed in columns, and had evidently been prepared by some person who had before him at the time a book containing various alphabets. Greek and Hebrew letters, crosses and flourishes, Roman letters inverted or placed sideways, were arranged in perpendicular columns, and the whole ended in a rude delineation of a circle divided into various compartments, decked with various strange marks, and evidently copied after the Mexican Calendar given by Humboldt, but copied in such a way as not to betray the source whence it was derived.”

It would seem as though the first page was arranged in horizontal rows, while the second page was arranged in vertical columns and with a compartmentalized circle added after it. It sounds as though we are talking about two quite different things, and so shouldn’t even attempt to reconcile them as one. Yet in the decade up to his death in 1888, David Whitmer repeatedly asserted that this first page was indeed the very same “original paper” that Martin Harris had taken to Charles Anthon.

At this point, any passing Intellectual Historian might gently suggest that all these statements may well have been said in good faith: and that what is actually stopping us seeing them all as descriptions of the same thing is not the evidence itself, but our stubbornly persistent misreading of what is in front of our faces.

Can we do better?

The case of the ‘H’

I suspect that we can: and the giveaway that may well help to point us in the direction of what happened is the capital ‘H’ shape that occurs at least eight times:

How on earth did the person copying this down not notice that this was nothing more than an ornate ‘H’ shape? Though I’ve long wondered about this, reasonable answers have to date always eluded me. But what I noticed here is that perhaps the actual explanation is painfully simple: that the person who originally wrote these down copied the shapes as if they were written in columns, i.e. without seeing them as ‘H’ shapes at all.

The photograph in Clay County Museum directly supports this idea, because the writing on the other side of the fold (“The Book of Generation Adam”) is written sideways:

The two-button mouse

Even though most of the Caractors are evenly inked, the strong downward strokes of one of the three “two-button mouse” shapes also seem to indicate to my eyes that the letters were written rotated ninety degrees from what we see now (though I’d appreciate other people’s palaeographical insights on this particular issue):

I’d have thought the suggestion that these letters were originally written in columns rather than rows would be a palaeographical hypothesis that could be tested out and resolved one way or the other.

Reconstructing the sequence

If the above is basically right, it would seem that the stages that this page went through were:

(1) The shapes were copied in columns from a source that was (wrongly) believed to have also been written in columns.

This caused letters such as the ornate ‘H’ shape to be copied not semantically as letters, but instead as a series of strokes. I would also expect that these columns were copied downwards as per the following image:

I can see how someone who had not grasped the correct orientation of these letters might have considered their rotated versions to be “hieroglyphic”-like. (I can also see how going from “hieroglyphic”-like to reconstructing the 2500 B.C. Jaredite flight from Egypt to America in submarines might seem a little too extreme for some.)

Note that I can easily see how the bottom of this page (beneath the ragged fold in the museum photograph) could have originally contained a “rude delineation of a circle divided into various compartments”: in which case the page was obviously longer. The overall page was also probably folded in half (parallel to the longest edge) at around this time, because only a single crease is apparent in the photograph.

This, then, would have been what was shown to Charles Anthon in 1828.

(2) The circular calendar section was removed, and the ‘Caractors’ word added.

Note that MacKay et al. [*1] showed that the “Caractors” lettering at the top and the lettering of the curious letters were all done in the same ink. However, it also seems likely to me (from the orientation) that the “Caractors” word was added in a quite separate construction phase, and their conclusion that this word was added at the same time as the rest of the letters ought to be examined very carefully indeed.

I believe that the circular calendar section was removed around now because of the following phase…

(3) The “Book of Generation Adam” text is added circa 1830, halfway down the reverse side.

Because this text was added right in the middle of the reverse side, it seems likely to me that the circular calendar section had already been removed (or else this text would probably have appeared further up the [slightly longer] page).

We can date this addition to 1830 or after, because that is when the phrase “The Book of Generation Adam” began to be used in the Mormon Church.

(4) The “Book of Generation Adam” half of the page is removed before 1884.

As Grindael noted, the part of the page with the “Book of Generation Adam” text was almost certainly “torn off sometime before 1884, because it is described as having the same dimensions then as it did in 1903”.

(5) The remaining fragment is sold to the RLDS Church in 1903.

“This collection of documents [was] eventually given into the care of George Schweich, a nephew of David J. Whitmer, who subsequently sold them to the RLDS Church for $2450 in 1903.”

Is this sequence correct?

I don’t honestly know. But if you were to try – while wearing an Intellectual Historian hat – to reconcile what you see in the RLDS Caractors fragment with the different testimonies assuming they were all given in good faith, then I strongly suspect that this sequence is extremely close to where you would necessarily end up.

And perhaps that’s as good as it gets, at this distance in time. Or… perhaps this is just the start?

[*1] MacKay, Michael Hubbard; Dirkmaat, Gerrit J.; Jenson, Robin Scott. “The ‘Caractors’ Document: New Light on an Early Transcription of the Book of Mormon Characters”. Mormon Historical Studies, vol. 14, no. 1.