Voynich researchers without a significant maths grounding are often intimidated by the concept of entropy. But all it is is an aggregate measure of how [in]effectively you can predict the next token in a sequence, given a preceding context of a certain size. The more predictable tokens are (on average), the smaller the entropy: the more unpredictable they are, the larger the entropy.

For example, if the first order (i.e. no context at all) entropy measurement of a certain text was 3.0 bits, then it would have the same average information content per character as a random series of eight equally likely digits (e.g. 1-8). This is because entropy is a log2 value, and log2(8) = 3. (Of course, what is usually the case is that some letters are more frequent than others: but entropy is the bottom line figure averaged out over the whole text you’re interested in.)
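To make the log2(8) = 3 example concrete, here is a minimal Python sketch of a first order entropy calculation. The function name first_order_entropy and the two sample sequences are illustrative assumptions, not anything taken from an existing Voynich tool.

```python
import math
from collections import Counter

def first_order_entropy(tokens):
    """Shannon entropy, in bits per token, of the token frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Eight equally frequent symbols: log2(8) = 3 bits per symbol.
uniform = list("12345678") * 100
print(first_order_entropy(uniform))  # 3.0

# The same eight symbols, but with '1' far more frequent than the rest:
# the average information per symbol drops below 3 bits.
skewed = list("1" * 500 + "2345678" * 10)
print(first_order_entropy(skewed))
```

The skewed sequence shows the point made in the parenthesis above: once some symbols are more frequent than others, the averaged-out figure falls below the log2 of the alphabet size.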

And the same goes for second order entropy, with the only difference being that because we always know there what the preceding letter or token was, we can make a more effective guess as to what the next letter or token will be. For example, if we know the previous English letter was ‘q’, then there is a very high chance that the next letter will be ‘u’, and a far lower chance that the next letter will be, say, ‘k’. (Unless it just happens to be a text about the current Mayor of London with all the spaces removed.)

And so it should proceed beyond that: the longer the preceding context, the more effectively you should be able to predict the next letter, and so the lower the entropy value.
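The 'q' then 'u' idea can be sketched in code too. The block below assumes that "second order entropy" means the conditional entropy of a token given the single token before it, which is one common reading but not the only one. The function name second_order_entropy is an illustrative assumption.

```python
import math
from collections import Counter

def second_order_entropy(tokens):
    """Conditional entropy H(next | previous), in bits per token.

    `tokens` must be a list or string (something that can be sliced).
    """
    pairs = Counter(zip(tokens, tokens[1:]))   # counts of (previous, next)
    prev = Counter(tokens[:-1])                # how often each token has a successor
    total = sum(pairs.values())
    # -sum over pairs of p(prev, next) * log2 p(next | prev)
    return -sum((n / total) * math.log2(n / prev[a])
                for (a, _b), n in pairs.items())

# If 'u' always follows 'q' (and vice versa), the next letter is never a surprise.
print(second_order_entropy(list("quququququ")))

# If the previous letter tells you little, the figure stays close to 1 bit
# for a two-letter alphabet.
print(second_order_entropy(list("aabb" * 50)))
```

Longer contexts work the same way, with the single previous token replaced by the previous two, three, and so on, though the counts thin out quickly on a text as short as the Voynich Manuscript.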

As always, there are practical difficulties to consider (e.g. what to do across page boundaries, how to handle free-standing labels, whether to filter out key-like sequences, etc) in order to normalize the sequence you’re working with, but that’s basically as far as you can go with the concept of entropy without having to define the maths behind it a little more formally.

Voynich Entropy

However, even a moment’s thought should be sufficient to throw up the flaw in using entropy as a mathematical torch to try to cast light on the Voynich Manuscript’s “Voynichese” text… that because we don’t yet know what makes up a single token, we don’t know whether or not the entropy values we get are telling us anything interesting.

EVA transcriptions are closer to stroke based than to glyph based: so it makes little (or indeed no) sense to calculate entropy values for EVA. And as for people who claim to be able to read EVA off the page as, say, mirrored Hebrew… I don’t think so. :-/

But what is the correct mapping or grouping for EVA, i.e. the set of rules you should apply to EVA to turn it into the set of tokens that will give us genuine results? Nobody knows. And, oddly, nobody seems to be even asking any more. Which doesn’t bode well.

All the same, entropy does sometimes yield us interesting glimpses inside the Voynichese engine. For example, looking at the Currier A pages only in the Takahashi transcription and using ch/sh/cth/ckh/cfh/cph as tokens (which is a pretty basic glyphifying starting point), you get [“h1” = first order entropy, “h2” = second order entropy]:

63667 input tokens, 56222 output tokens, h1 = 4.95, h2 = 4.03

This has a first order information content of 56222 x 4.95 = 278299 bits, and a second order information content of (56222-1) x 4.03 = 226571 bits.
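For anyone wanting to try this kind of glyphifying step themselves, here is a minimal sketch of a greedy longest-match parser. It is not the tool used to produce the figures above: the function name parse_eva is hypothetical, and how spaces, line breaks and transcription markup ought to be handled is left entirely open.

```python
def parse_eva(text, groups):
    """Greedy longest-match parse of an EVA string into tokens.

    Any character that does not start one of `groups` becomes a
    single-character token.
    """
    ordered = sorted(groups, key=len, reverse=True)  # try longer groups first
    tokens, i = [], 0
    while i < len(text):
        for g in ordered:
            if text.startswith(g, i):
                tokens.append(g)
                i += len(g)
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

BASIC_GROUPS = ["ch", "sh", "cth", "ckh", "cfh", "cph"]
print(parse_eva("chockhy", BASIC_GROUPS))  # → ['ch', 'o', 'ckh', 'y']
```

Seven input characters become four output tokens here, which is the same kind of shrinkage as the input and output token counts quoted above.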

If you then also replace all the occurrences of ain/aiin/aiiin/oin/oiin/oiiin with their own tokens, you get:

63667 input tokens, 51562 output tokens, h1 = 5.21, h2 = 4.01

This has a first order information content of 51562 x 5.21 = 268638 bits, and a second order information content of (51562-1) x 4.01 = 206760 bits. What is interesting here is that even though the h1 value increases a fair bit (as you’d expect from extending the post-parsed alphabet with additional tokens), the h2 value decreases very slightly, which I find a bit surprising.

And if, continuing in this vein, you also convert air/aiir/aiiir/sain/saiin/saiiin/dain/daiin/daiiin to glyphs, you get:

63667 input tokens, 50387 output tokens, h1 = 5.49, h2 = 4.04

This has a first order information content of 50387 x 5.49 = 276625 bits, and a second order information content of (50387-1) x 4.04 = 203559 bits. Again what I find interesting is that once again the h1 value increases a fair bit, but the h2 value barely moves.
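The arithmetic for all three parsings can be checked in a few lines. The token counts and entropy values are the ones quoted above; the short labels and the function name information_content are my own shorthand.

```python
# (label, output tokens, h1, h2) as quoted in the text above
runs = [
    ("ch/sh/gallows groups", 56222, 4.95, 4.03),
    ("+ ain/oin family", 51562, 5.21, 4.01),
    ("+ air/sain/dain families", 50387, 5.49, 4.04),
]

def information_content(n_tokens, h1, h2):
    """Return (first order bits, second order bits), rounded to whole bits.

    The second order total uses n_tokens - 1 because the very first
    token has no preceding token to condition on.
    """
    return round(n_tokens * h1), round((n_tokens - 1) * h2)

for label, n, h1, h2 in runs:
    first, second = information_content(n, h1, h2)
    print(f"{label}: {first} bits (h1), {second} bits (h2), h1:h2 = {h1 / h2:.3f}")
```

On these figures the h1:h2 ratio rises from roughly 1.23 to roughly 1.36 across the three parsings, while the second order total falls each time.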

And so it does seem to me that Voynich entropy may yet prove to be a useful tool in determining what is going on with all the different possible parsings. For example, I do wonder if there might be a practical way of exhaustively / hillclimbingly determining the particular parsing / grouping that maximises the post-parsed h1:h2 ratio for Voynichese. I don’t believe anyone has yet succeeded in doing this, so there may be plenty of room for good new work here – just a thought! 🙂
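As a starting point for that kind of search, here is a rough hillclimbing sketch. It is only one of many possible designs: it assumes h2 is the conditional entropy given one preceding token, it uses a greedy longest-match parse, and it adds one candidate group at a time for as long as the h1:h2 ratio improves. All the function names and the candidate list are hypothetical.

```python
import math
from collections import Counter

def h1(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def h2(tokens):
    pairs = Counter(zip(tokens, tokens[1:]))
    prev = Counter(tokens[:-1])
    total = sum(pairs.values())
    return -sum(n / total * math.log2(n / prev[a]) for (a, _b), n in pairs.items())

def parse(text, groups):
    """Greedy longest-match parse; unmatched characters stay single tokens."""
    ordered = sorted(groups, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        g = next((g for g in ordered if text.startswith(g, i)), text[i])
        tokens.append(g)
        i += len(g)
    return tokens

def ratio(text, groups):
    """h1:h2 for the parsed text; degenerate parsings (h2 == 0) are rejected."""
    tokens = parse(text, groups)
    second = h2(tokens)
    return h1(tokens) / second if second > 0 else float("-inf")

def hill_climb(text, candidates):
    """Repeatedly add whichever candidate group most improves h1:h2."""
    chosen, best = [], ratio(text, [])
    improved = True
    while improved:
        improved = False
        for g in candidates:
            if g in chosen:
                continue
            score = ratio(text, chosen + [g])
            if score > best:
                best, pick, improved = score, g, True
        if improved:
            chosen.append(pick)
    return chosen, best

sample = "daiinchoraiinsholdaiinchol" * 10   # toy text, not real transcription
print(hill_climb(sample, ["ch", "sh", "aiin", "ol"]))
```

One caveat: maximising the ratio on its own can be gamed by parsings that make the text almost perfectly predictable, which is why the sketch throws out any parsing where h2 hits zero. A real search would need a better-motivated penalty than that.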

Voynich Parsing

To me, the confounding beauty of Voynichese is that all the while we cannot even parse it into tokens, the vast modern cryptological toolbox normally at our disposal does us no good.

Even so, it’s obvious (I think) that ch and sh are both tokens: this is largely because EVA was designed to be able to cope with strikethrough gallows characters (e.g. cth, ckh etc) without multiplying the number of glyphs excessively.

However, if you ask whether or not qo, ee, eee, ii, iii, dy, etc should be treated as tokens, you’ll get a wide range of responses. And as for ar, or, al, ol, am etc, you won’t get a typical linguistic researcher to throw away their precious vowel to gain a token, but it wouldn’t surprise me if they were wrong there.

The Language Gap

The Voynich Manuscript throws into sharp relief a shortcoming of our statistical toolbox: specifically, its excessive reliance on our having previously modelled the text stream accurately and reliably.

But if the first giant hurdle we face is parsing it, what kind of conceptual or technical tools should we be using to do this? And on an even more basic level, what kind of language should we as researchers use to try to collaborate on toppling this first statue? As problems go, this is a precursor both to cryptology and to linguistic analysis.

As far as cipher people and linguist people go: in general, both groups usually assume (wrongly) that all the heavy lifting has been done by the time they get a transcription in their hands. But I think there is ample reason to conclude that we’re not yet in the cinema, but are still stuck in the foyer, all the while there is a world of difference between a stroke transcription and a parsed transcription that few seem comfortable to acknowledge.

In some ways, it’s the shortest of distances from [Ethel Voynich] to [Ethel Merman], so why not “Voynich, The Musical“? Close your eyes, imagine a Broadway stage, take out a mortgage to get yourself a semi-affordable seat, spill a drink on your leg, and you’re as good as there…

VOYNICH – THE MUSICAL!

Act One, Scene One

It’s 1912. A single spotlight illuminates an old trunk in the middle of an otherwise empty wooden stage: there’s dust in the air. We hear slow, sustained violins off-stage, harbingers of the big discovery that is about to happen.

WILFRID appears stage right. He is well dressed (though a little tweedy for our modern tastes), and wears small round glasses. He looks in the prime of his life – there’s a vigour and physical excitement to him. He approaches the trunk, opens it, takes out an old book and peers inside it. As his eyes grow ever wider, the violins swell, and he sings his first number “Friends To The End”.

WILFRID

This never happened – I wasn’t here.
There was never a trunk (that was junk), isn’t this queer?
I conjured a castle, to hide Jesuit lies…
While the customer’s king, I’ll say anything (however unwise).

[Chorus] But you, you were always real
Even if you made me feel
Like an antiquarian schlemiel –
I couldn’t comprehend.
But I knew, I knew when I met
My ugly duckling Juliet
With your strange alphabet
We’d be friends to the end…
Friends to the end.

Act One, Scene Two

Back in London, WILFRID hesitantly shows his newly-acquired manuscript to his wife ETHEL: he thinks it’s going to make them rich. However, ETHEL cannot believe that he has wasted money on something as unbelievably stupid as a book that nobody can read. To make her feelings on the matter completely clear, she sings her angry opening number “Down the drain”.

ETHEL

Little naked women
Standing round or swimming
What is this you’re bringing
To our house?
You can’t read a word of it
Written by a heretic
I can’t see the benefit
To man or mouse

[Chorus] You put good money / Down the drain
Buying enciphered / Castles in Spain
Were those nymphs fogging / Your revolutionary brain?
Or has their writing sent you / Completely insane?

Act One, Scene Three

WILFRID has moved to New York, and is trying (unsuccessfully) to convince wealthy American collectors to buy his unreadable manuscript. Though his sales patter normally charms the birds down off the trees, he’s finding it difficult to find anyone with any affinity for this unusual artefact. His song “It’s No Use” documents his ongoing struggle.

WILFRID

There’s jazz and money in the air
The excitement of a New World at play
New rules, new wealth, new clothes, new hair
America strides into a brand new day

You, sir, with your spats and suits
Your garden parties and Egyptiana
Might I interest you in this book’s strange roots
And its hard-to-pin-down flora and fauna?

[Chorus] It’s no use
My duckling’s no swan
I’ve cooked my goose
My big chance has gone
I’ll find no willing
Bibliophile
Who’ll pay more than a shilling
They’re too mercantile

Act One, Scene Four

It’s 1930 in New York. WILFRID is dying, having never been able to sell his “Roger Bacon” manuscript. ETHEL brings his beloved manuscript to him, so that he can see it one last time. WILFRID sings a song to both of them: “It’s Time To Say Goodbye”.

WILFRID

Perhaps I was wrong / To hope for the best
To follow every wastrel clue / Like a man possessed
Why can’t anybody else / See what I see?
Are they put off by mere / Indecipherability?

[Chorus] It’s time to say goodbye
To the woman I have loved
And greet the naked angels
Hovering above
I’ve seen them for years
Sitting on my shelves
Filling every page of
Quires eleven and twelve

Two of the least commented-on aspects of the Voynich Manuscript’s “Voynichese” alphabet are (a) its symmetry and (b) its partitioning into quite well-known (but distinct) usage groups. For example:

* the four gallows characters, where EVA t and EVA k are almost always interchangeable, while the single-leg shapes for EVA p and EVA f closely mirror the double-leg shapes for EVA t and EVA k. (And let’s leave the strikethrough gallows aside for the moment.)

* the EVA aiin family of letter groups, which all operate in a very specific way: there are no contexts where ain appears that you wouldn’t also see aiin or even aiiin.

* the ar / or / al / ol group, whose members seem to appear within words in much the same way as each other. The air and aiir letter groups might also be related to this set, though this isn’t 100% clear. Similarly, -am often seems to me (with a hat tip to Emma May Smith, who discussed -m recently) to be something closer to a combination of ar and hyphen, i.e. that -am at the end of a line often resembles the end of the first half of a word broken in half by the line-ending (and where the second half of the word is at the start of the next line, but disguised with an extra letter inserted before it).

* the -dy and -y word endings, which both seem to be cut from almost exactly the same cloth.

* the e / ee / eee / ch / sh / eo group, which seems to me to function slightly differently between A and B pages.

* the qo group, which almost universally seems to operate as a prefix. In those places where we get l- words, we also get qol a lot: and where l- words don’t appear, we get almost no instances of qol.

Cross all the above instances out, and what remains is a very sharply reduced set of usage groups, such as d- words (in particular daiin, which seems to operate in a mysterious world all of its own), o- words (particularly in front of gallows), and y- words.

What about EVA s?

But if you do do this kind of crossing out, you also won’t find a comfortable place for EVA ‘s’ to go. In fact, to my eyes EVA ‘s’ appears to be the single most anomalous character in the Voynichese alphabet: there’s a strong case to be made that it is the most ‘exposed’ single glyph of all of them, and – by that same token – the one we should spend most time on trying to understand. What I’m saying is that EVA s might well be the weakest link in the Voynichese chain.

If you remember to put aside all the completely different ‘sh’ characters (sharing ‘s’ for both of these glyphs was, in my opinion, a foolish mistake in the design of the EVA transcription scheme, *sigh*), you find that ‘s’ occurs about 1.71% of the time in A pages, and about 1.00% of the time in B pages. If you remove any ‘as’ or ‘os’ pairs (as being probably miscopied or mistranscribed ‘ar’ / ‘or’ pairs) from these stats, these figures go down to 1.34% and 0.83% respectively.
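A count along these lines can be sketched with a couple of regular expressions. This is not the script behind the percentages above: it assumes a plain EVA string with words separated by whitespace, it takes the denominator to be the number of EVA characters, and the function name s_frequency is hypothetical.

```python
import re

def s_frequency(text, drop_as_os=False):
    """Share of EVA characters that are a standalone 's' (not part of 'sh').

    With drop_as_os=True, any 's' directly preceded by 'a' or 'o' within
    the same word is left out as a possible miscopied 'ar' / 'or'.
    `text` must contain at least one non-space character.
    """
    n_chars = len(re.sub(r"\s", "", text))           # spaces are not glyphs
    pattern = r"(?<![ao])s(?!h)" if drop_as_os else r"s(?!h)"
    return len(re.findall(pattern, text)) / n_chars

sample = "shol das daiin sy"          # toy text, not real transcription
print(s_frequency(sample))            # counts the 's' in 'das' and in 'sy'
print(s_frequency(sample, True))      # counts only the 's' in 'sy'
```

Running the same function over each page separately would give the page-by-page picture.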

And yet some A pages have numerous s characters (e.g. f14r, f15r, f24r), while others have one or fewer s characters (e.g. f14v, f18r, f19r): that this single statistic can differ so much between the two sides of the same folio is something that hasn’t really been noted before, as far as I can recall. [Unless any Lorites out there care to show me the precedent I’ve missed: in one of Friedman’s groups, no doubt.]

All of which incidentally reminds me of something that Glen Claston told me he noticed when he was making his transcription (but which I now can’t find in my email archive, *sigh*): that Voynichese had different clusterings of letter usages that would seem to go into and out of fashion (almost as if one kind of ‘mode’ was active now, and then a different mode active later), sometimes by paragraph, sometimes by page. If this is correct, then perhaps ‘s’ is an active part of some ‘modes’ but not others – just an idea.

What about saiin vs daiin?

I find it interesting that sdaiin occurs only once (on f66r), while sdain, sdaiiin, dsain, dsaiin, and dsaiiin don’t occur at all: yet saiin occurs 144 times.
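Counts like these are easy to reproduce. The sketch below assumes a transcription already reduced to whitespace-separated words; real transcription files typically use other word separators and inline markup that would need stripping first. The function name count_words and the sample text are illustrative only.

```python
from collections import Counter

def count_words(text, targets):
    """Count exact whole-word occurrences of each target word."""
    counts = Counter(text.split())
    return {w: counts[w] for w in targets}

# s-, d-, sd- and ds- prefixed forms of ain / aiin / aiiin
targets = [prefix + "a" + "i" * k + "n"
           for prefix in ("s", "d", "sd", "ds")
           for k in (1, 2, 3)]

sample = "saiin daiin sdaiin daiin okaiin"    # toy text, not real transcription
for word, n in count_words(sample, targets).items():
    print(word, n)
```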

If s- is some kind of prefix token here, then it seems that so too is d-, and in a way that makes the two avoid stepping on each other’s toes.

My suspicion (for what it’s worth) is that while both work as prefix tokens, they in fact code for two quite different classes of mechanisms: and, moreover, that both prefixes are more meta-linguistic than linguistic in any useful sense.

And what about the first column?

EVA s also has a strong tendency to appear as the first letter of a (non-paragraph-starting) line, particularly in Balneo B pages – but this may possibly be because Balneo B tends to have longer paragraphs than elsewhere.

Combine this (a) with the well-known observation that the first word on each line tends to be slightly longer on average than all the other words on a line, and (b) with Philip Neal’s suggestion that the first letters down some Voynich Manuscript pages might well be a vertical ‘key’ or something similar, and you get an interesting possibility to consider: that line-initial ‘s’ may specifically operate as a null that the writing system needs to prepend to certain (typically short) words.

I was thinking about this today, triggered by a Voynich Ninja forum discussion: I wondered if it might be possible to construct a statistical experiment to test my suggestion that line-initial s- might function as a null character that gets prepended to certain short words (such as aiin).

According to the tentative model I have in mind, the (aiin : daiin) ratio for non-line-initial words should be roughly the same as the (saiin : daiin) ratio for line-initial words. And perhaps it would be good to then repeat broadly the same test for non-line-initial (ar : dar) vs line-initial (sar : dar), etc.

However, I don’t have the right counting tools to do this easily: can anyone please run this test? Thanks!
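For anyone who wants to try it, the counting itself might look something like the sketch below. It assumes each line of the transcription is available as a string of whitespace-separated words, and it does not separate paragraph-starting lines from other lines, which a proper test probably should. The function name initial_ratio_test and its parameter names are hypothetical.

```python
from collections import Counter

def initial_ratio_test(lines, bare="aiin", s_form="saiin", d_form="daiin"):
    """Compare (bare : d_form) away from line starts with (s_form : d_form) at line starts.

    A ratio is None when its denominator is zero.
    """
    first, rest = Counter(), Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue
        first[words[0]] += 1       # line-initial word
        rest.update(words[1:])     # every other word on the line

    def ratio(a, b):
        return a / b if b else None

    return {
        "non_initial": ratio(rest[bare], rest[d_form]),
        "line_initial": ratio(first[s_form], first[d_form]),
    }

# Toy lines, not real transcription
lines = [
    "saiin chol aiin daiin",
    "daiin aiin shol",
    "saiin daiin aiin daiin",
]
print(initial_ratio_test(lines))
```

The same function covers the (ar : dar) vs (sar : dar) variant by passing bare="ar", s_form="sar", d_form="dar". Whether two ratios count as "roughly the same" would still need a proper significance test on top, given how small the line-initial counts are likely to be.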

Hi There! We’re looking for people to write up their theories on cipher mysteries such as the Voynich Manuscript, the Beale Papers and how astroturfed the Tea Party is. You may be surprised to discover that your foolish clickbait opinions could earn you upwards of $0.02 per day, and might even be worth double that (if they are so unbelievably bad that they go viral on Slashdot or Reddit).

To tap your teensy spile into this towering cask of wealth, there’s no need for an office, formal clothes or indeed any clothes beyond your normal tattered rags. Simply compose your posts and comments from the comfort of your own bedsit, surrounded by your piles of old newspapers, unreturned library books, and much-loved microwave meal boxes. Who could ask for a better or more convenient life?

Yes, you too can turn your vapid leaden thoughts into 24 carat Internet gold, just like alchemists and well-known YouTube sock puppet presenters the world over already do. And let’s face it, if Stampy and Squid can make it there, then so can you, right?

Who cares if you haven’t cut your toenails since Dubya left the White House? We don’t! Google values novelty over content, so to become a high-value content creator in this brave new online world, all you have to do is tap into the same rickety stream of consciousness that pushes angry unspoken words in your mouth when you’ve yet again found yourself stuck in the non-moving queue at the supermarket till and type, type, type.

There, now doesn’t that feel better! And how much do we charge you for this “keyboard therapy”? $100? $1000? No, not even close – in fact, we pay you for it. A frighteningly small amount, sure, but let’s not bicker over mere semantics.

How do you get going? Just start your own blog, proclaim yourself an expert on a particular subject (it doesn’t matter what, nobody cares), leave back-linked comments on forums and other people’s blogs, or even – now get this – leave comments on your own blog under false names to make visitors think that there’s some kind of ‘community’ buzz around the nonsense you’re passing off as high-quality thought.

Before long you’ll even be ready to cut-and-paste all your tripe into a 65-page ebook and sell it for $12 a pop. Still think this is all a pipe-dream? No, it’s not! Many thousands of people have jumped aboard this $$$money$$$ $$$train$$$ already, so why not you as well?

Still believe you’re not right for this extraordinary new world? Think again! You’ll shock yourself when you find out quite how painfully easy it is. Amaze your friends (if you have any, which seems fairly unlikely), take that step, and type, type, type!

I thought I’d share with you the following email I recently received via an anonymous remailing service:

This is being written to you on behalf of a large group of Voynich theorists. Even though we disagree amongst ourselves on everything to do with the Voynich Manuscript itself (which some of us prefer to refer to as the “so-called Voynich so-called Manuscript”), the two things we do all whole-heartedly agree about are (1) how much we despise your pathetic crusade against us, and (2) how much we abhor your ridiculous insistence on primary evidence and testable hypotheses.

Be assured that when one of us does eventually manage to prove definitively that it is a Mongolian shamanic handbook, a heretical medieval suicide manual, or a stranded alien’s diary, the short term pain of finding out that the rest of us were wrong will be amply wiped out by the long term pleasure of mocking you derisively for the rest of your stupid, pointless life.

You just don’t seem to realise that proper ‘Voynich research’ is in no way historical or scientific. Don’t you understand that it is we who established the one basic ‘fact’ of the discourse long ago? The thing that we made true (by repeating it so many times that it became a fact) is that nobody knows anything definite about the Voynich Manuscript. This is the frame of reference everyone is now compelled to use, and neither you, Wikipedia, René so-called Zandbergen, or indeed anyone else can move outside it: howl at the moon all you like, you’ll achieve nothing.

So you’re just wasting your time trying to make (what you conceitedly and falsely like to think of as) ‘progress’. Anything you try to assert, we deny immediately: it’s just physics, stupid. Moreover, anything you can conceive of asserting, we have probably already denied ten times over. Assert/deny, assert/deny, assert/deny: you really bore us.

Look, can’t you get it into your thick head that we theorists pwn the Voynich big-time? The Beinecke may be the institution who owns the Voynich Manuscript, but that means diddly squat against our total pwnage. Why, when there’s no obvious shortage of rent-a-mouth academics out there, do you think Yale struggled so badly to find anyone to write anything remotely sensible in their recent so-called photofacsimile? They were wasting their time swimming against our tide, just like you’re wasting yours.

OK, we’ll admit there was a brief period during which you were marginally useful to us: that was back when having a post in Cipher Mysteries putting down one of our theories was a bit like a badge of honour. We even had special gamified medals produced, to show off which one of us had had the smarmy Cipher Mysteries treatment (how we laughed): but since you’ve stopped doing even that, we’ve all got tired of your meanderings and not-so-funny posts.

So this is just a collective email from us to say goodbye to you. Even though Voynich research is still stalled in the same cul-de-sac it ever was (which, by our reckoning, is about as perfect a scenario as can be hoped for), we’ve all moved on from you and your stupid blog. You’re yesterday’s man, if not the day-before-yesterday’s man: not interested, la la la.

Why don’t you go research the Phaistos disk or something else unbelievably lame, and leave the Voynich to the people who really own it? Maybe you’ll find some saddo historians out there who want to read your useless drivel: we certainly don’t.

What is the difference between theories and metatheories? Given that the former can sensibly range from hand-wavy general theories (“the Voynich Manuscript was written by a mad alchemist“) to specific theories (“the Voynich Manuscript was written by a young Leonardo da Vinci, using his right hand“), the debate is more whether we can usefully differentiate between metatheories and general theories.

For me, however, the key attribute that distinguishes Voynich metatheories is that they have a certain ‘turn’ to them, a kind of pivoting self-referentiality that their proponents use to explain away just about everything difficult. For example, hoax theorists (such as Gordon Rugg) respond to almost any attempted historical objections (e.g. those surrounding the apparent paradox of using a 16th century mechanism to create an apparently 15th century manuscript) by saying that “well, obviously the hoaxer was so clever that he/she deliberately made those apparently discordant details look that way”.

They then often go on to point out that the more discordant details the hoaxer had to fake, the more obviously brilliant the hoax: and therefore the more we should admire the brilliance both of the hoax and of the man (yes, it’s normally a man) who was clever enough to notice such a brilliant hoax. And so a Voynich metatheory is a thing that arguably focuses more on explaining away that which doesn’t fit than positively accounting for anything it does sort of fit.

Omphalos

It shouldn’t require particularly deep contemplation before you notice more than a flicker of similarity between the structure of this argument and Omphalos creationism, courtesy of the naturalist Philip Henry Gosse in his 1857 book Omphalos.

“Omphalos” is the Greek word for navel: at the time of Gosse’s book, it was widely believed that Adam (in the Garden of Eden) had a navel despite not having come from a mother’s womb. The conclusion that Gosse famously drew from that is that when God made Adam, He made him complete with a navel: an argument that Gosse then triumphantly upscales to all the geological and fossil evidence that superficially seems to argue against the clearly well-proven Biblical History that showed that the Earth was created in 4004 B.C.

God, then, was something like the ultimate hoaxer: for rather than merely hoaxing some ‘ugly duckling’ unreadable book, He actually hoaxed the entirety of time and space to make it look as though the Earth was older than its ‘actual’ age (6021 years or so). As hoaxes go, you’d have to admit that this is top drawer stuff.

Of course, modern creationists have (ironically enough) evolved far more sophisticated arguments than Gosse ever did: but, frankly, I have to say that I’m not wildly interested in either Gosse or them. All that’s important for us here is that Creationism is, similarly, designed far more to explain away that which doesn’t fit the Bible than to explain that which does.

And what holds for Voynich hoax theories broadly goes for other Voynich metatheories focused on explaining all the difficult stuff away: for example, that the Voynich is glossolalia, or channelled, or some kind of otherwise inspired gibberish, or even a shipwrecked alien’s diary (I kid you not, *sigh*). Or even, with more than a half-nod in Stephen Bax’s direction, that Voynichese is composed of the scattered polyglot fragments of so many different languages that we can only recognise a tiny handful of words here and there: all of which anti-linguistic turn is also a metatheory, because it seeks not to explain the few words it grabs but to explain away the 99.9% or more of the other words it fails to account for. Foolishness.

There is, of course, already a large literature on a large field of constructivist mental endeavouring very similar to these metatheories: it is, by another name, pseudoscience. There, the whole point of pseudoscience isn’t to produce theories that can be tested (and possibly disproven), but instead to produce metatheories that are logically impervious to criticism – i.e. that use their central ‘turn’ to invalidate counterarguments.

This also has the effect of making those metatheories impervious to testing, and to refining, and to improving: and thus leaves them far more akin to something handed down in a Very Important Book Indeed. But you knew that already.

In the end, the only thing that separates Voynich metatheories from pseudoscience is that the people putting forward Voynich metatheories tend to be more interested in the postmodernist self-amusement of their ‘turn’ (a kind of awesome wonder that nobody else seems to have noticed how much their metatheory explains away) than in actually engaging with proof or disproof.

And if that’s a good thing, I’m a monkey’s uncle. Or he’s mine. 🙂

Put wrestling fan US President-elect Donald Trump in the ring with the Voynich Manuscript, and who would win? Actually, the two may be more evenly matched than you think…

For a start, both are surrounded by groups of people who claim to know what they mean (but almost certainly don’t), while remaining utterly unfathomable.

And as far as street cred goes, both have appeared in the Marvel Universe: Trump in New Avengers Vol. 1 #47

…and the Voynich Manuscript in “Black Widow & The Avengers” #18:


It’s also hard not to notice that the Voynich Manuscript author’s apparent obsession with (mostly) naked nymphs…

…oddly parallels Trump’s long association with (and indeed ownership of) Miss USA, Miss Teen USA, and Miss Universe (just try not to mention Miss Mexico, that might not end well):

Moreover, they are both big on the East Coast (New York and New Haven respectively), where both have achieved notoriety, each in their own unique way. Also, it’s hard to find anyone commenting on either Donald Trump or the Voynich Manuscript who doesn’t in some way use them as blank canvases, projecting what they want (or perhaps fear) to see onto them.

Yet perhaps this hard-to-pin-downness and malleability (qualities eerily like those of the Voynich) ultimately formed the core secret of Trump’s success at the presidential polls: given such a long series of mixed and often contradictory messages, people – like so many Voynich theorists – heard what they wanted or hoped to hear. Who can say?

And finally, both arguably achieved their biggest public goals in November 2016: on the 1st, the Voynich Manuscript was published by Yale University Press in a sumptuous (if largely uncritical) edition…

…while on the 8th, The Donald defeated The Wicked Witch. Just like a fairy tale, right? (Which is, of course, not the same as a happy ending – the Brothers Grimm were often as grim as their name.)

To my eyes, perhaps the most unsettling comparison between Donald Trump and the Voynich Manuscript is that November 2016 also marked the end of a quest for them both: a quest for respectability, to become part of the Establishment… but on their own terms. By which I mean that they are both (I think) now starting to re-cast and reinvent the whole idea of what the Establishment means in 2017 and beyond.

Will it be long before swathes of politicians remould their ever-fickle personae in Trump’s image, or before history textbooks start to use the Voynich Manuscript as didactic material? Right now, I’m not sure I’m massively comfortable about either of these paths, to be honest: but perhaps both are now somehow inevitable.

Me, I’m neither a fan nor a critic of Donald Trump: yet I can’t help but be struck by how his quest for the Presidency was effectively won via a prolonged gladiatorial beauty contest, much like a peculiar merger of both his love of wrestling (a televisual theatre of pre-teen anger) and Miss (Whatever) pageants (a televisual theatre of sexless beauty).

And I can’t help wondering if – like Voynich researchers, ever reaching for the apparently unattainable – it will turn out that he was more driven by winning the ultimate competition for political power than the idea of actually holding the reins (and the burdensome moral responsibilities) of high office. Similarly, would the Voynich Manuscript still hold its particular appeal if we could read it, if its quest for meaning was finally over?

I was mooching round the British Academy’s website a little earlier (I was trying to find the Neil Ker Memorial Fund, which I had forgotten the name of), when I noticed its page on British Academy Conferences – this is where ‘any’ UK citizen can propose a conference on any subject (as long as they’re prepared to run it themselves, and don’t mind being turned down with no reason being given).

And so the (as yet hypothetical) question naturally follows: if I was organizing a British Academy-hosted conference on the Voynich Manuscript, how would I approach the challenge? What should that kind of Voynich Manuscript conference look like?

What Isn’t Worth Looking At

It’s easy enough to list all the things I wouldn’t want to let onto the podium:
* Voynich theories [– too boring for words –]
* Voynich metatheories [– too sad for words –]
* Voynich iconography / iconology [– too free-floating for words –]
* Voynich linguistics [– sorry, but it’s just not written in an obscure language –]
* Voynich cryptology [– sorry, but it’s just not written in any obviously categorisable cipher –]

Some may be surprised that I would exclude both Voynich linguistics and Voynich cryptology. The simple reason for this is that I very strongly believe that we still don’t know enough about the Voynich’s basics to do meaningful analysis about either. For example, the existence of “Neal Key”-like behaviour offers a strong counter-argument not only against any kind of simple-minded linguistic take, but also against any kind of straightforward substitution cipher argument derived from a reading of cryptographic history.

The only reference to fifteenth century non-syllabic transposition ciphers I know of is a brief passage in Alberti’s book which I read as a reported speech account of a debate between Alberti and a transposition cipher practitioner. There is (unless you know better) not even one pre-1500 non-syllabic transposition cipher cryptogram still extant.

And so Voynich research is still in a position where neither linguistic approaches nor historical cryptological approaches have any ‘moral high ground’ to argue their respective cases. The Voynich Manuscript laughs pityingly at both camps’ feeble efforts.

So… what would I want attendees to be discussing, then?

The Joy Of The Concrete

As per my recent list of 100 Voynich (research) problems, there remains – despite all the excellent work that has been done since the Beinecke first released digital scans in 2004 – a huge amount of fundamental stuff that we still don’t know about the Voynich Manuscript.

The problem with not knowing how pages, paragraphs, lines, words, and even letters were constructed at a really basic level is that this makes it extremely difficult to know whether our transcriptions are a help or a hindrance. In what order were lines written? (Philip Neal points to evidence that some line interleaving may have taken place in at least Q20.) In what order were the strokes within letters written? (Back in 2006 in “Curse”, I pointed to evidence that on some pages, the terminal EVA ‘n’ stroke of ‘daiin’ may have been added in a separate pass.) And so forth.

Hence the core stuff I would want conference attendees to focus on is purely that-which-is-concrete: things that can be seen, highlighted, measured, cross-referenced, scanned, indexed, counted, etc. What were the original gatherings and their nesting orders? What happened to those gatherings to turn them into quires? What construction stages can we solidly identify? (My current best estimate is that there must be close to twenty of them.) Can we order (or even date) these construction stages? What, ultimately, was the alpha state of the manuscript?

But this isn’t just a matter of assembling some codicological dream-team (even though many of the most basic unanswered questions are clearly codicological in nature). There’s also the tricky matter of the Currier Hands and the f116v marginalia (which would require a great deal of palaeographical expertise to untangle): and also the taxing matter of the differences between the various Currier languages, which is something closer to meta-linguistics than linguistics per se.

In all cases, the central include-it/don’t-include-it criterion would be whether any given analysis would advance our knowledge of the Voynich without having to assume any given historical narrative or theory far beyond the basic radiocarbon dating.

Never mind being carbon-neutral, could such a conference be theory-neutral? My hope is that it could, but I do appreciate that this is something many Voynich researchers could easily find difficult to work to, or to achieve.

Linguistics vs meta-linguistics

I think it’s fair to say that the long-term relationship between Voynich research and Voynich linguistic research has not been greatly productive. Given that the mainstream Voynich research position has for more than fifty years been that Voynichese is simply not a “language” in any straightforward sense of the term, it is dispiriting to see Stephen Bax continually raking over the same barren concrete surface, ever-announcing to the world that the few motes of dust he has accumulated do in fact form the basis of some über-obscure hybridized historical linguistic system over and above mere statistical chance.

Would out-and-out linguistics researchers such as Stephen Bax be welcome at such a conference? With the putative roles reversed, Bax has certainly made it clear online that mainstream Voynich researchers (errrm… particularly me, it would seem) would be distinctly unwelcome at any Voynich-themed seminar he would organize.

But what annoys me so much about Bax isn’t that what he puts forward is just plain wrong (even though it is), but that by mistakenly telling all and sundry that the challenge of Voynichese is one where its beginning, middle and end all fall inside a purely linguistic domain, he utterly misrepresents the specific difficulties it poses.

Rather, what Voynichese does present to researchers is an overlapping combination of linguistics (e.g. actual language content), meta-linguistics (content transformation, e.g. abbreviations, codes, and transposition), and misdirection (e.g. substitution and steganography). Hence the primary difficulty we face with Voynichese is more one of determining its internal boundaries: what is misdirection, what is language, and what is meta-linguistics? If Voynich linguistic researchers could successfully accept that this question is the real one we need to answer before trying to push forward, then perhaps we could all start to work together in a reasonably productive way.

So I have to say I’m hugely encouraged that at least one Voynich linguistics researcher out there (Emma May Smith) has recently started looking in a genuinely agnostic way at all the difficult stuff that confounds those who try to stick to fairly simple-minded linguistics accounts. If only more linguistics researchers followed her example. *sigh*

Raman Imaging

There is a final twist: in the ideal world of my imagination, the conference stage would be part-laboratory too, with a live link between a Raman imaging device in New Haven looking at a series of pages of the Voynich Manuscript, sometimes through a microscope. The conference attendees would be able to discuss and propose different tests live, so that they could see “under the skin” (sometimes literally) of the manuscript.

But once you throw that into the mix, would this even qualify as a “conference” any more? Or would it actually be closer to some kind of Reality TV historical research happening, in a way that’s so acutely of-the-moment that it hasn’t even got its own annoying hashtag yet?

Put that way, should I be thinking in terms not of the British Academy, but of Channel 4 and Smithsonian TV?

It’s well known that f1r (the very first page of the Voynich Manuscript) has an erased ownership mark. Under UV light, you can see that it says (something along the lines of) “Jacobj à Tepenece / Prag” (Photo Credit: © ORF):

For everyone who isn’t heavily invested in some kind of hoax-centric Voynich Manuscript meta-theory, the presence of Jacob Tepenecz’s mark on the first page would seem to be a pretty good indication that he was an early Voynich owner. Combining that with the mention of Emperor Rudolf II in the Marci letter would suggest that the Emperor himself was quite likely also an early Voynich owner (though no direct evidence of that has yet been found).

What’s almost completely unknown is that the Voynich Manuscript seems to me to have probably also had a second ownership mark: only this time, the erasers physically excised the whole bottom section of the foldout page containing it.

The Voynich owner’s mark on f102?

The two-panel recto (front) of f102 looks like this…

…while the two-panel verso (back) of f102 looks like this:

Note that the folio number at the top right of the left verso panel was obviously added while the panel was folded back: and that the number at the bottom right of the right verso panel is a quire number. Let’s look a little more closely at the recto side of the excision:

Here we can (I think) clearly see that this section was cut out after the plant drawings had been added to the page, and also after the paint had been added to them. And as for the verso side of the cut:

Looking closely at both sides, I think you can also see the difference in quality of cut between the original bifolio cut edge (bottom right, beneath the ‘19’ quire mark) and the later excision’s edge: the former is nice and clean, while the latter is ragged, as if that cut was done with a cutting tool that was not quite as sharp.

Dating the Layers

Given that the paints used here are untidy (and, truth be told, a bit nasty), it would seem reasonable to infer that these were probably added by Jorge Stolfi’s putative “heavy painter” very late in the Voynich Manuscript’s life: say, not too far from 1600 or so. All of which would seem to imply that this section of vellum was removed after that date.

And given that the f1r ownership mark was erased some time after 1609, I think it would be reasonable to conclude that this section of the bifolio was probably excised at the same time. While it’s possible that Baresch cut this out when he was (apparently) cutting out various single pages from different sections to send to Kircher, my judgement is that that’s a far less likely scenario.

Missing pages and heavy paint aside, the only other thing in the manuscript that seems to have been messed around with in any significant way is the ownership mark on f1r: hence it seems likely to me that f102v had also had some kind of ownership mark added to it in the blank space next to the ‘19’ mark, and that this was removed at the same time.

And that in turn suggests to me that this quire mark was not ‘19’ (as in ‘the number after 18’), but that it was instead a fifteenth century ‘1-9’ (i.e. ‘prim-us’). Which in turn suggests to me that this quire and the other pharma quire were a pair of freestanding quires / gatherings in a separate book that was later merged in with all the other quires. As I wrote in Curse in 2006, there seems to be strong visual evidence (from the sequence of jars that progress from simple to complex) that what is now Q17 originally came after Q19.

Furthermore, there seems to be evidence of stitching holes on the exposed (and somewhat worn and discoloured) fold of f102: the presence of these holes and discoloration suggests to me that f102 may originally have been folded and nested rather differently to what we see now.

This also suggests to me that Q20’s quire number was probably added by a different (and later) hand to the hand that added the Q19 quire number, but one trying to ape the style of the Q19 quire mark hand. I therefore predict that these will turn out to have been written in very different inks.

Reading the Invisible

At this point, you might ask: so what? Even if this was indeed an ownership mark that was excised, what does it matter? Who cares?

Well: what’s interesting is that I think there is a small chance that we will be able – with just the right imaging technique – to see traces of whatever was written on f102v1 faintly imprinted on f102r2. Alternatively, we might be able to detect the faintest of contact transfers carried across onto the facing page (i.e. f103r).

In both cases, these would probably be far too subtle to see with the naked eye: but if we are determined enough to find a way of looking at precisely the right piece of vellum in precisely the right way, who can tell what we’ll find there?

Inspired by Julian Bunn’s just-released “Puzzles of the Voynich Manuscript” ebook (review to follow), I decided to post a list of a hundred Voynich problems – that is, issues that researchers repeatedly bump into when trying to make sense of the Voynich Manuscript, and yet which nobody seems to have definitively resolved in the last century.

Unlike Julian’s ebook, this list is targeted squarely at existing Voynich researchers. If you are genuinely trying to make sense of the Voynich Manuscript and yet aren’t aware of pretty much all these problems, it could well be that you are not seeing the bigger picture.

Needless to say, good solutions will aim to resolve many (if not all) of these “Voynich problems”: while poor solutions (of which I’ve already seen far too many) tend to target only a few – in fact, I’ve seen a fair few alleged ‘solutions’ that don’t even attempt to resolve any of them.

Realistically, though, given that even the most basic Voynich problems – such as the existence of one or more ‘heavy painters’ – continue to be disputed, I don’t expect this list to dramatically shorten any time soon. But who can tell what the next twelve months will bring? 😉

Bifolio nesting / grouping problems

Herbal quires – were these originally split into A and B pages? [Probably, but we don’t know]
Herbal quires – what was their original layout?
What is the relationship between herbal pages and pharma pages? [Here’s one surprising thing Rene highlighted back in 2010]
Was Q9 originally bound in the way John Grove suggested (i.e. along a different fold) – or not?
Was Q13 originally a single quire, or was it (as Glen Claston proposed) in two Q13A / Q13B parts?
Was Q20 originally a single quire, or was it (as I proposed?) in two Q20A / Q20B parts?
Why are there apparently so many different quire number hands?
What was the relationship between Q8 and Q9?
Where did the nine rosette page originally sit?
Are the two pharma sections reversed relative to their original order?
Are pharma sections explicitly linked to herbal pages? [i.e. by handwriting or textual content]
Were there any intermediate bindings, and can we reconstruct them?
Can we reconstruct the original [possibly unbound] page order?

Ink / Paint Problems

Was there a heavy painter?
Were there multiple heavy painters?
Was the heavy paint added before or after the folio numbers? [Rene: there’s green paint over the “42” folio number]
What kind of paint is the heavy blue paint?
Can we use Raman imaging to separate codicological layers? [Particularly on f116v, but in many other places too]
Were the original paints all organic washes derived from plants etc?

Marginalia Problems

Why are the f17r marginalia unreadable?
Why are the f66r marginalia unreadable?
Why are the f116v marginalia unreadable?
What language were the Zodiac month names written in?
Were the “chicken scratch” marginalia originally grouped together?
Do the f57v marginalia read ‘ij’ (with a bar across the top)?

Page Layout Problems

Why is the first letter of each page so often a gallows character?
Why is the first letter of each paragraph so often a gallows character?
What meaning do long gallows have?
What meaning do ornate gallows have?
What is the purpose or function of Horizontal Neal keys?
What is the purpose or function of vertical Neal keys?
Why do lines of text so often end with the EVA letter m?
Why should position on the page affect anything to do with the text?
John Grove called stray sections of text right-justified at the end of paragraphs “titles” – what are these for?
Are there any buried (concealed) titles in the Voynich Manuscript?
Are there any 15th century non-syllabic transposition ciphertexts extant?

Voynichese letter-shape problems

Why are the four gallows shaped in the specific way that they are?
Is the presence of ‘4o’ in 15th century Northern Italian ciphers telling or coincidental?
Is the similarity between ‘aiiv’ / ‘aiir’ and medieval page references telling or coincidental?
Was the ‘v’ (EVA ‘n’) shape written in one pass or two? [There are instances where the final stroke looks to have been added in a different ink]
Should c-gallows-h be read as one, two, or three glyphs?
Does any known 15th century cipher include steganographic tricks for hiding Roman numbers?
Or indeed for Arabic numerals?

Voynichese word structure problems

In a text of this size there must be numbers somewhere – so where are they?
Do we even know how to parse Voynichese?
Why are words ending in -9 (EVA “-y”) so common?
Might -9 be a token indicating truncation?
Why are words ending in -89 (EVA “-dy”) so common?
What could cause sequences such as “ororor” to appear in the text?
Might ‘or’ be ciphering ‘M’, ‘C’, ‘X’ or ‘I’? (i.e. Roman numerals that often appear repeated)
Why do A section words and B section words have such different average lengths?
Might this be (as Mark Perakh suggested) because of variable-length abbreviation?
Where are all the vowels?
Why is the ratio (number of unique words : number of words) so large compared to normal languages?
Where are all the short words?
Given that the alphabet is so small, could one or more of the letters really be nulls?
“Dain dain dain”, really?
“Qokedy qokedy”, really?
Is 4o- (EVA “qo-”) a freestanding word?
Why is there so little information in a typical Voynichese word?
Why are so many words so similar?
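
Two of the questions above (the unique-words ratio, and the A/B difference in average word length) come down to simple counts over a transcription. Here is a minimal Python sketch of those counts, assuming the text has already been split into space-separated EVA words. The function names and the sample string are made up purely for illustration (the sample is not a real Voynich line), and counting EVA letters is itself an assumption, given that we don’t yet know how Voynichese should be parsed.

```python
def type_token_ratio(words):
    """Number of unique words divided by the total number of words."""
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def mean_word_length(words):
    """Average number of transcription characters per word."""
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


# Placeholder sample only: an invented string of EVA-style words.
sample = "daiin daiin qokedy qokedy chol daiin".split()

print(type_token_ratio(sample))
print(mean_word_length(sample))
```

Running the same two functions separately over the A pages and the B pages would give a rough first look at how far apart the two really are, though the result will depend heavily on which transcription is used and on how words are split.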

Language/dialect problems

What is driving the differences between Currier A and Currier B?
Can we definitively say that A pages came before B pages?
Can we definitively say that the B system evolved out of the A system?
Can we map A words / letters onto B words / letters?
Can we create an evolutionary order in which the system evolved?
Where does labelese fit into the A/B model?
Are localised vocabulary differences content-driven or system-driven?
Can we determine any unique words or phrases that map between A and B pages?
Is there an inbuilt error rate? (e.g. qo- -> qa-, or aiin -> oiin)
When low-frequency words cluster, is this because of the system, because of semantic reference or because of auto-copying?
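
One way to start on the error-rate question is to list pairs of words that differ by a single letter (e.g. qo- / qa-, or aiin / oiin) alongside how often each occurs. The Python sketch below does only that: the function names and sample words are hypothetical, it assumes an EVA transcription split into words, and it cannot by itself tell a copying error from a genuine feature of the system. It merely surfaces candidates worth looking at on the page.

```python
from collections import Counter
from itertools import combinations


def differs_by_one(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def near_miss_pairs(words):
    """List (rarer, commoner, rarer_count, commoner_count) for each pair
    of distinct words that differ by a single substituted letter."""
    counts = Counter(words)
    pairs = []
    for a, b in combinations(sorted(counts), 2):
        if differs_by_one(a, b):
            # Put the less frequent word first (ties broken alphabetically).
            rare, common = sorted((a, b), key=lambda w: (counts[w], w))
            pairs.append((rare, common, counts[rare], counts[common]))
    return pairs


# Placeholder sample only: invented EVA-style words, not a real passage.
sample = "aiin aiin aiin oiin qokedy qakedy qokedy chol".split()

for rare, common, n_rare, n_common in near_miss_pairs(sample):
    print(f"{rare} ({n_rare}) vs {common} ({n_common})")
```

Comparing every pair of distinct words is slow on a full transcription, so on real data it would probably make sense to restrict this to words above or below some frequency threshold first.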

Drawing problems

What are the four direction characters in the magic circle page?
What are the four direction characters in the hidden magic circle page?
What are the four direction characters in f57v?
Why is there a mix of real plants and imaginary plants?
Are similar diagrammatic balneo nymphs found in any other 15th century manuscript?
Were the zodiac nymphs inspired by the zodiac nymphs in Vat Gr 1291, or is that just coincidence?
Is the little dragon’s similarity to the little dragon in a Paris MS telling or coincidental?
Is the cluster of stars the Pleiades, or something else entirely?
Nine rosette page – what’s going on there?
Will we ever identify the freestanding castle in the nine rosette foldout page?
If we reorganize Q9 as per John Grove’s suggestion, a 7-page sequence of ‘planets’ appears – is this telling or merely coincidental?
What was the source of the Zodiac roundels?
Are there multiple drawing layers on the nine rosette page?
Were all the sunflower pages grouped together originally?
Is there any tangible relationship to other Quattrocento herbals?
More generally, why is there such a sustained absence of reference to existing manuscripts?

Dating / history problems

Given the links to Rudolf II’s court, why is there no Rudolfine documentation? Might we have been looking in the wrong places?
What might the supposed connection to Roger Bacon signify? Monastic ownership, perhaps?
Why has the radiocarbon dating range not been explicitly supported by even a single piece of art history?
Why, despite the large number of people who have looked at the Voynich Manuscript in great detail, is there no mainstream art history narrative for it?

Other Voynich problems

Currier thought that a number of different hands contributed to the Voynich Manuscript’s writing – was he correct?
What is the significance of the 17 x 4 ring sequence on f57v? Might it have been an 18 x 4 sequence (i.e. 72 characters, which would give 5 degree steps around a 360 degree circle) but where one pair of letter-shapes has been ‘fused’ to form a fake gallows-like character?
Why did the manuscript’s maker forcibly rub a hole through the vellum? [Not as easy as it sounds, because vellum is strong stuff]
Why use vellum at all?
Why were the two sides of the vellum so heavily equalized?
On f112, is the gap on the outside edge a vellum flaw, or a faithful copy of a vellum flaw in the original document from which it was copied?
Are the main marginalia (e.g. michitonese) by one of the Currier hands?
What are the “weirdos” on f1r all about?

PS: I may not have ended up with exactly 100 Voynich problems, but it’s pretty close to a hundred… and I may add some more along the way. :-p