Well, here’s a thing. The Thirteenth Oxford Medieval Graduate Conference, to be held in a month’s time at Merton College (31st March 2017 to 1st April 2017) on the theme of “Time : Aspects and Approaches”, has a Voynich-themed paper in its Manuscripts and Archives session on the second day (11:30am to 1:00pm).

This is “Asphalt and Bitumen, Sodom and Gomorrah: Placing Yale’s Voynich Manuscript on the Herbal Timeline”, presented by Alexandra Marraccini of the University of Chicago. The description runs like this:

Yale Beinecke MS 408, colloquially known as the Voynich manuscript, is largely untouched by modern manuscript scholars. Written in an unreadable cipher or language, and of Italianate origin, but also dated to Rudolphine court circles, the manuscript is often treated as a scholarly pariah. This paper attempts to give the Voynich manuscript context for serious iconographic debate using a case study of Salernian and Pseudo- Apuleian herbals and their stemmae. Treating images of the flattened cities of Sodom and Gommorah from Vatican Chig. F VII 158, BL Sloane 4016, and several other exempla from the Bodleian and beyond, this essays situates the Voynich iconography, both in otherwise unidentified foldouts and in the manuscript’s explicitly plant-based portion, within the tradition of Northern Italian herbals of the 14th-15th centuries, which also had strong alchemical and astrological ties. In anchoring the Voynich images to the dateable and traceable herbal manuscript timeline, this paper attempts to re-situate the manuscript as approachable in a truly scholarly context, and to re-characterise it, no longer as an ahistorical artefact, but as an object rooted in a pictorial tradition tied to a particular place and time.

BL Sloane 4016 is a similar-looking herbal that Voynich researchers know well. Most famously, Alain Touwaide wrote a 500-page scholarly commentary on it (as mentioned in Rene’s summary of Touwaide’s chapter in the recent Yale facsimile). It dates to the 1440s in Lombardy, and even has a frog (‘rana’) on folio 81.

Marraccini herself is an art historian who previously graduated from Yale, and who has an almost impossibly perfect set of research interests:

Her research focuses on Late Medieval and Early Modern scientific images, particularly alchemical and medical material, in England, Scotland, Germany, and the Netherlands. Her interests in the field also include book history and manuscript studies, Late Antique material culture, and the historiography of art, particularly in Warburgian contexts. Currently, she is writing on the history of Hermetic-scientific images and diagrams, and her work on Elias Ashmole’s copies of the Ripley Scrolls is forthcoming in the journal Abraxas.

All of which looks almost too good to be true. It’s just a shame her presentation falls on April Fool’s Day, so we’re bound to have people claiming that she doesn’t really exist and it’s all a conspiracy etc. 😉

A few days ago, Australian robotics hacker Marcel Varallo (whose gladiatorial hacks making Roombas fight each other amuse me greatly) very kindly posted up two new scans of the Somerton Man’s Rubaiyat code (along with many megs of his collected Somerton Man stuff) on his blog.

I’ve put the three scans we now have on a Cipher Foundation Rubaiyat Code page, and strongly recommend that people use one of the new scans as a basis for doing any image processing work, rather than the one that has been on the Internet for years.

For example, if you put the three scans’ “Q” shapes side by side and try doing image processing experiments on them…

…what you find is that the so-called “microwriting” (found in the leftmost of the three images) was simply a quantizing artefact introduced when the original JPEG image had its brightness and contrast adjusted. With the new (slightly higher resolution, and generally much smoother) scan, all that nonsense disappears. There is no ‘microwriting’ there at all: The End.

Voynich researchers without a significant maths grounding are often intimidated by the concept of entropy. But all it is is an aggregate measure of how [in]effectively you can predict the next token in a sequence, given a preceding context of a certain size. The more predictable tokens are (on average), the smaller the entropy: the more unpredictable they are, the larger the entropy.

For example, if the first order (i.e. no context at all) entropy measurement of a certain text was 3.0 bits, then it would have almost exactly the same average information content per character as a random series of digits drawn evenly from eight different values (e.g. 1-8). This is because entropy is a log2 value, and log2(8) = 3. (Of course, what is usually the case is that some letters are more frequent than others: but entropy is the bottom line figure averaged out over the whole text you’re interested in.)
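To make the log2(8) = 3 example concrete, here is a minimal Python sketch of a first order entropy calculation. The function name `first_order_entropy` and the two toy strings are purely illustrative assumptions, not anything taken from a real transcription.

```python
import math
from collections import Counter

def first_order_entropy(tokens):
    """Average bits per token, using token frequencies only (no context)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Eight digits, all equally frequent: this comes out at 3 bits per digit.
uniform = "12345678" * 100
# The same eight digits, but with '1' heavily over-represented: lower entropy.
skewed = "1" * 93 + "2345678"

print(first_order_entropy(uniform))
print(first_order_entropy(skewed))
```

The skewed string uses the same eight symbols, but because one of them dominates, the average figure drops well below 3 bits.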

And the same goes for second order entropy, with the only difference being that because we always know there what the preceding letter or token was, we can make a more effective guess as to what the next letter or token will be. For example, if we know the previous English letter was ‘q’, then there is a very high chance that the next letter will be ‘u’, and a far lower chance that the next letter will be, say, ‘k’. (Unless it just happens to be a text about the current Mayor of London with all the spaces removed.)

And so it should proceed beyond that: the longer the preceding context, the more effectively you should be able to predict the next letter, and so the lower the entropy value.
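As a hedged sketch of the same idea with one token of context: the code below assumes that "second order entropy" means the conditional entropy of a token given the token immediately before it (computed as the entropy of adjacent pairs minus the entropy of the contexts). Other authors define the term slightly differently, so treat the function names and the definition as assumptions.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a Counter of observations."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def second_order_entropy(tokens):
    """Conditional entropy of each token given the single preceding token."""
    pairs = Counter(zip(tokens, tokens[1:]))   # (previous, next) pairs
    contexts = Counter(tokens[:-1])            # every token that has a successor
    return entropy(pairs) - entropy(contexts)

# Perfectly predictable once you know the previous letter: about 0 bits.
print(second_order_entropy("ab" * 50))
# After 'a' or 'b' the next letter is a coin toss: about 1 bit.
print(second_order_entropy("aabb" * 100))
```

Extending the context to two or more preceding tokens follows the same pattern, though the counts get sparse quickly on a text as short as the Voynich Manuscript.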

As always, there are practical difficulties to consider (e.g. what to do across page boundaries, how to handle free-standing labels, whether to filter out key-like sequences, etc) in order to normalize the sequence you’re working with, but that’s basically as far as you can go with the concept of entropy without having to define the maths behind it a little more formally.

Voynich Entropy

However, even a moment’s thought should be sufficient to throw up the flaw in using entropy as a mathematical torch to try to cast light on the Voynich Manuscript’s “Voynichese” text… that because we don’t yet know what makes up a single token, we don’t know whether or not the entropy values we get are telling us anything interesting.

EVA transcriptions are closer to stroke based than to glyph based: so it makes little (or indeed no) sense to calculate entropy values for EVA. And as for people who claim to be able to read EVA off the page as, say, mirrored Hebrew… I don’t think so. :-/

But what is the correct mapping or grouping for EVA, i.e. the set of rules you should apply to EVA to turn it into the set of tokens that will give us genuine results? Nobody knows. And, oddly, nobody seems to be even asking any more. Which doesn’t bode well.

All the same, entropy does sometimes yield us interesting glimpses inside the Voynichese engine. For example, looking at the Currier A pages only in the Takahashi transcription and using ch/sh/cth/ckh/cfh/cph as tokens (which is a pretty basic glyphifying starting point), you get [“h1” = first order entropy, “h2” = second order entropy]:

63667 input tokens, 56222 output tokens, h1 = 4.95, h2 = 4.03

This has a first order information content of 56222 x 4.95 = 278299 bits, and a second order information content of (56222-1) x 4.03 = 226571 bits.
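For readers who want to try this kind of glyphifying themselves, here is a minimal sketch of one way to do it. It assumes a greedy longest-match-first rule (my guess at a reasonable approach, not necessarily what any particular tool does), and the name `parse_eva` is hypothetical.

```python
def parse_eva(word, groups):
    """Split an EVA string into tokens, preferring the longest matching group."""
    ordered = sorted(groups, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(word):
        for g in ordered:
            if word.startswith(g, i):
                tokens.append(g)
                i += len(g)
                break
        else:
            # No group matches here, so the single EVA letter is its own token.
            tokens.append(word[i])
            i += 1
    return tokens

BASIC = ["ch", "sh", "cth", "ckh", "cfh", "cph"]
AIIN = ["ain", "aiin", "aiiin", "oin", "oiin", "oiiin"]

print(parse_eva("chckhy", BASIC))        # ['ch', 'ckh', 'y']
print(parse_eva("daiin", BASIC + AIIN))  # ['d', 'aiin']

# The information content figures are just token count times entropy.
print(round(56222 * 4.95), round((56222 - 1) * 4.03))
```

Feeding the resulting token list into an entropy calculation gives h1 and h2 for that particular parsing.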

If you then also replace all the occurrences of ain/aiin/aiiin/oin/oiin/oiiin with their own tokens, you get:

63667 input tokens, 51562 output tokens, h1 = 5.21, h2 = 4.01

This has a first order information content of 51562 x 5.21 = 268638 bits, and a second order information content of (51562-1) x 4.01 = 206760 bits. What is interesting here is that even though the h1 value increases a fair bit (as you’d expect from extending the post-parsed alphabet with additional tokens), the h2 value decreases very slightly, which I find a bit surprising.

And if, continuing in this vein, you also convert air/aiir/aiiir/sain/saiin/saiiin/dain/daiin/daiiin to glyphs, you get:

63667 input tokens, 50387 output tokens, h1 = 5.49, h2 = 4.04

This has a first order information content of 50387 x 5.49 = 276625 bits, and a second order information content of (50387-1) x 4.04 = 203559 bits. Again what I find interesting is that once again the h1 value increases a fair bit, but the h2 value barely moves.

And so it does seem to me that Voynich entropy may yet prove to be a useful tool in determining what is going on with all the different possible parsings. For example, I do wonder if there might be a practical way of exhaustively / hillclimbingly determining the particular parsing / grouping that maximises the post-parsed h1:h2 ratio for Voynichese. I don’t believe anyone has yet succeeded in doing this, so there may be plenty of room for good new work here – just a thought! 🙂
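To show the shape such a search might take, here is a deliberately naive hillclimbing sketch. Everything in it is an assumption for illustration: the greedy longest-match parser, the definition of h2 as conditional entropy, the choice to score a degenerate (zero) h2 as 0.0, and the toy text and candidate list. Spaces are treated as ordinary tokens.

```python
import math
from collections import Counter

def entropy(counts):
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def parse(text, groups):
    """Greedy longest-match tokenizer; unmatched characters stay single tokens."""
    ordered = sorted(groups, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        match = next((g for g in ordered if text.startswith(g, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens

def h1_h2_ratio(tokens):
    h1 = entropy(Counter(tokens))
    h2 = entropy(Counter(zip(tokens, tokens[1:]))) - entropy(Counter(tokens[:-1]))
    # Assumption: a parsing with (near) zero h2 is scored 0.0 rather than infinity.
    return h1 / h2 if h2 > 1e-12 else 0.0

def hillclimb(text, candidates):
    """Toggle one candidate group at a time, keeping any change that raises the ratio."""
    chosen = set()
    best = h1_h2_ratio(parse(text, chosen))
    improved = True
    while improved:
        improved = False
        for g in candidates:
            trial = chosen ^ {g}          # add g if absent, remove it if present
            score = h1_h2_ratio(parse(text, trial))
            if score > best:
                chosen, best, improved = trial, score, True
    return chosen, best

toy_text = "qokaiin daiin chedy " * 20
toy_candidates = ["ch", "aiin", "qo"]
print(hillclimb(toy_text, toy_candidates))
```

A simple hillclimber like this can get stuck in local maxima, so a real attempt would probably want random restarts or simulated annealing, plus a far larger candidate list.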

Voynich Parsing

To me, the confounding beauty of Voynichese is that all the while we cannot even parse it into tokens, the vast modern cryptological toolbox normally at our disposal does us no good.

Even so, it’s obvious (I think) that ch and sh are both tokens: this is largely because EVA was designed to be able to cope with strikethrough gallows characters (e.g. cth, ckh etc) without multiplying the number of glyphs excessively.

However, if you ask whether or not qo, ee, eee, ii, iii, dy, etc should be treated as tokens, you’ll get a wide range of responses. And as for ar, or, al, ol, am etc, you won’t get a typical linguistic researcher to throw away their precious vowel to gain a token, but it wouldn’t surprise me if they were wrong there.

The Language Gap

The Voynich Manuscript throws into sharp relief a shortcoming of our statistical toolbox: specifically, its excessive reliance on our having previously modelled the text stream accurately and reliably.

But if the first giant hurdle we face is parsing it, what kind of conceptual or technical tools should we be using to do this? And on an even more basic level, what kind of language should we as researchers use to try to collaborate on toppling this first statue? As problems go, this is a precursor both to cryptology and to linguistic analysis.

As far as cipher people and linguist people go: in general, both groups usually assume (wrongly) that all the heavy lifting has been done by the time they get a transcription in their hands. But I think there is ample reason to conclude that we’re not yet in the cinema, but are still stuck in the foyer: there is a world of difference between a stroke transcription and a parsed transcription, and few seem comfortable acknowledging it.

Given that the Zodiac Killer’s first big cipher (the Z408) got cracked so quickly, it shouldn’t really be a surprise that he used a slightly different system for his second big cipher (the Z340). What is (arguably) surprising is that whatever change he made to it has not been figured out since then.

But what was he thinking? What did he want from a cipher? And how might his needs have changed between Z408 and Z340?

The Z408

Ciphers are normally made to be as strong as practically possible, given the technological, time, and resource constraints that apply to both sender and receiver: and with the two main driving needs being privacy and secrecy. Note that these aren’t always the same thing: the way I usually describe it is that while sex with your husband is private, sex with your tennis coach is secret. 😉

And so the first thing I find cryptographically interesting about the Zodiac Killer is that he was creating a cipher from a slightly different angle from either of these: and he certainly wasn’t trying to communicate in any normal sense of the word.

Rather, I think that the point of Z408 was to be taunting, and to demonstrate to the police that he was in control, not them.

So imagine the Zodiac’s probable fury, then, when little more than a week after his three Z408 cryptograms appeared in local newspapers (the Vallejo Times-Herald, the San Francisco Examiner and the San Francisco Chronicle), Donald and Bettye Harden were all over the front pages explaining how they had cracked them.

Didn’t they know who was supposed to be in control here?

What was worse, the Hardens hadn’t used cryptological hardware or even high-powered cryptological smarts. They’d just used the Zodiac’s egoism (they guessed the first letter was “I”) and his psychopathic bragging (they guessed he would use the word KILL multiple times) as keys to his cryptographic front door: and then marched straight in.

I think it’s fairly safe to expect that the Zodiac was pretty pissed off by this.

Note that the Hardens carried on trying to crack the Z340 for many years afterwards: according to their daughter, her “mother wrote poetry and was as absorbed in her writing as she became with the Zodiac codes. She worked on the second code on and off for the rest of her life.”

The Z340

Comparing the overall style of the Z340 with that of the Z408, there seem to be plenty of reasons to think that the two are, at heart, not wildly different from each other. And yet (as is widely known) all the big-brained homophonic solvers written since haven’t made any impact on the Z340 at all.

All the same, I think the second interesting thing to note is that the changes to the Z340 system were surely not made to defend against computer-assisted codebreaking (because that hadn’t yet happened), but rather to make the updated system Harden-hardened, so to speak.

What does this mean? Well, we can probably infer that the first letter of the Z340 is almost certainly not I (not that that helps us a great deal) and the Zodiac Killer must have done something to conceal or remove the KILL weakness.

But, in my opinion, that latter change would surely not have been a theoretically-motivated cryptographic adaptation (he was without much doubt an amateur cryptographer), but rather something pragmatic and empirical, perhaps along the lines of:
* adding a repeat-the-last-letter token
* adding an LL token
* adding an ILL token
* adding nulls inside tell-tale words
* etc

But there’s a problem with all of these. In fact, there are several problems. 🙁

The Problems

The first problem is that I don’t currently believe any of the above changes are disruptive enough to explain what we see in the Z340.

The basic stats of the four main Zxxx ciphers are:
* Z408: 408 symbols, from a set of 54 unique symbols. (Note: E has 7 homophones, AST have 6 each, IO have 5 each, N has 4, FLR have 3 each, DHW have 2 each, everything else has 1).
* Z340: 340 symbols, from a set of 63. [Hence symbols/textsize is 18.5%, a fair bit higher than the Z408’s 13.2%]
* Z32: 32 symbols, from a set of 30.
* Z13: 13 symbols, from a set of 8.
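The symbols/textsize figures are simple enough to check by hand, but for completeness here is a tiny sketch that recomputes them from the counts listed above (the `density` helper is just an illustrative name):

```python
def density(length, alphabet):
    """Unique symbols divided by ciphertext length."""
    return alphabet / length

# (ciphertext length, number of unique symbols), as listed above
ciphers = {"Z408": (408, 54), "Z340": (340, 63), "Z32": (32, 30), "Z13": (13, 8)}

for name, (length, alphabet) in ciphers.items():
    print(f"{name}: {density(length, alphabet):.1%}")

# How much denser the Z340's symbol set is than the Z408's
print(density(340, 63) / density(408, 54))
```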

It would be very tempting to suspect (as many people have) that the Z340 is ‘therefore’ just the same as Z408 but with about 40% more homophones. Yet a problem with this popular hypothesis is that it should be well within range of automated homophone solvers, and to date they haven’t managed to make any impact.

A second problem is that the kind of homophone cycles that so characterized the Z408 seem to be largely absent in the Z340: and yet because the Zodiac Killer would not have had any clue that these were a technical weakness of his system, it seems unlikely to me that he would have adjusted his system to work around a weakness that he didn’t actually know was a weakness.

A third problem is that the Z340 has a fair number of asymmetries that don’t fit the it’s-a-straight-homophonic-cipher model. For example, lines 1-3 and 11-13 have (as Dan Olson pointed out some years ago) almost no character repeats.

There are yet other asymmetries: for example, while 63 different symbols appear in the top ten lines, only 60 appear in the bottom ten lines. And there’s the mysterious ‘-‘ shape at the start and end of line 10: and the odd-looking “ZODAIK” sequence on line 20.

One final asymmetry: the ‘+’ shape seems to function differently in the top and bottom halves – it is often preceded by ‘M’ in the top half, but never preceded by ‘M’ in the bottom half.

How does assuming the Z340 is a pure homophonic cipher explain any of these behaviours, let alone all of them?

Lines 1-3 and 11-13, revisited

I keep coming back to the 1-3 and 11-13 property as mentioned here. I think it’s important to say that Dan Olson’s conclusion (that “lines 1-3 and 11-13 contain valid ciphertext whereas lines 4-6 and 14-16 may be fake”) seems likely to be landing a little bit wide of the mark.

To me, this same property of these lines implies (a) that the homophonic versions for each letter were probably used in pure sequence here, but also (b) the homophone cycles were somehow ‘reset’ after ten lines (i.e. the homophone cycles all started again at the start of line eleven). And perhaps also that any characters repeated in the first three lines are rarer characters, rather than the homophone-friendly ETAOINSHRDLU etc.

It might even be that the Zodiac Killer kept on adding homophones as he constructed the cipher UNTIL he had three lines’ worth of essentially unique homophones: that is to say, that the three line blocks in 1-3 and 11-13 are how his system made the choice of the number of homophones, rather than as a consequence of the number of homophones he chose. Nobody has yet (to my knowledge) satisfactorily explained where he came up with his homophonic allocation for Z408: certainly, searching for this in crypto books hasn’t yielded any likely candidates.

Could it be that the Zodiac Killer worked backwards from his actual Z408 ciphertext to determine the number of homophones, rather than worked forward from the number of homophones to the ciphertext?

Update: I received the following off-line comment from David Oranchak, but thought it better to update it within the post itself…

Nick, there are a few other seemingly rare phenomena that can be observed in Z340. I’m curious what you think of them.

The first is the pivots:

http://zodiackillerciphers.com/wiki/index.php?title=Encyclopedia_of_observations#The_.22Pivots.22

Those kinds of patterns are difficult to arise by chance, so they are suspected to be some sort of feature of the encoding scheme.

Z408 is littered with repeating bigrams but Z340 seems to have fewer than would be expected via normal homophonic encipherment of a plaintext in a normal reading direction. However, the bigrams show up again if you consider a periodic operation on the cipher text:

http://zodiackillerciphers.com/wiki/index.php?title=Encyclopedia_of_observations#Periodic_ngram_bias

The count of 25 repeating bigrams jumps to 37 or 41 or even higher, depending on the periodic operation applied to the cipher text. Here is a tool that illustrates the various operations:

http://zodiackillerciphers.com/period-19-bigrams/
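As a rough illustration of what counting periodic bigrams can look like (this is not the linked tool's code), the sketch below pairs each symbol with the symbol `period` positions later and counts repeats. Both that reading of "periodic operation" and the counting convention (each bigram type contributes its occurrences minus one) are assumptions.

```python
from collections import Counter

def repeating_bigrams(cipher, period=1):
    """Count repeated (symbol, symbol-at-distance-period) pairs in a ciphertext."""
    pairs = Counter(
        (cipher[i], cipher[i + period]) for i in range(len(cipher) - period)
    )
    # A pair seen n times contributes n - 1 repeats.
    return sum(n - 1 for n in pairs.values() if n > 1)

toy = "ABABAB"
print(repeating_bigrams(toy))            # 3: AB seen three times, BA twice
print(repeating_bigrams(toy, period=2))  # 2: A.A seen twice, B.B twice
```

Running this over a real transcription for every period from 1 to, say, 40 would show whether any particular period stands out.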

You’ve already identified the seemingly rare phenomenon of rows that lack repeating symbols. There are 9 such rows. In 1,000,000 random shuffles of Z340, none had that many rows. In fact, the best that was found was 8 rows which occurred in only 12 of the shuffles.
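A shuffle test of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the procedure actually used: it assumes rows of 17 symbols, uses a toy string in place of a real Z340 transcription, and the function names are hypothetical.

```python
import random

def rows_without_repeats(cipher, width=17):
    """Number of rows (of the given width) in which no symbol occurs twice."""
    rows = [cipher[i:i + width] for i in range(0, len(cipher), width)]
    return sum(1 for row in rows if len(set(row)) == len(row))

def shuffle_test(cipher, trials=1000, width=17, seed=0):
    """Return (observed count, number of shuffles matching or beating it)."""
    rng = random.Random(seed)
    symbols = list(cipher)
    observed = rows_without_repeats(cipher, width)
    hits = 0
    for _ in range(trials):
        rng.shuffle(symbols)
        if rows_without_repeats(symbols, width) >= observed:
            hits += 1
    return observed, hits

# Toy stand-in for a ciphertext: three rows of width 3, the last with a repeat.
print(rows_without_repeats("ABCABCAAB", width=3))
print(shuffle_test("ABCABCAAB", trials=200, width=3))
```

With a real transcription and a large number of trials, a hit count of zero would correspond to the "none had that many rows" result described above.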

Your “M+” asymmetry observation seems to fit in with the general observation that repeating bigrams are phobic of certain regions of the text. The lower left, for instance, seems to hate bigrams: http://zodiackillerciphers.com/images/z340-repeating-bigrams.png

Another really strange observation is the distribution of non-repeating string lengths. For each position of Z340, measure how far you can read forward without encountering a repeating symbol. You end up with a string with unique sequences of length L. Jarlve found that for Z340, there is a peak of 26 occurrences of unique sequences of length 17 (which happens to be the width of Z340). It is really interesting that in random shuffles, this phenomenon is only observed on the order of one in a billion shuffles.
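The measurement itself is easy to sketch. The version below is an assumption about how the counting works (in particular, runs that reach the end of the text are simply cut off there), and it is not Jarlve's own code.

```python
from collections import Counter

def unique_run_lengths(cipher):
    """For each start position, how far you can read before a symbol repeats."""
    lengths = []
    for start in range(len(cipher)):
        seen = set()
        for symbol in cipher[start:]:
            if symbol in seen:
                break
            seen.add(symbol)
        lengths.append(len(seen))
    return lengths

toy = "ABCA"
print(unique_run_lengths(toy))           # [3, 3, 2, 1]
print(Counter(unique_run_lengths(toy)))  # histogram of run lengths
```

The histogram is the thing to compare against shuffled versions of the same ciphertext, looking for a peak at any one length.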

Finally, I would recommend that anyone interested in this topic should check out this thread on morf’s Zodiac forum: http://zodiackillersite.com/viewtopic.php?f=81&t=3196 Especially the more recent posts on the latter pages. “Jarlve” and “smokie” in particular are doing fantastic work exploring various transcription schemes that could explain the various curious features of Z340 (in particular, the relationships between periodic bigrams and transposition schemes).

In some ways, it’s the shortest of distances from [Ethel Voynich] to [Ethel Merman], so why not “Voynich, The Musical”? Close your eyes, imagine a Broadway stage, take out a mortgage to get yourself a semi-affordable seat, spill a drink on your leg, and you’re as good as there…

VOYNICH – THE MUSICAL!

Act One, Scene One

It’s 1912. A single spotlight illuminates an old trunk in the middle of an otherwise empty wooden stage: there’s dust in the air. We hear slow, sustained violins off-stage, harbingers of the big discovery that is about to happen.

WILFRID appears stage right. He is well dressed (though a little tweedy for our modern tastes), and wears small round glasses. He looks in the prime of his life – there’s a vigour and physical excitement to him. He approaches the trunk, opens it, takes out an old book and peers inside it. As his eyes grow ever wider, the violins swell, and he sings his first number “Friends To The End”.

WILFRID

This never happened – I wasn’t here.
There was never a trunk (that was junk), isn’t this queer?
I conjured a castle, to hide Jesuit lies…
While the customer’s king, I’ll say anything (however unwise).

[Chorus] But you, you were always real
Even if you made me feel
Like an antiquarian schlemiel –
I couldn’t comprehend.
But I knew, I knew when I met
My ugly duckling Juliet
With your strange alphabet
We’d be friends to the end…
Friends to the end.

Act One, Scene Two

Back in London, WILFRID hesitantly shows his newly-acquired manuscript to his wife ETHEL: he thinks it’s going to make them rich. However, ETHEL cannot believe that he has wasted money on something as unbelievably stupid as a book that nobody can read. To make her feelings on the matter completely clear, she sings her angry opening number “Down the drain”.

ETHEL

Little naked women
Standing round or swimming
What is this you’re bringing
To our house?
You can’t read a word of it
Written by a heretic
I can’t see the benefit
To man or mouse

[Chorus] You put good money / Down the drain
Buying enciphered / Castles in Spain
Were those nymphs fogging / Your revolutionary brain?
Or has their writing sent you / Completely insane?

Act One, Scene Three

WILFRID has moved to New York, and is trying (unsuccessfully) to convince wealthy American collectors to buy his unreadable manuscript. Though his sales patter normally charms the birds down off the trees, he’s finding it difficult to find anyone with any affinity for this unusual artefact. His song “It’s No Use” documents his ongoing struggle.

WILFRID

There’s jazz and money in the air
The excitement of a New World at play
New rules, new wealth, new clothes, new hair
America strides into a brand new day

You, sir, with your spats and suits
Your garden parties and Egyptiana
Might I interest you in this book’s strange roots
And its hard-to-pin-down flora and fauna?

[Chorus] It’s no use
My duckling’s no swan
I’ve cooked my goose
My big chance has gone
I’ll find no willing
Bibliophile
Who’ll pay more than a shilling
They’re too mercantile

Act One, Scene Four

It’s 1930 in New York. WILFRID is dying, having never been able to sell his “Roger Bacon” manuscript. ETHEL brings his beloved manuscript to him, so that he can see it one last time. WILFRID sings a song to both of them: “It’s Time To Say Goodbye”.

WILFRID

Perhaps I was wrong / To hope for the best
To follow every wastrel clue / Like a man possessed
Why can’t anybody else / See what I see?
Are they put off by mere / Indecipherability?

[Chorus] It’s time to say goodbye
To the woman I have loved
And greet the naked angels
Hovering above
I’ve seen them for years
Sitting on my shelves
Filling every page of
Quires eleven and twelve

Two of the least commented-on aspects of the Voynich Manuscript’s “Voynichese” alphabet are (a) its symmetry and (b) its partitioning into quite well-known (but distinct) usage groups. For example:

* the four gallows characters, where EVA t and EVA k are almost always interchangeable, while the single-leg shapes for EVA p and EVA f closely mirror the double-leg shapes for EVA t and EVA k. (And let’s leave the strikethrough gallows aside for the moment.)

* the EVA aiin family of letter groups, which all operate in a very specific way: there are no contexts where ain appears that you wouldn’t also see aiin or even aiiin.

* the ar / or / al / ol group, whose members seem to appear within words in much the same way as each other. The air and aiir letter groups might also be related to this set, though this isn’t 100% clear. Similarly, -am often seems to me (with a hat tip to Emma May Smith, who discussed -m recently) to be something closer to a combination of ar and hyphen, i.e. that -am at the end of a line often resembles the end of the first half of a word broken in half by the line-ending (and where the second half of the word is at the start of the next line, but disguised with an extra letter inserted before it).

* the -dy and -y word endings, which both seem to be cut from almost exactly the same cloth.

* the e / ee / eee / ch / sh / eo group, which seem to me to function slightly differently between A and B pages.

* the qo group, which almost universally seems to operate as a prefix. In those places where we get l- words, we also get qol a lot: and where l- words don’t appear, we get almost no instances of qol.

Cross all the above instances out, and what remains is a very sharply reduced set of usage groups, such as d- words (in particular daiin, which seems to operate in a mysterious world all of its own), o- words (particularly in front of gallows), and y- words.

What about EVA s?

But if you do do this kind of crossing out, you also won’t find a comfortable place for EVA ‘s’ to go. In fact, to my eyes EVA ‘s’ appears to be the single most anomalous character in the Voynichese alphabet: there’s a strong case to be made that it is the most ‘exposed’ single glyph of all of them, and – by that same token – the one we should spend most time on trying to understand. What I’m saying is that EVA s might well be the weakest link in the Voynichese chain.

If you remember to put aside all the completely different ‘sh’ characters (sharing ‘s’ for both of these glyphs was, in my opinion, a foolish mistake in the design of the EVA transcription scheme, *sigh*), you find that ‘s’ occurs about 1.71% of the time in A pages, and about 1.00% of the time in B pages. If you remove any ‘as’ or ‘os’ pairs (as being probably miscopied or mistranscribed ‘ar’ / ‘or’ pairs) from these stats, these figures go down to 1.34% and 0.83% respectively.

And yet some A pages have numerous s characters (e.g. f14r, f15r, f24r), while others have one or fewer s characters (e.g. f14v, f18r, f19r): that this single statistic can differ so much between the two sides of the same folio is something that hasn’t really been noted before, as far as I can recall. [Unless any Lorites out there care to show me the precedent I’ve missed: in one of Friedman’s groups, no doubt.]

All of which incidentally reminds me of something that Glen Claston told me he noticed when he was making his transcription (but which I now can’t find in my email archive, *sigh*): that Voynichese had different clusterings of letter usages that would seem to go into and out of fashion (almost as if one kind of ‘mode’ was active now, and then a different mode active later), sometimes by paragraph, sometimes by page. If this is correct, then perhaps ‘s’ is an active part of some ‘modes’ but not others – just an idea.

What about saiin vs daiin?

I find it interesting that sdaiin occurs only once (on f66r), while sdain, sdaiiin, dsain, dsaiin, and dsaiiin don’t occur at all: yet saiin occurs 144 times.

If s- is some kind of prefix token here, then it seems that so too is d-, and in a way that makes the two avoid stepping on each other’s toes.

My suspicion (for what it’s worth) is that while both work as prefix tokens, they in fact code for two quite different classes of mechanisms: and, moreover, that both prefixes are more meta-linguistic than linguistic in any useful sense.

And what about the first column?

EVA s also has a strong tendency to appear as the first letter of a (non-paragraph-starting) line, particularly in Balneo B pages – but this may possibly be because Balneo B tends to have longer paragraphs than elsewhere.

Combine this (a) with the well-known observation that the first word on each line tends to be slightly longer on average than all the other words on a line, and (b) with Philip Neal’s suggestion that the first letters down some Voynich Manuscript pages might well be a vertical ‘key’ or something similar, and you get an interesting possibility to consider: that line-initial ‘s’ may specifically operate as a null that the writing system needs to prepend to certain (typically short) words.

I was thinking about this today, triggered by a Voynich Ninja forum discussion: I wondered if it might be possible to construct a statistical experiment to test my suggestion that line-initial s- might function as a null character that gets prepended to certain short words (such as aiin).

According to the tentative model I have in mind, the (aiin : daiin) ratio for non-line-initial words should be roughly the same as the (saiin : daiin) ratio for line-initial words. And perhaps it would be good to then repeat broadly the same test for non-line-initial (ar : dar) vs line-initial (sar : dar), etc.

However, I don’t have the right counting tools to do this easily: can anyone please run this test? Thanks!
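For anyone wanting to have a go, the counting could be sketched roughly as follows. This assumes the transcription has already been reduced to one string per line with words separated by whitespace, and that paragraph-starting lines have been filtered out beforehand; the function names and the three toy lines are made up for illustration.

```python
from collections import Counter

def initial_vs_rest(lines):
    """Count words separately for line-initial and non-line-initial positions."""
    initial, rest = Counter(), Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue
        initial[words[0]] += 1
        rest.update(words[1:])
    return initial, rest

def ratio(a, b):
    return a / b if b else float("nan")

def null_s_test(lines, stem="aiin"):
    """Return (stem : d+stem elsewhere, s+stem : d+stem line-initially)."""
    initial, rest = initial_vs_rest(lines)
    return (
        ratio(rest[stem], rest["d" + stem]),
        ratio(initial["s" + stem], initial["d" + stem]),
    )

toy_lines = [
    "saiin chedy aiin daiin",
    "daiin aiin aiin daiin",
    "saiin okal",
]
print(null_s_test(toy_lines))         # (1.5, 2.0) for these made-up lines
print(null_s_test(toy_lines, "ar"))   # nan values: no ar / dar / sar in the toy data
```

If the two ratios came out broadly similar on the real text (and for several different stems), that would be at least consistent with the line-initial null idea, though far from proof.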

Hi There! We’re looking for people to write up their theories on cipher mysteries such as the Voynich Manuscript, the Beale Papers and how astroturfed the Tea Party is. You may be surprised to discover that your foolish clickbait opinions could earn you upwards of $0.02 per day, and might even be worth double that (if they are so unbelievably bad that they go viral on Slashdot or Reddit).

To tap your teensy spile into this towering cask of wealth, there’s no need for an office, formal clothes or indeed any clothes beyond your normal tattered rags. Simply compose your posts and comments from the comfort of your own bedsit, surrounded by your piles of old newspapers, unreturned library books, and much-loved microwave meal boxes. Who could ask for a better or more convenient life?

Yes, you too can turn your vapid leaden thoughts into 24 carat Internet gold, just like alchemists and well-known YouTube sock puppet presenters the world over already do. And let’s face it, if Stampy and Squid can make it there, then so can you, right?

Who cares if you haven’t cut your toenails since Dubya left the White House? We don’t! Google values novelty over content, so to become a high-value content creator in this brave new online world, all you have to do is tap into the same rickety stream of consciousness that pushes angry unspoken words in your mouth when you’ve yet again found yourself stuck in the non-moving queue at the supermarket till and type, type, type.

There, now doesn’t that feel better! And how much do we charge you for this “keyboard therapy”? $100? $1000? No, not even close – in fact, we pay you for it. A frighteningly small amount, sure, but let’s not bicker over mere semantics.

How do you get going? Just start your own blog, proclaim yourself an expert on a particular subject (it doesn’t matter what, nobody cares), leave back-linked comments on forums and other people’s blogs, or even – now get this – leave comments on your own blog under false names to make visitors think that there’s some kind of ‘community’ buzz around the nonsense you’re passing off as high-quality thought.

Before long you’ll even be ready to cut-and-paste all your tripe into a 65-page ebook and sell it for $12 a pop. Still think this is all a pipe-dream? No, it’s not! Many thousands of people have jumped aboard this $$$money$$$ $$$train$$$ already, so why not you as well?

Still believe you’re not right for this extraordinary new world? Think again! You’ll shock yourself when you find out quite how painfully easy it is. Amaze your friends (if you have any, which seems fairly unlikely), take that step, and type, type, type!

I thought I’d share with you the following email I recently received via an anonymous remailing service:

This is being written to you on behalf of a large group of Voynich theorists. Even though we disagree amongst ourselves on everything to do with the Voynich Manuscript itself (which some of us prefer to refer to as the “so-called Voynich so-called Manuscript”), the two things we do all whole-heartedly agree about are (1) how much we despise your pathetic crusade against us, and (2) how much we abhor your ridiculous insistence on primary evidence and testable hypotheses.

Be assured that when one of us does eventually manage to prove definitively that it is a Mongolian shamanic handbook, a heretical medieval suicide manual, or a stranded alien’s diary, the short term pain of finding out that the rest of us were wrong will be amply wiped out by the long term pleasure of mocking you derisively for the rest of your stupid, pointless life.

You just don’t seem to realise that proper ‘Voynich research’ is in no way historical or scientific. Don’t you understand that it is we who established the one basic ‘fact’ of the discourse long ago? The thing that we made true (by repeating it so many times that it became a fact) is that nobody knows anything definite about the Voynich Manuscript. This is the frame of reference everyone is now compelled to use, and neither you, Wikipedia, René so-called Zandbergen, nor indeed anyone else can move outside it: howl at the moon all you like, you’ll achieve nothing.

So you’re just wasting your time trying to make (what you conceitedly and falsely like to think of as) ‘progress’. Anything you try to assert, we deny immediately: it’s just physics, stupid. Moreover, anything you can conceive of asserting, we have probably already denied ten times over. Assert/deny, assert/deny, assert/deny: you really bore us.

Look, can’t you get it into your thick head that we theorists pwn the Voynich big-time? The Beinecke may be the institution that owns the Voynich Manuscript, but that means diddly squat against our total pwnage. Why, when there’s no obvious shortage of rent-a-mouth academics out there, do you think Yale struggled so badly to find anyone to write anything remotely sensible in their recent so-called photofacsimile? They were wasting their time swimming against our tide, just like you’re wasting yours.

OK, we’ll admit there was a brief period during which you were marginally useful to us: that was back when having a post in Cipher Mysteries putting down one of our theories was a bit like a badge of honour. We even had special gamified medals produced, to show off which one of us had had the smarmy Cipher Mysteries treatment (how we laughed): but since you’ve stopped doing even that, we’ve all got tired of your meanderings and not-so-funny posts.

So this is just a collective email from us to say goodbye to you. Even though Voynich research is still stalled in the same cul-de-sac it ever was (which, by our reckoning, is about as perfect a scenario as can be hoped for), we’ve all moved on from you and your stupid blog. You’re yesterday’s man, if not the day-before-yesterday’s man: not interested, la la la.

Why don’t you go research the Phaistos disk or something else unbelievably lame, and leave the Voynich to the people who really own it? Maybe you’ll find some saddo historians out there who want to read your useless drivel: we certainly don’t.

What is the difference between theories and metatheories? Given that the former can sensibly range from hand-wavy general theories (“the Voynich Manuscript was written by a mad alchemist“) to specific theories (“the Voynich Manuscript was written by a young Leonardo da Vinci, using his right hand“), the debate is more whether we can usefully differentiate between metatheories and general theories.

For me, however, the key attribute that distinguishes Voynich metatheories is that they have a certain ‘turn’ to them, a kind of pivoting self-referentiality that their proponents use to explain away just about everything difficult. For example, hoax theorists (such as Gordon Rugg) respond to almost any attempted historical objections (e.g. those surrounding the apparent paradox of using a 16th century mechanism to create an apparently 15th century manuscript) by saying that “well, obviously the hoaxer was so clever that he/she deliberately made those apparently discordant details look that way”.

They then often go on to point out that the more discordant details the hoaxer had to fake, the more obviously brilliant the hoax: and therefore the more we should admire the brilliance both of the hoax and of the man (yes, it’s normally a man) who was clever enough to notice such a brilliant hoax. And so a Voynich metatheory is a thing that arguably focuses more on explaining away that which doesn’t fit than on positively accounting for anything that does sort of fit.

Omphalos

It shouldn’t require particularly deep contemplation before you notice more than a flicker of similarity between the structure of this argument and Omphalos creationism, courtesy of the naturalist Philip Henry Gosse in his 1857 book Omphalos.

“Omphalos” is the Greek word for navel: at the time of Gosse’s book, it was widely believed that Adam (in the Garden of Eden) had a navel despite not having come from a mother’s womb. The conclusion that Gosse famously drew from that is that when God made Adam, He made him complete with a navel: an argument that Gosse then triumphantly upscales to all the geological and fossil evidence that superficially seems to argue against the clearly well-proven Biblical History that showed that the Earth was created in 4004 B.C.

God, then, was something like the ultimate hoaxer: for rather than merely hoaxing some ‘ugly duckling’ unreadable book, He actually hoaxed the entirety of time and space to make it look as though the Earth was older than its ‘actual’ age (6021 years or so). As hoaxes go, you’d have to admit that this is top drawer stuff.

Of course, modern creationists have (ironically enough) evolved far more sophisticated arguments than Gosse ever did: but, frankly, I have to say that I’m not wildly interested in either Gosse or them. All that’s important for us here is that Creationism is, similarly, designed far more to explain away that which doesn’t fit the Bible than to explain that which does.

And what holds for Voynich hoax theories broadly goes for other Voynich metatheories focused on explaining all the difficult stuff away: for example, that the Voynich is glossolalia, or channelled, or some kind of otherwise inspired gibberish, or even a shipwrecked alien’s diary (I kid you not, *sigh*). Or even, with more than a half-nod in Stephen Bax’s direction, that Voynichese is composed of the scattered polyglot fragments of so many different languages that we can only recognise a tiny handful of words here and there: all of which anti-linguistic turn is also a metatheory, because it seeks not to explain the few words it grabs but to explain away the 99.9% or more of the other words it fails to account for. Foolishness.

There is, of course, already a large literature on a large field of constructivist mental endeavouring very similar to these metatheories: it is, by another name, pseudoscience. There, the whole point of pseudoscience isn’t to produce theories that can be tested (and possibly disproven), but instead to produce metatheories that are logically impervious to criticism – i.e. that use their central ‘turn’ to invalidate counterarguments.

This also has the effect of making those metatheories impervious to testing, and to refining, and to improving: and thus leaves them far more akin to something handed down in a Very Important Book Indeed. But you knew that already.

In the end, the only thing that separates Voynich metatheories from pseudoscience is that the people putting forward Voynich metatheories tend to be more interested in the postmodernist self-amusement of their ‘turn’ (a kind of awesome wonder that nobody else seems to have noticed how much their metatheory explains away) than in actually engaging with proof or disproof.

And if that’s a good thing, I’m a monkey’s uncle. Or he’s mine. 🙂

I should mention that there’s another André Nageon lurking in a gap in the Nageon de l’Estang timeline (slightly after the others that I covered in parts one to four): and he actually has quite a funky story attached to him. 🙂

André Nageon vs the Monster

There are a number of fleeting mentions of André Ambroise Nageon de l’Estang‘s time in the Seychelles in “Population et vie quotidienne aux Seychelles sous le premier empire” by Joël Eymeret, in “Revue française d’histoire d’outre-mer” Année 1984, Volume 71, Numéro 262, pp. 5-29.

But given that André Ambroise Nageon de l’Estang died in 1798, it must surely be his son about whom a particular anecdote was told. Eymeret repeats the tale, but it actually first appeared in “FRAPPAZ, Les voyages du lieutenant de vaisseau Théophile Frappaz dans les mers des Indes”, texte publié et annoté par Raymond Decary, in-8°, Tananarive, 1939, pp.108-109:

C’est ainsi qu’André Nageon passe dans la légende : Créole de haute stature et d’une force prodigieuse, faisant défricher les terres il y a environ quinze ans [c’est-à-dire en 1803] il s’éloigna un peu des travailleurs pour sonder un marais. A peine eut-il commencé son opération qu’un gros cayman, caché dans les roseaux se dressa sur sa queue, pour s’élancer sur lui. L’apercevoir, deviner son intention et le saisir à bras le corps fut pour l’intrépide M. Nageon l’affaire d’une seconde : et luttant ainsi avec son terrible adversaire, il sut maintenir l’égalité du combat jusqu’à ce que des noirs accourus à ses cris, l’eussent aidé à terrasser le monstre qu’il avait combattu avec tant de courage

…i.e. (my free translation)…

It is thus that André Nageon passes into legend: a tall [white] Creole of prodigious strength, while clearing land there about fifteen years ago [i.e. in 1803] he went a small way away from the other workers to survey a marsh. As soon as he started his work, a big cayman, hidden in the reeds, lifted itself by its tail to jump on him. Noticing it, guessing its intention and wrapping his arms around its body took the intrepid Mr. Nageon no more than a second: and it was in this manner, struggling with his terrible opponent, that he managed to keep it at bay until the blacks, having flocked to his cries, helped defeat the monster he had fought against so bravely.

But this is surely the same André Nageon de l’Estang who is mentioned as selling some land in 1815 on Henri Maurel’s site (through which all manner of genealogical goodness flows):

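Le 5 Octobre 1815, Antoine [Maurel] fait l’acquisition de André NAGEON DE l’ETANG de deux parcelles de terrain à Victoria. [i.e. on 5th October 1815, Antoine [Maurel] acquired two plots of land in Victoria from André NAGEON DE l’ETANG.]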

And so the Seychellois Nageon de l’Estang family marched forward from there to the modern day, one can only presume. 😉