In case you’ve arrived late to the linguistics party, abjad is a term for a writing system made up (primarily) of consonant letters, where the reader is required to fill in the unwritten vowels for himself/herself. Perhaps the best-known example of this is the modern Arabic script, from the first four letters of whose alphabet the term “abjad” comes – in fact, it’s the Arabic word for “alphabet”.

So… might Voynichese be written in an abjad writing style?

Freelance systems analyst Joachim Dathe thinks so: inspired initially by the apparent similarity between the Voynich Manuscript’s (occasionally ornate) script and Arabic calligraphy, for the last few years he has been promoting and refining his theory that Voynichese is nothing more than Arabic written in an apparently unique (and rather idiosyncratic) abjad stylee.

Yet at the same time, Dathe also believes that the Voynich’s Arabic plaintext can only be extracted with difficulty, because in his particular Arabic reading of it:-
* Punctuation is absent
* Sentence structure isn’t at all obvious
* Word boundaries are often inexact or missing
* Spaces are often inserted inside words
* “Words often appear […arranged or ordered…] in a way which is not compliant with the Arabic language”

His overall conclusion: “Obviously, the texts were dictated to a writer who did not master Arabic scripts.”

For example, Dathe and his translator collaborator admit that their transliteration of the start of f1r yields a fairly jumbled (if not actually random) set of Arabic words, and offer the following interpretative translation of it (though naturally this is only one of many possible):-

A dervish continues to Elate, believing that he is forgotten, and when I am surrounded by his presence, I am in Eden. I am a naught in his life. When despaired of Iman Taha (the faith of The Prophet Peace be upon him), he was purified by an illusion, this is what my faith has inspired me yesterday. I see it distantly in the image of my mother. Do we blame he who offered his life? If you deny him you pierce my eyes, and if you embrace him your excuse will be realized.

Now, claiming a Voynichese abjad decryption that proves unrelated to the drawings and imagery (in Dathe’s case, of “religious content from Sufism”) isn’t unique: John Stojko’s (in)famous vowel-free proto-Ukrainian Voynich decryption of f18r – “What slanted Oko is doing now? Perhaps Ora’s people you are snatching. I was, I am fighting and told the truth. Oko you are fighting mischievously (evil manner). Ask this. Are you asking religion for your clan?” – springs to mind.

Of course, this comparison is hardly breaking news: Elmar Vogt noted much the same similarity in 2012, though going on to compare both sets of mangled-sounding plaintexts with Vogon poetry was perhaps a teensy bit harsh. Still, I do find it hard to disagree with Elmar’s sentiment that Dathe’s “approach is flagrantly naïve”: if there is a real, tangible difference between the way Stojko and Dathe both approached Voynichese, I certainly can’t see it. And if one is wrong for that reason, then so surely is the other.

(Remember: the long-established template for bad Voynich theories is (a) to conjure up a simple-sounding explanation, and then (b) to wrap that up in a long series of what are known as “saving hypotheses” – additional weasel-like meta-explanations that serve to explain away conflicts between that wonky core explanation and an inevitably long succession of inconvenient historical truths. Voynich theorists like to think of themselves as following in the giant decrypting footsteps of Young, Champollion, Ventris et al: but none of that august list put forward theories that needed extensive sets of saving hypotheses to explain away contingent problems.)

In many ways, though, simply grabbing hold of a given abjad script (whether Arabic or vowel-less proto-Ukrainian, if such a thing ever genuinely existed) as a starting point for decrypting the Voynich is without much doubt a poor way to proceed. The proper first question is instead this: what is the linguistic evidence that Voynichese is a script that has no vowels?

Linguists have long exercised their cunning (if you’ll excuse the reordered juxtaposition) by running text corpora through consonant-vowel analysis programmes: basically, they’re looking for hidden Markov models (HMM) with a small number of vowels that constantly recur without leaving consonants adrift in blocks (known as CVCV structure).

Reddy and Knight reported:-

[Jacques] Guy (1991) applies the vowel-consonant separation algorithm of (Sukhotin, 1962) on two pages of the Biological section, and finds that four characters (O, A, C, G) separate out as vowels. However, the separation is not very strong, and several words do not contain these characters.
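For anyone who hasn’t met it before, Sukhotin’s algorithm is simple enough to sketch in a few lines. What follows is only a minimal illustration of the textbook version (count which symbols sit next to which, then repeatedly promote the best-connected remaining symbol to “vowel”), not a reconstruction of the code Guy actually ran. The function name `sukhotin_vowels` is made up for this example, and it assumes that each character of each input word is a single symbol, which is precisely the assumption that gets shaky with a stroke-based transcription.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the set of symbols that Sukhotin's algorithm classes as vowels.

    Assumption: every character in every word is one symbol (one glyph).
    """
    # Symmetric adjacency counts between *different* neighbouring symbols.
    adj = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:  # the diagonal of the matrix is kept at zero
                adj[a][b] += 1
                adj[b][a] += 1

    # Every symbol starts out as a consonant, scored by its row sum.
    score = {s: sum(adj[s].values()) for s in symbols}
    vowels = set()
    while True:
        consonants = [s for s in symbols if s not in vowels]
        if not consonants:
            break
        # Pick the highest-scoring consonant (ties broken alphabetically).
        best = max(consonants, key=lambda s: (score[s], s))
        if score[best] <= 0:
            break  # nothing left that looks vowel-like
        vowels.add(best)
        # Neighbours of a new vowel look more consonant-like, so penalise them.
        for s in consonants:
            if s != best:
                score[s] -= 2 * adj[s][best]
    return vowels


print(sorted(sukhotin_vowels(["tomato", "banana"])))  # ['a', 'o']
```

On a toy input like this the separation is clean; the point of the quoted passage is that on Voynichese it is not.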

At the same time, when they ran their own 2-state bigram HMM programme on Voynichese, the only feature they noted was the strong binding between the final letter of words (typically EVA ‘y’) and the space following it: a pattern they thought similar to Arabic script. So… it is Arabic, then?

Well… no. What this actually means is that a 2-state bigram HMM is woefully inadequate for analysing EVA-transcribed text. Essentially, EVA is a stroke transcription rather than a glyph transcription (hence many composite shapes are transcribed in two or three strokes): and so should never be used as the “raw” input to a statistical analysis programme. So they wasted their time using a 2-state bigram HMM: not even close. (Even if they hadn’t used EVA, I would argue that a 2-state bigram HMM is thoroughly unsatisfactory for numerous other reasons, most of them connected with the behaviour of the EVA letters ‘a’, ‘e’, ‘i’, and ‘o’.)

In fact, arguably the fundamental statistical paradox about Voynichese as a script is that while it is riddled (quite literally, I suppose) with multiple overlapping internal structures, analysts have had very little luck building up Markov models to describe its behaviour; all of which is really quite the opposite of how you’d expect a well-formed language’s script to present. Even Jorge Stolfi’s long-standing “crust-mantle-core” model falls well short of being properly explanatory about the text. So, if Kevin Knight wants something Voynichian for his 2014 summer interns to get their teeth into, surely building up properly substantial Markov models for Currier A and Currier B (oh, and labelese too) would be an excellent starting point. Sort that out and we should all be sharing turkey and pepperoni pizza by Thanksgiving. 🙂

Jacques Guy applied Sukhotin’s algorithm to a glyph transcription, and so stood a better chance of getting sensible results than Reddy and Knight: yet I think the patterns in the text tell us a very much more complicated historical story than is captured by either of these two analytical tracks.

On the one hand, I think it is plain as day that we (the Voynich Manuscript’s ‘audience’, so to speak) are supposed to ‘read’ Voynichese in part as if it were a CVCV structured (non-abjad) thing. Look at the Pisces labels: these not only have a strong CVCV structure, but 25 out of the 30 also begin with the letter ‘o’ (presumably followed by a consonant, usually a ‘t’ or ‘k’ gallows character):-

otalal / otaral / otalar / otalam / dolaram / okaram / oteosal / salols / okaldal / ykolaiin / sar.am / oty / oky.ody / oty.or / okaly / otody / otald / otal.dar / okody / opys.am / chckhhy / otaly / otal.rar / otal.dy / okeoly / okydy / okees / otalalg / okasy / otar
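If you’d rather check the tally than take it on trust, here is a quick sketch that counts the labels exactly as listed above. The variable names are just for illustration, and the only assumption is that the labels are separated by “ / ” and that the dots inside some labels are left as they are.

```python
from collections import Counter

# The Pisces labels, copied verbatim from the list above.
labels = (
    "otalal / otaral / otalar / otalam / dolaram / okaram / oteosal / "
    "salols / okaldal / ykolaiin / sar.am / oty / oky.ody / oty.or / "
    "okaly / otody / otald / otal.dar / okody / opys.am / chckhhy / "
    "otaly / otal.rar / otal.dy / okeoly / okydy / okees / otalalg / "
    "okasy / otar"
).split(" / ")

# Labels that begin with 'o', and the letter that follows that 'o'.
starts_with_o = [w for w in labels if w.startswith("o")]
second_letters = Counter(w[1] for w in starts_with_o)

print(len(labels), len(starts_with_o))  # 30 25
print(second_letters.most_common())     # [('t', 15), ('k', 9), ('p', 1)]
```

So 24 of the 25 ‘o’-initial labels continue with a ‘t’ or ‘k’ gallows, with ‘opys.am’ as the lone exception.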

There is also the heavy repetition of ‘or’, ‘ar’, ‘ol’ and ‘al’ throughout the text to consider, especially in phrases such as “or oro ror”. Once you visually ‘tune in’ to this kind of pairing, I think it becomes hard not to see the text as largely CVCV structured.

On the other hand, I think it is very nearly as plain that there’s something terribly wrong with this CVCV model of Voynichese. The simplest objection is that if it is correct, then only ‘o’ and ‘a’ seem to participate in CVCV structured words, making Voynichese a vowelled language with only two genuinely combinable vowels. Which would be a nonsense, right?

So if you think the Voynichese script is directly expressing an actual natural language, you’re stuck halfway between two extrema, because it’s neither consonanty enough to be an abjad (unvowelled) script, nor vowelly enough to be a proper abugida (vowelled) script. It’s a paradox, right?

Hence I personally think the only sensible conclusion is that Voynichese is a script that is neither an abjad nor an abugida, but is instead a covertext designed to resemble a plausible-looking language script (albeit one with too few vowels to register solidly as either category). The cryptographic truth falls between these either-or categorical boundaries erected by linguists, and in a much more subtle and devious way than linguists’ tools are able to handle comfortably. Good, isn’t it?

Indeed, “There are more things in heaven and earth, Horatio / Than are dreamt of in your philosophy.”