To my mind, there are two basic types of Voynich Manuscript researchers: (a) those who view Voynichese as a language composed of clearly legible individual letters (and who therefore tend to treat it either as a confounding linguistic puzzle or as an exercise in pure cryptology); and (b) those who believe that you would first need to work out how to parse groups of glyphs into tokens before you can even begin to make any sense of the text.
Despite having made the case for (b) back in “The Curse of the Voynich” (2006), I don’t honestly believe that this second group’s camp has ever had more than my tent in it. (An occasional marauding bear, perhaps, but that’s about it as far as it goes, company-wise.)
Why is “Camp B” so empty?
The argument starts with the difference between strongly-linked glyph pairs and weakly-linked glyph pairs.
In Voynichese, EVA ‘q’ is almost always followed by EVA ‘o’ (5186 times, compared with about 120 for all other occurrences of ‘q’). The strength of this link suggests the presence of an underlying orthographic rule (comparable to the English rule that “q is always followed by u”), and also that a fair few of the remaining (non-qo) instances may well prove to be copying slips.
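As a rough sketch of how this kind of link strength can be measured, the following Python counts which glyph follows a given glyph across a list of EVA-transcribed words. The function name `successor_counts` and the tiny sample word list are illustrative assumptions: they are not a real transcription, and they do not reproduce the figures quoted above.

```python
from collections import Counter

def successor_counts(words, glyph):
    """Count which glyph follows `glyph` inside each word.

    A word-final occurrence of `glyph` is counted under the empty string.
    """
    counts = Counter()
    for word in words:
        for i, ch in enumerate(word):
            if ch == glyph:
                nxt = word[i + 1] if i + 1 < len(word) else ""
                counts[nxt] += 1
    return counts

# Hypothetical sample, for illustration only (not real transcription data).
sample = ["qokeedy", "qokain", "qotedy", "okal", "qekedy"]
counts = successor_counts(sample, "q")
total = sum(counts.values())
share_o = counts["o"] / total  # fraction of 'q' occurrences followed by 'o'
print(counts, share_o)
```

Run against a full transcription, the same tally would show how lopsided the successor distribution for any given glyph really is.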
Similarly, if we see the first half of a strike-through ‘ch’ character (i.e. ‘c’) in front of a gallows character, it is almost always matched by the second half of a strike-through ‘ch’ character (i.e. ‘h’). This too suggests that c+gallows+h is following some kind of underlying orthographic rule:
* cth 905:33
* ckh 876:26
* cph 212:6
* cfh 73:6
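A tally of this kind could be produced along the following lines. This is only a minimal sketch: it assumes the figures above are read as "closed by h" versus "not closed by h", the name `bench_gallows_tally` is hypothetical, and the sample words are made up for illustration rather than taken from a real transcription.

```python
import re

GALLOWS = "tkpf"
# 'c', then one gallows glyph, then whatever single glyph comes next (if any).
PATTERN = re.compile(rf"c([{GALLOWS}])(.?)")

def bench_gallows_tally(words):
    """For each gallows glyph, return [closed with 'h', not closed] after a leading 'c'."""
    tally = {g: [0, 0] for g in GALLOWS}
    for word in words:
        for m in PATTERN.finditer(word):
            gallows, following = m.group(1), m.group(2)
            tally[gallows][0 if following == "h" else 1] += 1
    return tally

# Hypothetical sample, for illustration only.
sample = ["cthy", "okcthy", "ckhey", "ctey", "cphol"]
tally = bench_gallows_tally(sample)
for gallows, (closed, unclosed) in tally.items():
    print(f"c{gallows}h {closed}:{unclosed}")
```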
However, it then turns out that Voynichese is full of families of strongly-linked glyph pairs, and that (though I don’t have precise statistical evidence for asserting it) it is these strong links that drive much of the structure and statistical behaviour of Voynichese.
* ‘ol’, ‘al’, ‘or’, ‘ar’
* ‘ee’, ‘eee’, ‘eeee’
* ‘aiv’, ‘aiiv’, ‘aiiiv’
* ‘air’, ‘aiir’, ‘aiiir’
* ‘ok’, ‘ot’, ‘op’, ‘oh’
* ‘dy’ (though I suspect dy works in a different way to the others)
That is, the amount of genuine information inside these groups is very small, which in my opinion means that we should not be trying to look for information inside these groups at all. The real information in the text lies in the choice between these strongly-linked groups, not inside each one.
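To put a rough number on "very small", here is a back-of-envelope calculation using the ‘q’ figures quoted earlier in the post (5186 cases followed by ‘o’, about 120 otherwise). It is a simplification, not a proper statistical analysis: it lumps every non-‘o’ successor into a single alternative, so it slightly understates the true figure, and the 120 is itself approximate. The helper name `binary_entropy` is just for illustration.

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of a two-way choice with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

qo_count = 5186     # 'q' followed by 'o' (figure quoted in the post)
other_count = 120   # every other 'q' (approximate figure quoted in the post)

p_o = qo_count / (qo_count + other_count)
bits = binary_entropy(p_o)
print(f"P(o after q) = {p_o:.3f}, entropy = {bits:.3f} bits")
```

For comparison, a fair coin flip carries exactly one bit, so on this crude measure the glyph after ‘q’ carries only a small fraction of that.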
Reading Jelly vs Parsing Foam
As a result, when I look at Voynichese words such as ‘olchedy’ and ‘olcheey’ (which occur a respectable 71 and 17 times respectively), I can only sensibly parse them as “ol-ch-e-dy” and “ol-ch-ee-y” before even beginning to try to make sense of what is going on with them. And even once you have parsed them, they remain just as inscrutable as before.
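One simple way to mechanise this kind of split is a greedy longest-match pass over a list of candidate groups. The sketch below is an assumption about how such a parser might look, not a claim that it is the right way to parse Voynichese: the `GROUPS` list is just a subset of the groups named in this post, and the longest-match-first rule is an arbitrary choice.

```python
# Candidate groups taken from this post (a subset, chosen for illustration).
GROUPS = [
    "qo", "ol", "al", "or", "ar",
    "ch", "cth", "ckh", "cph", "cfh",
    "ee", "eee", "eeee",
    "aiv", "aiiv", "aiiiv", "air", "aiir", "aiiir",
    "ok", "ot", "op", "dy",
]
# Try longer groups first so that 'eee' wins over 'ee', and so on.
_BY_LENGTH = sorted(GROUPS, key=len, reverse=True)

def parse(word):
    """Split an EVA word into tokens, left to right, longest known group first."""
    tokens = []
    i = 0
    while i < len(word):
        for group in _BY_LENGTH:
            if word.startswith(group, i):
                tokens.append(group)
                i += len(group)
                break
        else:
            # No known group starts here: emit the single glyph as its own token.
            tokens.append(word[i])
            i += 1
    return tokens

print("-".join(parse("olchedy")))  # ol-ch-e-dy
print("-".join(parse("olcheey")))  # ol-ch-ee-y
```

A greedy left-to-right pass will sometimes choose differently from a human reader when groups overlap (for example where ‘qo’ and ‘ok’ compete for the same ‘o’), which is part of why a reliable parse is hard to pin down.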
All of which is to say that I think we cannot yet parse Voynichese reliably, which is the starting point for the single-tent Camp B described at the top of the post. Yet this does not mean that all is lost: it just means that we are still trying to find a reliable and strong way to get started on a difficult road.
But linguistically, this isn’t how languages work. Orthography is driven by issues such as consonance and assonance: but what we appear to be seeing here is more like a jelly of letters (i.e. more structured than soup, but still quite plastic), joined together into words by deeper rules we are still unaware of.
Yet perhaps a more useful (and visual) way of viewing Voynichese is as a ‘foam’ of small glyph-group bubbles (e.g. ‘ol’, ‘qo’, etc.), each empty of meaning in the middle but with all the semantic content on its outside, at the point where it touches other bubbles. What I’m trying to do is to decompose the foam of words into its constituent bubbles.