Where will the first proper Voynich research breakthrough come from? To my mind, there is a good chance that this will be made by someone taking a fresh look at the mystery of the Voynichese ‘languages’.
For even though the notion that Voynichese is a simple, regular language seems to be the default decryption starting point for just about every YouTube codebreaker on the planet (e.g. “it’s obviously proto-Breton with Urdu loanwords”, etc etc), it simply isn’t.
Rather, when you put Voynichese under the linguistic microscope, you see a series of different (but closely related) languages / writing systems. And whatever you think Voynichese is, having to account for multiple variants of that thing is bemusing, if not downright perplexing.
The most fundamental challenge, then, that these variants present us is this: can we work out how these variants relate to each other? Furthermore, can we work out how a letter / word / sentence written in one variant would be written in another? In short, can we somehow normalize all the Voynich Manuscript’s languages relative to each other, to step towards a single, regular system underlying them all?
For me, reaching even part of the way towards doing this would be perhaps the most significant Voynich research achievement yet.
The ‘Language’ Landscape…
It was top American cryptologist Captain Prescott Currier back in the 1970s who first inferred the presence of multiple Voynichese ‘languages’. He famously categorised Voynichese pages as having been written in either an ‘A’ language variant (now known as “Currier A”) or a ‘B’ language variant (A.K.A. “Currier B”). This was motivated by various statistical features of the text that he observed clustering together in A pages and B pages respectively. What is more, Currier’s A/B clustering largely holds true not only for both the pages on any given folio (i.e. recto and verso), but also for all the folios / panels on a single bifolio (or trifolio, etc).
Though Currier’s A/B division is a very useful categorisation tool, it remains somewhat problematic as an absolute measure, for (as Rene Zandbergen likes to point out) a few intermediate pages have both Currier A and Currier B features simultaneously. Rene points especially to the foldout folios for this: he says that Currier’s initial assessment was drawn from the herbal pages (which I think is very probably true), and that these super-wide pages behave a little differently.
Moreover, the variations of the languages used in different sections (e.g. “Herbal A”) present yet further dialect-like differences to be accounted for. Inferring from this that these differences ‘must therefore’ relate to the pages’ semantic content would be a convenient way of explaining them away: but there is as yet no evidence to support that conclusion. For now, these section clusters need to be handled with statistical white gloves too.
We additionally have codicological evidence that suggests that some sections of the manuscript were originally formed of pairs of gatherings (e.g. Q13 was Q13A + Q13B, Q20 was Q20A + Q20B), but nobody (as far as I know) has as yet gone looking for Voynichese text statistics that might support or refute these proposed divisions.
And on top of that, there is what has come to be called ‘labelese’, i.e. the disjointed one-word-at-a-time text found on pages with ‘labels’ attached to parts of diagrams (e.g. the Astrological / Zodiac section). Here again, some people like to infer that it ‘must somehow be’ the semantic content of these labels that affects the way Voynichese works: but there is no evidence to support that conclusion, beyond wanting it to be true for an easy life. 😉
In summary, what we observe in Voynichese is a lot of language-like variation going on at a number of levels. In my opinion, we should stop trying to explain away these variations in terms of speculative concepts (e.g. ‘semantic differences’ or ‘labels’), and start instead to look at the basic statistical patterns that each text cluster presents, and use those results as our starting point moving forward.
Unsurprisingly, this is what the next section does. 🙂
A/B Observations…
It’s worth reprising Currier’s observations (which we will turn into actual statistical evidence shortly). He wrote (transcribed on Rene’s site):
(a) Final ‘dy’ is very high in Language ‘B’; almost non-existent in Language ‘A.’
(b) The symbol groups ‘chol’ and ‘chor’ are very high in ‘A’ and often occur repeated; low in ‘B’.
(c) The symbol groups ‘chain’ and ‘chaiin’ rarely occur in ‘B’; medium frequency in ‘A.’
(d) Initial ‘chot’ high in ‘A’; rare in ‘B.’
(e) Initial ‘cTh’ very high in ‘A’; very low in ‘B.’
(f) ‘Unattached’ finals scattered throughout Language ‘B’ texts in considerable profusion; generally much less noticeable in Language ‘A.’
Rene Zandbergen adds the following observations:
The very frequent character combination ‘ed’ is almost entirely non-existent in all A-language pages.
The very common character combination ‘qo’ is almost completely absent in the zodiac pages and the rosettes page, but appears everywhere else.
The common character combination ‘cho’ does not appear in the biological pages (and the rosettes page), but it does in other B-language pages.
Marco Ponzi further added:
The ‘cluster’ aiin has more or less the same frequency in A and B, but as a stand-alone word it is about three times as frequent in B than in A.
Prescott Currier also noted a number of striking language oddities in the ‘Biological B’ section:
This ‘word-final effect’ first became evident in a study of the Biol. B index wherein it was noted that the final symbol of ‘words’ preceding ‘words’ with an initial ‘qo’ was restricted pretty largely to ‘y’; and that initial ‘ch, Sh’ was preceded much more frequently than expected by finals of the ‘iin’ series and the ‘l’ series. Additionally, ‘words’ with initial ‘ch, Sh’ occur in line-initial position far less frequently than expected, which perhaps might be construed as being preceded by an ‘initial nil.’
This phenomenon occurs in other sections of the Manuscript, especially in those ‘written’ in Language B, but in no case with quite the same definity as in Biological B. Language A texts are fairly close to expected in this respect.
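Currier's 'word-final effect' is easy to tabulate once you have a transcription to hand. Here is a minimal sketch of that tally, assuming lines of EVA-style text with words separated by '.', and using the last character of a word as a rough stand-in for its final glyph. The toy lines and the function name preceding_finals are made up purely for illustration; they are not real manuscript text.

```python
from collections import Counter

# Hypothetical toy lines in EVA-like transliteration (NOT real manuscript text).
# Words are assumed to be separated by '.'.
TOY_LINES = [
    "shedy.qokedy.chey.qokain.chedy",
    "qokeedy.daiin.chey.ol.shedy",
    "chedy.qokal.dal.qotedy",
]

LINE_START = "<line-start>"


def preceding_finals(lines, initial="qo"):
    """Tally the last character of each word that immediately precedes
    a word starting with `initial`, looking only within a single line.
    Line-initial matches have no predecessor and are tallied separately."""
    counts = Counter()
    for line in lines:
        words = [w for w in line.split(".") if w]
        for i, word in enumerate(words):
            if not word.startswith(initial):
                continue
            if i == 0:
                counts[LINE_START] += 1
            else:
                counts[words[i - 1][-1]] += 1
    return counts


if __name__ == "__main__":
    for final, n in preceding_finals(TOY_LINES).most_common():
        print(final, n)
```

To judge whether 'y' is over-represented before 'qo', the same tally would need to be compared against the distribution of word-final characters across the whole section, which is the 'expected' baseline Currier refers to.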
My own contribution to this line of inquiry has been to point out that word-initial ‘l-’ is a very strong feature of B pages (particularly Q13). Emma May Smith similarly posted on the various “l + gallows” digraphs:
It should also be noted that <lk> is mostly a feature of the Currier B language. It is roughly twenty times less common on A pages than B pages.
The presence of digraphs composed of <l> and other gallows characters is less secure. The string <lt> occurs 107 times, <lp> occurs 40 times, and <lf> occurs 39 times. Although <lf> is the least of the three its rate is actually rather great, being nearly 8% of all <f> occurrences, approaching the 10% for <lk> of all <k>. Even so, these number are still small and could easily be overlooked if not for <lk>.
Like <lk>, <lt, lp, lf> all appear at the beginning of words, and mostly occur in Currier B. They seem to work in the same way, even if less common.
All in all, it seems to me that there are probably more than twenty Voynichese features that display a statistically significant difference between Currier A pages and Currier B pages. It also seems that many of these features have different relative frequencies between different clusters (e.g. Herbal A) and/or sections (e.g. Q13).
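To make 'statistically significant difference' a little more concrete, here is a rough sketch of one way to test a single feature: count how many words in each group have it, then compute a Pearson chi-square statistic on the resulting 2x2 table. The word lists below are invented toy data (not manuscript counts), and the function names are my own; the choice of test is an assumption for illustration, not something Currier or anyone quoted above prescribes.

```python
def feature_rate(words, predicate):
    """Return (hits, total): how many words satisfy the predicate, out of how many."""
    hits = sum(1 for w in words if predicate(w))
    return hits, len(words)


def chi_square_2x2(hits_a, total_a, hits_b, total_b):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for the table [[hits_a, misses_a], [hits_b, misses_b]]."""
    a, b = hits_a, total_a - hits_a
    c, d = hits_b, total_b - hits_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so there is nothing to compare
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical toy word lists standing in for 'A' and 'B' pages.
A_WORDS = ["chol", "chor", "daiin", "chol", "shol", "cthor", "dain", "chor"]
B_WORDS = ["chedy", "qokedy", "shedy", "ol", "qokain", "lkedy", "chdy", "aiin"]


def ends_dy(word):
    """Currier's feature (a): word-final 'dy'."""
    return word.endswith("dy")


if __name__ == "__main__":
    hits_a, total_a = feature_rate(A_WORDS, ends_dy)
    hits_b, total_b = feature_rate(B_WORDS, ends_dy)
    stat = chi_square_2x2(hits_a, total_a, hits_b, total_b)
    print(hits_a, total_a, hits_b, total_b, round(stat, 2))
```

With one degree of freedom, a statistic above 3.841 corresponds to p < 0.05. Two caveats: toy counts this small are below where the chi-square approximation is trustworthy, and testing twenty-plus features at once calls for some correction for multiple comparisons.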
There is therefore plenty of work to be done here!
List of Distinctive Behaviours
Even though we have excellent transcriptions (EVA and otherwise), I think we’re collectively missing a foundational piece of Post-Currier empirical analysis here: a list of distinctive behaviours present and absent in sections of the Voynich Manuscript. This would extend Prescott Currier’s list to include many more features (such as the use of the EVA ‘x’ glyph, etc) that have been flagged up as distinctive in some way by researchers over the years, though with less of a pure A/B focus. Here is a preliminary list (based largely on the above), which I’m more than happy to extend with additional ones put forward in comments here or elsewhere:
-dy | B |
[chol] | A |
[chor] | A |
[chol.chol] | A |
[chor.chor] | A |
[chain] | A |
[chaiin] | A |
chot- | A |
cth- | A |
ed | B |
[ar] | B |
qo- | Absent in rosette and zodiac pages |
cho | Absent in rosette and Bio pages |
cho* | Rare in Q13 |
[aiin] | Common in B as standalone word |
l- | B |
r- | B, particularly Q13 and Q20 |
lk | B |
lt- | B |
lp- | B |
lf- | B |
aly | f58 |
x | Q20 |
-m not at line-end | Bifolios f3-f6 & f17-f24 |
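A list like this can be turned fairly directly into a feature-by-section count table. The sketch below assumes one reading of the shorthand, which is my interpretation rather than a fixed standard: '-dy' means word-final, 'chot-' means word-initial, '[chol]' means a whole standalone word, '[chol.chol]' means consecutive words, and a bare string such as 'ed' matches anywhere inside a word. The section names and text lines are invented toy data, and positional features such as '-m not at line-end' are not handled.

```python
import re


def compile_feature(notation):
    """Compile one shorthand feature into a regex over a '.'-separated line.
    A word boundary is a '.' or the start / end of the line."""
    word_start = r"(?<![^.])"  # preceded by '.' or by nothing
    word_end = r"(?![^.])"     # followed by '.' or by nothing
    if notation.startswith("[") and notation.endswith("]"):
        # Whole word(s); any '.' inside the brackets is a literal word separator.
        return re.compile(word_start + re.escape(notation[1:-1]) + word_end)
    if notation.startswith("-"):
        return re.compile(re.escape(notation[1:]) + word_end)
    if notation.endswith("-"):
        return re.compile(word_start + re.escape(notation[:-1]))
    return re.compile(re.escape(notation))


def count_features(sections, notations):
    """Return {section: {notation: non-overlapping match count}}."""
    patterns = {n: compile_feature(n) for n in notations}
    return {
        name: {n: sum(len(p.findall(line)) for line in lines)
               for n, p in patterns.items()}
        for name, lines in sections.items()
    }


# Hypothetical toy sections (NOT real manuscript text).
TOY_SECTIONS = {
    "A-like": ["chol.chol.daiin.chor", "cthor.chol.shol"],
    "B-like": ["qokedy.shedy.lkaiin.chedy", "olkedy.qokain.aiin"],
}
FEATURES = ["-dy", "[chol]", "[chol.chol]", "cth-", "ed", "lk", "[aiin]", "qo-"]

if __name__ == "__main__":
    table = count_features(TOY_SECTIONS, FEATURES)
    for section, row in table.items():
        print(section, row)
```

Raw counts like these only become comparable across sections once they are divided by each section's word count, so a real version would report rates rather than totals. Note too that matches are counted without overlap, so a run of three identical words counts as one repeated pair, not two.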
My core beliefs here are (a) that Voynichese will turn out to be fundamentally rational (if perhaps a bit strange); (b) that behaviours in one section will somehow rationally map to behaviours in many (if not all) different sections; and (c) that Voynichese will turn out to have an underlying story / evolution / growth path that we can reconstruct.