Here’s a second paper suggestion for the virtual Voynich conference being held later this year: this focuses on creatively visualising the differences between Currier A and Currier B.
A vs B, what?
“Currier A” and “Currier B” are the names Voynich researchers use to denote the main two categories of Voynichese text, in honour of Prescott Currier, the WWII American codebreaker who first made the distinction between the two visible in the 1970s.
Currier himself called the two types of Voynichese “A” and “B”, and described them as “languages”, even though he was aware some people might well misinterpret the term. (Spoiler alert: yes, many people did.) He didn’t do this with a specific theory about the manuscript’s text in mind: it’s essentially an observation that the text on different pages works in very different ways.
Crucially, he identified a series of Voynich glyph groupings that appeared in one “language” but not the other: thanks to the availability of transcriptions, further research in the half century since has identified numerous other patterns and textual behaviours that Currier himself would agree are A/B “tells”.
Interesting vs Insightful
But… this is kind of missing the point of what Voynich researchers should be trying to do. The observation that A and B differ is certainly interesting, but it’s not really insightful: by which I mean that the mere fact that there is a difference casts little light on what kind of difference it actually is.
For example, if A and B are (say) dialects of the same underlying language (as many people simply believe without proof – though to be fair, the two do share many, many features), then we should really be able to find a way to map between the two. Yet when I tried to do this, I had no obvious luck.
Similarly, if A and B are expressions of entirely different (plaintext) languages, the two should really not have so many glyph structures in common. Yet they plainly do.
Complicating things further is the fact that A and B themselves are simplifications of a much more nuanced position. Rene Zandbergen has suggested that there seem to be a number of intermediate stages between “pure” A and “pure” B, which has been taken by some as evidence that the Voynich writing system “evolved” over time. Glen Claston (Tim Rayhel) was adamant that he could largely reconstruct the order of the pages based on the development of the writing system (basically, as it morphed from A to B).
Others have suggested yet more nuanced accounts: for example, I proposed in “The Curse of the Voynich” (2006) that part of the Voynichese writing system might well use a “verbose cipher” mechanism, where groups of glyphs (such as EVA ol / or / al / aiin / qo / ee / eee / etc) encipher single letters in the plaintext. This would imply that many of the glyph structures shared between A & B are simply artifacts of what cryptologists call the “covertext”: and hence if we want to look at the differences between A and B in a meaningful way, we would have to specifically look beneath the covertext – something which I suspect few Voynich researchers have traditionally done.
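To make the verbose cipher idea concrete, here is a minimal sketch of what “parsing beneath the covertext” might look like: a greedy longest-match-first split of an EVA word into candidate verbose groups, falling back to single glyphs where no group matches. The group list here is purely illustrative (it is not my actual proposed set, nor a claim about which groups are correct):

```python
# Hypothetical sketch: split an EVA word into candidate "verbose cipher"
# groups, longest match first, single glyphs otherwise.
VERBOSE_GROUPS = ["aiin", "eee", "ee", "ol", "or", "al", "ar", "qo"]

def tokenize(word, groups=VERBOSE_GROUPS):
    ordered = sorted(groups, key=len, reverse=True)  # try longest groups first
    tokens, i = [], 0
    while i < len(word):
        for g in ordered:
            if word.startswith(g, i):
                tokens.append(g)
                i += len(g)
                break
        else:
            tokens.append(word[i])  # no group matched: emit a single glyph
            i += 1
    return tokens

tokenize("daiin")   # -> ["d", "aiin"]
tokenize("qokeey")  # -> ["qo", "k", "ee", "y"]
```

Note that greedy longest-match is itself an assumption: different parsing orders would yield different token streams, which is exactly why the choice of grouping needs to be made explicit rather than baked in silently.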
Types of Account
As a result, the A/B division sits atop many types of account for the nature of what A and B share, e.g.
- a shared language
- a shared linguistic heritage
- a shared verbose cipher, etc
It also rests upon many different accounts of what A and B ultimately are, e.g.:
- two related lost / private languages
- a single evolving orthography wrapped around a lost / private language
- a single evolving language
- a single evolving shorthand / cipher system, etc
The difficulty with all of these accounts is that they are often held more for ideological or quasi-religious reasons (i.e. as points of faith, or as assumed start-points) than as “strong hypotheses weakly held”. The uncomfortable truth is that, as far as I know, nobody has yet tried to map out the chains of logical argumentation that move forwards from observational evidence / data to these accounts. Researchers almost always move in the reverse direction, i.e. from account to the evidence, rather than from evidence to explanation.
And when the primary mode of debate is arguing backwards, nobody normally gets anywhere. This seems to be a long-standing difficulty with cipher mysteries (particularly when treasure hunters get involved).
EVA as a Research Template
If Voynich researchers are so heavily invested in a given type of account (e.g. Baxian linguistic accounts, autocopying accounts, etc), how can we ever make progress? Fortunately, we do have a workable template in the success of EVA.
The problem researchers faced was that, historically, different transcriptions of the Voynich were built on very specific readings of Voynichese: the transcriber’s assumptions about how Voynichese worked became necessarily embedded in their transcription. If you were then trying to work with that transcription but disagreed with the transcriber’s assumptions, it would be very frustrating indeed.
EVA was instead designed as a stroke-based alphabet, to try to capture what was on the page without first imposing a heavy-duty model of how it ought to work on top of it. Though EVA too had problems (some more annoying than others), it provided a great way for researchers to collaborate about Voynichese despite their ideological differences about how the Voynichese strokes should be parsed.
With the A/B division, the key component that seems to be missing is a collaborative way of talking about the functional differences between A and B. And so I think the challenge boils down to this: how can we talk about the functional differences between Currier A and Currier B while remaining account-neutral?
Visualising the Differences
To my mind, the primary thing that seems to be missing is a way of visualising the functional differences between A and B. Various types of visualisation strategies suggest themselves:
- Contact tables (e.g. which glyph follows which other glyph), both for normal parsing styles and for verbose parsing groupings – this is a centuries-old codebreaking hack
- Model dramatisation (e.g. internal word structure model diagrams, showing the transition probabilities between parsed glyphs or parsed groups of glyphs)
- Category dramatisation (e.g. highlighting text according to its “A-ness” or its “B-ness”)
My suspicion has long been that ‘raw’ glyph contact tables will probably not prove very helpful: this is because these would not show any difference between “qo-” contacts and “o-” contacts (because they both seem like “o-” to contact tables). So even if you don’t “buy in” to a full-on verbose cipher layer, I expect you would need some kind of glyph pre-grouping for contact tables to not get lost in the noise.
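To illustrate what I mean by pre-grouping, here is a small sketch of a contact table built over pre-grouped tokens rather than raw glyphs. The grouping list is again illustrative only; the point is simply that with pre-grouping, a “qo-” contact and an “o-” contact remain distinct entries in the table:

```python
# Hypothetical sketch: a contact (bigram) table over pre-grouped tokens.
from collections import Counter

GROUPS = ["aiin", "ee", "qo", "ol", "or", "al"]  # illustrative grouping only

def parse(word):
    # greedy longest-match-first split into groups, else single glyphs
    ordered = sorted(GROUPS, key=len, reverse=True)
    out, i = [], 0
    while i < len(word):
        for g in ordered:
            if word.startswith(g, i):
                out.append(g); i += len(g); break
        else:
            out.append(word[i]); i += 1
    return out

def contact_table(words):
    # count token -> following-token contacts within each word
    table = Counter()
    for w in words:
        toks = parse(w)
        for a, b in zip(toks, toks[1:]):
            table[(a, b)] += 1
    return table

table = contact_table(["qokeey", "okeey"])
# with pre-grouping, "qo" and "o" stay distinct as left-contacts of "k";
# a raw glyph table would conflate both as "o" + "k"
```

A raw glyph version of the same table would merge those two rows, which is precisely the information loss I am worried about.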
You can use whatever visualisation strategies / techniques you like: but bear in mind the kind of things we would collectively like to take away from this visualisation:
- How can someone who doesn’t grasp all the nuances of Voynichese ‘get’ A-ness and B-ness?
- How do A-ness and B-ness “flow” into each other / evolve?
- Are there sections of B that are still basically A?
- How similar are “common section A” pages to “common section B” pages?
- Is there any relationship between A-ness / B-ness and the different scribal hands? etc
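For the “category dramatisation” strand, one account-neutral way of scoring “A-ness” vs “B-ness” might be a smoothed per-word log-odds score trained on word counts from known A pages and known B pages. This is a sketch of one possible scoring scheme, not a claim about how such highlighting should actually be done:

```python
# Hypothetical sketch: Laplace-smoothed log-odds "A-ness" score per word.
# Positive score => more A-ish, negative => more B-ish.
import math
from collections import Counter

def log_odds_scorer(a_words, b_words, alpha=1.0):
    a, b = Counter(a_words), Counter(b_words)
    vocab = set(a) | set(b)
    na, nb = sum(a.values()), sum(b.values())
    def score(word):
        pa = (a[word] + alpha) / (na + alpha * len(vocab))
        pb = (b[word] + alpha) / (nb + alpha * len(vocab))
        return math.log(pa / pb)
    return score

# toy training data, purely for illustration
score = log_odds_scorer(["daiin", "chol", "daiin"], ["qokeey", "qol"])
# score("daiin") > 0 (A-ish); score("qokeey") < 0 (B-ish)
```

Colouring each word of a page by such a score would let a non-specialist literally see A-ness flowing into B-ness across a quire.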
Problems to Overcome
There are a number of technical hurdles that need jumping over before you can design a proper analysis:
- Possibilism
- Normalising A vs B
- First glyphs on lines
- Working with spaces
- Corpus choice
Historically, too much argumentation has gone into “possibilism”, i.e. considering a glyph pattern to be “shared” because it appears at least once in both A and B. But if a given pattern occurs (say) ten times more often in B than in A, then the fact that it appears at all in A is particularly weak evidence that it is doing the same job in both. In fact, I’m sure that there are plenty of statistical disparities between A and B to work with: so it would be unwise to limit any study purely to features that appear in one but not the other.
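The possibilist trap is easy to quantify: instead of asking “does the pattern appear in both?”, compare smoothed relative frequencies. This is a minimal sketch (the smoothing constant is an arbitrary choice):

```python
# Hypothetical sketch: smoothed rate ratio between B and A occurrences.
# A ratio near 1 suggests a genuinely shared pattern; a large ratio means
# the pattern is B-heavy even though it technically "appears" in A.
def disparity(count_a, total_a, count_b, total_b, alpha=0.5):
    rate_a = (count_a + alpha) / (total_a + alpha)
    rate_b = (count_b + alpha) / (total_b + alpha)
    return rate_b / rate_a

# a pattern seen once in 10,000 A words but 100 times in 10,000 B words
# comes out ~67x B-heavy, despite being "shared" in the possibilist sense
disparity(1, 10000, 100, 10000)
```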
There is also a problem with normalising A text with B text. Even though there seems to be a significant band of common ground between the two, a small number of high-frequency common words might be distorting the overall statistics, e.g. EVA daiin / chol / chor in A pages and EVA qokey / qokeey / qol in B pages. I suspect that these (or groups similar to them) would need to be removed (or their effect reduced) in order to normalise the two sets of statistics to better identify their common ground.
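One crude (but easy to reason about) way of reducing the effect of those high-frequency words would be to simply drop the top few most common words from each corpus before comparing statistics. A sketch, with the cut-off depth left as an open parameter:

```python
# Hypothetical sketch: trim the top_n most frequent words from a corpus
# so that a handful of very common words don't dominate the statistics.
from collections import Counter

def trimmed_counts(words, top_n=3):
    counts = Counter(words)
    for w, _ in counts.most_common(top_n):
        del counts[w]
    return counts

sample = ["daiin"] * 5 + ["chol"] * 4 + ["chor"] * 3 + ["shey"] * 2 + ["dar"]
trimmed_counts(sample)  # daiin / chol / chor removed
```

Down-weighting (rather than outright removal) would be gentler, but this makes the normalisation step explicit either way.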
Note that I am deeply suspicious of statistics that rely on the first glyph of each line. For example, even though EVA daiin appears in both A and B pages, there are some B pages where it appears primarily as the first word on different lines (e.g. f103v, f108v, f113v, all in Q20). So I think there is good reason to suspect that the first letter of all lines is (in some not-yet-properly-defined way) unreliable and should not be used to contribute to overall statistics. (Dealing properly with that would require a paper on its own… to be covered in a separate post).
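Operationally, excluding line-initial material from the statistics is trivial to do up front in any transcription pipeline. A sketch (assuming one transcription line per string, space-separated words):

```python
# Hypothetical sketch: discard the first word of every transcription line
# before computing word statistics, given the suspicion that line-initial
# glyphs behave unreliably.
def drop_line_initial(lines):
    words = []
    for line in lines:
        toks = line.split()
        words.extend(toks[1:])  # skip the line-initial word
    return words

drop_line_initial(["daiin chol shey", "daiin qokeey dar"])
# -> ["chol", "shey", "qokeey", "dar"]
```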
Working with spaces (specifically half-spaces) is a problem: because of ambiguities in the text (which may be deliberate, from scribal arbitrariness, from transcriber arbitrariness, etc), Voynich transcription is far from an exact science. My suggested mitigation would be to avoid working with sections that have uncertain spacing and labels.
Finally: because of labelese, astro labels and pharma labels, corpus choice is also problematic. Personally, I would recommend limiting analysis of A pages to Herbal A only, and B pages to Q13 and Q20 (and preferably keeping those separate). There is probably as much to be learnt from analysing the differences between Q13’s B text and Q20’s B text as from the net differences between A and B.
Nick, thank you for regularly reminding us of the difference between the two languages; it’s very useful.
Can you please recommend an article with results of the statistical analysis of the B language?
Nick,
I see that my question remains unanswered: was it badly formulated?
Ruby: there are numerous pages and Voynich Ninja threads which cover different aspects / behaviours of Currier B, plus summary presentations (like Kevin Knight’s) that try to give a top-down view, but I don’t know of anything in the sweet spot between the two that you seem to be hoping for.
What I’m getting at here (and on the #4 suggestion page) is that I’m still unsure whether there’s more or less difference between A and B than you might think.
Thanks Nick, I’ll read Kevin Knight again. It’s just the phrase “numerous pages” that slows me down: I don’t feel brave enough to learn the statistics.