On the one hand, I’ve spent years trying to reconstruct the “inner history” of the Voynich Manuscript; on the other, I’ve spent the same period trying to deconstruct the subtle fault-lines in its cipher system. History and science: the ultimate epistemological pincer attack, if you will.
In that general vein, here’s a new research angle on the Voynich Manuscript’s cipher to think about, one that’s supported on both sides by very specific art-historical and statistical reasoning.
Firstly, the art history. Quite independently of the question of authorship, I recently argued (in my post on Voynich Q13) that Q13b (baths) is to Q13a (something disguised as baths) as Herbal-A (agriculture) is to Herbal-B (something disguised as agriculture). This same relationship may well hold true for the ‘pharmacological’ pages, in that Q19 pharma (visual recipes) may bear the same relation to some or all of Q15 pharma (something disguised as visual recipes).
I have also argued on codicological grounds that the patterns resembling aiiv and aiir we see densely scattered throughout the VMs were intended to resemble medieval folio references, while concealing some other information (probably Arabic numbers). My hypothesis is that this steganography was initially achieved (early in the Currier-A phase) by placing dots over the various aiiv instances, but that the author then decided this was too obvious and so went through the text adding scribal flourishes connecting the right-hand edge of the v-shape to the flying dot. However, by the time of the Currier-B phase, the same aiiv pattern was used as a covertext, but a different kind of steganography was used for the concealed text – here, the overall shape of the final “v” letter seems to have been used as the enciphering mechanism. What I like about this is that it should be testable by a careful spectroscopic scan of the aiiv instances. I suspect that it will be amazing how much you can tell from the evolution of a single pattern across the VMs’ pages.
Put all this together, and what I think emerges is a picture of a cipher system that is evolving across multiple phases – the Currier-A dot phase, perhaps a Currier-A pure loop phase, a Currier-B v-shape phase. Glen Claston has his own ideas on the evolution and gestation of the pages (along broadly similar lines), so this isn’t really massive news on its own.
Secondly, the statistics. Since Prescott Currier proposed his two-language (Currier A and Currier B) model in 1976, it is sadly true that far more people have picked up on what this “split” might imply than have tried to actually statistically analyze it in a deeper way. What are those differences, though?
- -dy: rare in A, very common in B
- chol-, chor-, and chot-: very common in A, rare in B
- cth-: common in A, rare in B
- chain, chaiin: medium frequency in A, rare in B
To which I would add that qol- occurs 20x more often in B than in A, and that if you remove all ol and al pairs, the remaining freestanding ls occur 8x more often in B than in A.
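For anyone who wants to re-check these ratios for themselves, here is a minimal sketch of how the counts might be made. It assumes you already have two plain lists of EVA-transcribed words, one for Currier A pages and one for Currier B pages; the function names, the pattern table and the toy word lists are all my own illustrative assumptions, not figures taken from any real transcription.

```python
import re

def rate_per_1000(words, predicate):
    """Words matching `predicate`, per 1000 words of text."""
    if not words:
        return 0.0
    return 1000.0 * sum(1 for w in words if predicate(w)) / len(words)

def freestanding_l_count(word):
    """Count the 'l' glyphs left once every 'ol' and 'al' pair is masked out."""
    return re.sub(r"[oa]l", "_", word).count("l")

def freestanding_l_rate(words):
    """Freestanding 'l' glyphs per 1000 words of text."""
    if not words:
        return 0.0
    return 1000.0 * sum(freestanding_l_count(w) for w in words) / len(words)

# Pattern name -> test applied to one word (EVA transcription is assumed).
PATTERNS = {
    "-dy": lambda w: w.endswith("dy"),
    "chol-": lambda w: w.startswith("chol"),
    "chor-": lambda w: w.startswith("chor"),
    "chot-": lambda w: w.startswith("chot"),
    "cth-": lambda w: w.startswith("cth"),
    "chain/chaiin": lambda w: w in ("chain", "chaiin"),
    "qol-": lambda w: w.startswith("qol"),
}

def compare(words_a, words_b):
    """Return {pattern: (rate in A, rate in B)}, both as words per 1000."""
    return {
        name: (rate_per_1000(words_a, test), rate_per_1000(words_b, test))
        for name, test in PATTERNS.items()
    }

if __name__ == "__main__":
    # Hypothetical toy data, only to show the call shape.
    toy_a = ["chol", "chor", "daiin", "cthy"]
    toy_b = ["qokedy", "qol", "lchedy", "shedy"]
    for name, (a, b) in compare(toy_a, toy_b).items():
        print(name, a, b)
    print("freestanding l", freestanding_l_rate(toy_a), freestanding_l_rate(toy_b))
```

Masking the ol and al pairs with a placeholder (rather than deleting them) stops the letters on either side of a removed pair from fusing into a spurious new pair.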
All of which leads to this basic observation: currently, I think that the very best explanation of why the ‘formation rules’ of Currier B differ from those of Currier A is that the Voynich’s author evolved the system from A to B not to accommodate another language or dialect, but rather to hide perceived weaknesses in the Currier A cipher system.
This then suggests a new cryptological research angle: if we can statistically identify what specific patterns were removed from Currier A during the transition to Currier B (and perhaps even identify matching patterns that were added to Currier B), then we might, with a little luck, start to work out why the author thought they were weaknesses in the cipher system.
As an example, could it be that many of the instances of ch (or, more likely, cho) in Currier A reappear as freestanding l in Currier B? If so, why did the author evolve Voynichese in this way? Was he hiding a weakness in the cipher system? Did the author judge that the first phase’s cho was unnecessarily verbose, and so came to replace it by the (much more compact) freestanding l in later phases?
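One rough way to probe a candidate mapping such as “cho becomes l” might be to rewrite the Currier A words under that mapping and see whether they then look more like Currier B words. The sketch below uses a deliberately crude measure of my own choosing (the share of A tokens whose word form also turns up somewhere in B), and every name in it is a hypothetical helper rather than an established method. A rise in overlap would be suggestive at best, not proof.

```python
def vocabulary_overlap(words_a, words_b):
    """Fraction of A tokens whose word form also occurs anywhere in B."""
    if not words_a:
        return 0.0
    vocab_b = set(words_b)
    return sum(1 for w in words_a if w in vocab_b) / len(words_a)

def apply_mapping(words, old, new):
    """Rewrite every occurrence of `old` inside each word as `new`."""
    return [w.replace(old, new) for w in words]

def mapping_gain(words_a, words_b, old="cho", new="l"):
    """Return (overlap before, overlap after) applying old -> new to A."""
    before = vocabulary_overlap(words_a, words_b)
    after = vocabulary_overlap(apply_mapping(words_a, old, new), words_b)
    return before, after

if __name__ == "__main__":
    # Hypothetical toy data, only to show the call shape.
    toy_a = ["chody", "daiin", "chor"]
    toy_b = ["ldy", "daiin", "lr"]
    print(mapping_gain(toy_a, toy_b))
```

To be meaningful, the gain for cho to l would need comparing against the gain from arbitrary control mappings of similar frequency, since almost any shortening rewrite will nudge the overlap around a little.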
…Q13b (baths) is to Q13a (something disguised as baths) as Herbal-A (agriculture) is to Herbal-B (something disguised as agriculture).
So do you think there’s actual content about herbs and baths in Herbal A and Q13B respectively, or are they the visual equivalent of null characters in a ciphertext, intended to distract from the meaningful content of the other sections?
The VMs has never given me the impression of having a single null character, or a single null drawing.
I merely think that the Q13a and Herbal-B pages were added in a different composition phase.
Nice post Nick,
It is certainly a vital step to analyse what the actual differences are between the pages that Currier identified as ‘A’ from those as ‘B’, and I think you’re right that not many people take the time and effort to do that for themselves.
Most are probably happy to accept that someone else found a difference, and to settle for a big assumption or premature conclusion as to the reason for those differences, whether that assumption is that there is a different underlying plaintext language, a different content type, or a different system at work.
The analysis of ‘A’ and ‘B’ is just a step, and having entertained the idea that not all the pages may be using the same system it would also be a bit remiss to stop at the set of pages flagged as Currier-A or Currier-B.
The question then needs to be asked, “are the pages within these sections statistically consistent with each other, or are there also differences within A and within B just as significant as those between A and B?”.
Assuming uniformity of the whole VMs is undoubtedly a BIG mistake, but I believe the categorization into ‘A’ and ‘B’ may be an oversimplified model that misrepresents the variation that exists across all pages. If we are not careful, it will mislead us about the process which creates the pages, and how it changes from one page to another.
There is nothing like statistics (and especially averages) for taking a reality rich in detail and variation and, by collapsing a multi-dimensional space to a single number, producing something essentially meaningless.
(Like the average human being having one breast and one testicle. And what is the colour of the average fruit? 🙂)
In many cases looking at the proportion of symbols on Voynichese pages doesn’t produce a neatly polarized picture of two systems or variants at work, but something more subtle and complex with far more dimensions and degrees of freedom.
If we want to entertain there being “two systems” it is more like each page has a blend of influences from those two systems. But even that may be a proposal that is rigid and inflexible compared to the real Voynichese.
Perhaps there are eight systems and each page is a blend of those? 🙂
Suppose there were 8 system variants which differed and overlapped in subtle ways and the rules for blending them on a given page were subtle enough – it would be very hard to see where the “joins” were, and even harder to separate and determine the details of each individual system, and work out which one was at work at what point – as you would need to do in order to decode the thing.
Marke
Hi Marke,
True, true – but I think that it is reasonably safe to say that the well-known stats I mentioned (illustrating many of the main differences between Currier A and Currier B) do indicate some kind of change in the underlying system, and that a reasonable hypothesis is that C-A “cho” was largely (but not entirely) replaced by C-B “l”. That, at least, is testable: and may go some way to explaining what is going on.
If you talk with Glen Claston for any period of time, he’ll infect you with the idea that Voynichese did evolve in many more subtle ways than just the C-A/C-B barrier – and I think he’s probably right. Even so, perhaps it’s time that the C-A/C-B change was revisited in a more substantive way…
Cheers, ….Nick Pelling….
Excellent post, Nick! I don’t have time for detailed consideration, unfortunately. My old thought was that the Voynichese system is a homophonic system, and that A simply represents different usage choices of alternatives from B. I did think that comparing A and B would be key to cracking the overall system.
Comparing how different operators used homophonic systems was the key to breaking them in the past, of course. I’m not aware of the exact details.
René and others have done more detailed cluster analyses of the VMs which break it down further than just A and B, of course. René did about the most extensive, IIRC.
I think the issue here is about identifying the precise mechanisms by which C-A and C-B differ – and to make the analytical assumption that the statistics of the underlying A & B plaintexts remain broadly similar. If we can specifically identify the mapping from the A system to the B system (such as “cho maps to l”), then we should be able to make progress in identifying what kind of a beast the plaintext is.
Not too much to ask for, is it? 😉
I strongly support the idea in post 6 above, namely that pinpointing the difference between C-A and C-B will tell us something valuable.
I second René’s last comment! Really what I meant in mine.