More than 30 years ago, ex-US military codebreaker Prescott Currier was looking at the Voynich Manuscript when he noticed not only that the handwriting changed (though he was uncertain how many different scribes were involved), but also that the language itself (or, more precisely, the rules governing how Voynichese letters meshed with each other) changed. He called the two major Voynichese ‘dialects’ thus identified “A” and “B” (though it turns out that quite a few pages are subtly intermediate between A and B).
Hence one large shadow hanging over any discussion of Voynichese is the issue of why such a clearly constructed language / system as Currier A (which was almost certainly written before Currier B) needed to be modified to make Currier B. After all, as Jerry Pournelle used to say every couple of months in Byte magazine, “if it ain’t broke, don’t fix it”, surely?
And yet it seems that the Voynich’s author did fix it: so, might the presence of statistical differences be a clue that Currier A was in some way broken? To me, this implies that we should try to quantify and model the differences between A and B pages, so that we can see what aspects of A were modified to make B pages, just in case this exposes some subtle weakness of the A language. Basically, what flaws in the A language were the A→B hacks trying to cover up?
As part of this whole process, I’ve recently been looking closely at the ‘l’ character in EVA transcriptions of the Voynich Manuscript, and what the different treatment of ‘l’ characters on A and B pages might be able to tell us. It’s well known that ‘l’ is very commonly preceded both by ‘o’ and by ‘a’ – but does this behaviour change much between A pages and B pages?
According to my online JavaScript analysis tool:
- In A pages, ‘l’ is preceded by ‘o’ 72.7% of the time, and is preceded by ‘a’ 22.9% of the time.
- In B pages, ‘l’ is preceded by ‘o’ 43.7% of the time, and is preceded by ‘a’ 29.0% of the time.
- Freestanding ‘l’ (i.e. ‘l’s not preceded by ‘a’ or ‘o’) occur 118 times in A pages, but 1706 times in B pages.
- ‘ol’ usually appears preceded by a space (97% of the time in A pages, 96% of the time in B pages).
- Freestanding ‘l’ usually appears preceded by a space (90% in A, 95% in B).
- The combined frequency of ‘ol’ and freestanding ‘l’ remains roughly the same (5.1% in A, 4.7% in B).
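For anyone wanting to reproduce counts of this kind, here is a minimal Python sketch of the classification step. It is not the analysis tool mentioned above: the function name `l_context_stats` is hypothetical, and it assumes the input is a plain EVA string with words separated by single spaces (real transcription files use other separators and carry markup that would need stripping first). The sample string is a toy input, not a real line from the manuscript.

```python
from collections import Counter


def l_context_stats(text: str) -> dict:
    """Classify every EVA 'l' by the glyph that precedes it.

    Assumption: `text` is a plain EVA transcription with words
    separated by single spaces and no markup.
    """
    padded = " " + text.strip()   # give the first glyph a predecessor too
    kinds = Counter()             # 'ol', 'al', or 'free' (freestanding)
    word_initial = Counter()      # how often each kind starts a word
    for i in range(1, len(padded)):
        if padded[i] != "l":
            continue
        prev = padded[i - 1]
        if prev in ("o", "a"):
            kind = prev + "l"
            before = padded[i - 2]   # glyph in front of the 'ol' / 'al' pair
        else:
            kind = "free"
            before = prev            # glyph in front of the bare 'l'
        kinds[kind] += 1
        if before == " ":
            word_initial[kind] += 1
    total = sum(kinds.values())
    return {
        "total_l": total,
        "counts": dict(kinds),
        "share": {k: v / total for k, v in kinds.items()} if total else {},
        "word_initial": dict(word_initial),
    }


# Toy input for illustration only.
stats = l_context_stats("ol daiin l chol qokal lol")
print(stats["counts"], stats["word_initial"])
```

Running the same function separately over the A pages and the B pages would give figures comparable in kind to the bullets above, though the exact numbers depend on which transcription is used and how it is cleaned.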
What interests me most here is that ‘ol’ and freestanding ‘l’ seem to function in very similar ways, but that in the transition from A to B, freestanding ‘l’ seems to have replaced ‘ol’ in about 37.5% of cases. That is, it seems to me that ‘ol’ and ‘l’ (when not preceded by ‘a’) might well represent exactly the same token: which is to say that, al’s aside, ol = l.
So, according to my current forensic reconstruction, ol and al were verbose tokens in the A pages, but because ol appeared so often (4.57%) in A pages (thus bloating the size of the ciphertext), the author finessed this in B pages. By replacing many ol’s with l, ol’s percentage went down to 2.67% while freestanding l went up to 1.66% in B (relative to 0.27% in A).
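As a rough sanity check on those percentages, the sketch below treats ‘ol’ and freestanding ‘l’ as a single pool and asks what fraction of that pool is written as bare ‘l’ in each dialect. The choice of ratio (and the helper name `free_l_share`) is an assumption made for illustration, not necessarily how the 37.5% figure above was derived; the input numbers are simply the ones quoted in the previous paragraph.

```python
# Percentages quoted in the text for each set of pages.
ol_a, l_a = 4.57, 0.27   # Currier A: 'ol', freestanding 'l'
ol_b, l_b = 2.67, 1.66   # Currier B: 'ol', freestanding 'l'


def free_l_share(ol_pct: float, l_pct: float) -> float:
    """Fraction of the combined ('ol' + freestanding 'l') pool written as bare 'l'."""
    return l_pct / (ol_pct + l_pct)


share_a = free_l_share(ol_a, l_a)   # about 0.056
share_b = free_l_share(ol_b, l_b)   # about 0.383
ol_drop = (ol_a - ol_b) / ol_a      # relative fall in 'ol' from A to B, about 0.416

print(f"bare-l share of pool: A {share_a:.1%}, B {share_b:.1%}")
print(f"relative drop in 'ol' from A to B: {ol_drop:.1%}")
```

On these figures the bare ‘l’ share of the pool in B comes out in the high thirties, the same neighbourhood as the replacement rate suggested above.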
I’m pretty sure that Glen Claston’s concern about the bloating effect of verbose cipher was shared by the VMs’ author, and that at least some of the changes between A and B were done in order to tighten up the output. Why else fix it if it wasn’t broken?