Even though René Z likes to tut-tut Voynich speculation (and usually with good reason, it has to be said), there’s something about the maturity and cohesion of Voynichese as a system that makes me quite sure that, unlike Athena, it did not suddenly spring forth fully-grown (and, indeed, fully-armed) from its parent’s forehead. I further infer that the author probably made a major personal investment in the Voynichese system over a long period of time – and given that it has held its secrets safe for over half a millennium, perhaps the author’s likely pride in his/her accomplishment was reasonably justified. That is, perhaps just as with Trithemius’ cryptography mere decades later, the system itself was no less a secret than its contents. 😉

Moreover, the notion that the system was accreted over time might well explain much of the fluency of the script design and the assurance of the document execution (though this much has been noted many times before). In “The Curse of the Voynich”, I made various attempts to turn the clock back to the pre-history of Voynichese, i.e. to use the letter-shapes themselves as a basis for speculating how they evolved and ended up in their final form. Of course, without Marty McFly’s DeLorean (or Tom Riddle’s diary, for that matter), tempus will always fugit, leaving historians clutching at long-blown-away straws: but perhaps there are some clues here that can help us peer through the fog of time…

My starting point here is that I believe the conceptual roots of the Voynichese cipher system lie not in tricksy Renaissance stateful ciphers, but in far simpler stateless ciphers and steganography, all of which were standard fare for the Quattrocento. Hence, I predict that the “ar” / “al” / “or” / “ol” 2×2 grid of verbose pairs (which I discussed in yesterday’s post) was part of an earlier (much simpler) verbose cipher that was designed to disguise the kind of repeated letters found in Roman numerals (i.e. III / XXX / CCC / MMM): and that what we see now evolved out of that earlier stateless system. It is certainly possible that the looped “l” character was originally designed to steganographically hide an “x”:-

[image: VoynichRomanNumerals]

However, this wouldn’t be much of an improvement, so you’d then need to add in hacks such as space insertion ciphers to disguise the verbose patterns: and you’d perhaps then need to add yet another system to handle small numbers (such as the a[i][i][i]r system shown above). And then you’d perhaps need to add in a second (Arabic number, aiiiv?) system… but that’s another story. All in all, this is the kind of cipher evolution I’m talking about: and what makes it speculative is that we only have the end-result of the evolution.
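To make this concrete, here is a toy sketch of the kind of verbose-pair cipher being described. The pair tables are entirely hypothetical (they are not a claimed Voynichese mapping), but they show how rotating between pairs breaks up the tell-tale repeated letters of Roman numerals:

```python
# A toy verbose-pair cipher: each Roman-numeral letter maps to several
# verbose pairs, used in rotation, so that repeated plaintext letters
# (III, XXX, ...) never produce repeated ciphertext pairs.
# NB: these pair tables are hypothetical, purely to illustrate the idea.
PAIR_TABLE = {
    "I": ["ar", "al"],
    "X": ["or", "ol"],
    "C": ["am", "an"],
    "M": ["om", "on"],
}

def encipher_roman(numeral: str) -> str:
    out, seen = [], {}
    for ch in numeral:
        n = seen.get(ch, 0)
        seen[ch] = n + 1
        pairs = PAIR_TABLE[ch]
        out.append(pairs[n % len(pairs)])  # rotate through the pairs
    return "".join(out)

print(encipher_roman("III"))     # aralar
print(encipher_roman("XXXIII"))  # oroloraralar
```

Note that “XXXIII” comes out as “oroloraralar”, with no letter repeated twice in a row – precisely the pattern-hiding property that a verbose pair cipher buys you.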

Now… what I’m actually wondering about at the moment is whether anyone has looked through examples of 15th century ciphertexts and cipher ledgers to see if there are any examples of people constructing verbose-pair cipherbets specifically to allow themselves to hide Roman numerals in enciphered texts. While the precise details of the execution may well be quite different, it could be that if we can find examples of the idea in action, we might be able to start tracing some kind of additional behind-the-scenes intellectual history vector for it – where it came from, what kind of person used it, who those people were connected to, etc. I have a few ideas for how to do this, which I’ll (hopefully) try out soon, see if they lead anywhere…

17 thoughts on “The evolution of Voynichese…?”

  1. Rene Zandbergen on March 22, 2010 at 5:31 pm said:

    Nick, I’m quite happy to say that I couldn’t agree more with your first paragraph, and it is probably the most understated key point about the manuscript. There is a lot of planning behind it, and nothing ‘on the fly’.

    As regards speculation, this is fun and I wouldn’t want to spoil anyone’s fun. I like to speculate from time to time, for example that the similarities between the Voynich MS and the work / theories of Paracelsus (whose work postdates the Voynich MS) are due to the fact that Paracelsus once owned the MS and it actually inspired him 😀

  2. Rene: speculatively linking the VMs with Paracelsus is fun, but surely speculatively linking it with Nostradamus would be even more so? After all, he was in Montpellier in the 1520s, which is pretty much the right time and place for a document with French / Occitan marginalia… 🙂

  3. Marke Fincher on March 23, 2010 at 1:13 pm said:

    Speculatastic…

    In organic species evolution, features enter the mix randomly and, provided they never become actively harmful to their owners, are then free to persist for long intervals and may get weirder and weirder with time….

    Likewise, in the evolution of people’s Voynich theories, things enter at random and, in the absence of sufficient (or any) disproof, they persist indefinitely and, like mental tics, may get weirder and hairier with time (as may their owners?).

    What’s odd, though, is that disproof (or at least a correction of direction) often does exist amply in the Voynichese text, needing only a bit of digging to expose, but people may not actually want to see it.

    It’s very revealing and says a lot about the VMs… perhaps people enjoy the fun imaginings more than the pursuit of truth.

    Organic evolution of course always leaves a trail of some sort behind it, but evolution of thought and design can permit quantum leaps. As the VMs currently defies rigid classification, you can suppose that it does have a perfectly linear trail of evolution, just one that is yet to be traced back: or you can entertain that it may have undergone a quantum leap somewhere to set it so far apart from everything that is known.

  4. Peter on March 24, 2010 at 7:55 am said:

    I think this can be tested. Any kind of evolution like this would make the ciphertext less verbose. The entropy would decrease, or similarly, the compressibility would. You could take some pages (or sets of pages that are known to be next to each other in the original ordering) with no illustrations, and test the entropy, or the compression rate (where success will depend on the compression algorithm; with zip compression you get something similar to entropy).

    The problem (if this method works) is that we don’t know the original ordering of the pages. So either you assume some original ordering and you conclude that the verbosity does decrease, or you assume that the verbosity decreases, and you get some idea of an original ordering. Still, might be worth a try.
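    A minimal sketch of this test, using zlib-compressed size over raw size as a cheap stand-in for entropy (the page texts below are placeholders, not real transcriptions):

```python
# Compression ratio as a rough entropy proxy: more verbose / repetitive
# text compresses better (lower ratio). The page texts are placeholders;
# in practice they would be EVA transcriptions of text-only pages.
import zlib

def compression_ratio(text: str) -> float:
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

pages = {
    "hypothetical-early": "olarolarorol" * 80,          # highly repetitive
    "hypothetical-late":  "qokedyshdyaiinolcheocthy" * 40,
}
for name, text in pages.items():
    print(name, round(compression_ratio(text), 3))
```

    Comparing such ratios across pages (grouped by their presumed original ordering) would then show whether the verbosity drifts in one direction over the document’s construction.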

  5. Peter: I suspect that entropy isn’t really a sufficiently helpful measure in this respect, because many other aspects of the cipher changed at the same time.

    For example, I think there are signs of language evolution in tokens like “l-” (i.e. where “l” is not preceded by “a” or “o”), “eo”, “od”, etc. Also, very many major aspects of the stats change between A and B, most notably non-repeated “e” nearly doubles in frequency (from 3.84% to 7.08%), as do “ee” (from 1.66% to 3.44%) and “k” (from 2.93% to 5.17%), while “dy” zooms up from 1.85% to 6.05% and freestanding “l” goes from 0.3% to 1.91%! Mark Perakh interprets these as indicating that A pages are more heavily abbreviated than B pages, but actually I think that many of these stats are in fact signs of change/evolution in the underlying system. Nobody has really nailed the key systemic differences between A and B yet: this is arguably a great void waiting anxiously to be filled. 🙂

    Currier A pages, Takahashi transcription, pair-parsed analysis from my JavaScript Voynich analysis page:
    (. 20.91%) (ch 9%) (y 7.31%) (ol 5.04%) (d 4.27%) (e 3.84%) (o 3.55%) (or 3.27%) (sh 3.19%) (- 3.1%) (aiin 3%) (k 2.93%) (qo 2.57%) (s 2.33%) (ok 1.93%) (dy 1.85%) (t 1.84%) (ee 1.66%) (al 1.59%) (ar 1.53%) (ot 1.5%) (= 1.23%) (cth 1.18%) (i 1.13%) (a 1.09%) (od 1.03%) (m 0.9%) (eo 0.82%) (ckh 0.62%) (yk 0.59%) (ain 0.58%) (n 0.55%) (yt 0.51%) (p 0.44%) (r 0.32%) (l 0.3%) (air 0.28%) (cph 0.25%) (op 0.23%) (* 0.21%) (eee 0.19%) (ockh 0.16%) (f 0.16%) (octh 0.15%) (c 0.14%) (h 0.12%) (g 0.1%) (of 0.08%) (cfh 0.08%) (yp 0.07%) (q 0.07%) (aiir 0.06%) (aiiin 0.04%) (ocph 0.03%) (yf 0.02%) (eeee 0.02%) (ocfh 0.01%) ( 0%) (z 0%) (aiiir 0%)

    Currier B pages, Takahashi transcription, pair-parsed analysis from my JavaScript Voynich analysis page:
    (. 21.8%) (e 7.08%) (ch 6.66%) (dy 6.05%) (y 5.62%) (k 5.17%) (qo 4.4%) (ee 3.44%) (ol 3.08%) (sh 3.06%) (d 2.97%) (aiin 2.5%) (ar 2.34%) (- 2.13%) (al 2.04%) (l 1.91%) (t 1.84%) (ok 1.8%) (ot 1.67%) (ain 1.52%) (s 1.41%) (o 1.34%) (or 1.28%) (a 0.92%) (p 0.75%) (= 0.72%) (m 0.65%) (i 0.59%) (r 0.55%) (eo 0.52%) (ckh 0.51%) (od 0.47%) (air 0.41%) (yk 0.36%) (yt 0.3%) (cth 0.29%) (eee 0.28%) (op 0.28%) (f 0.21%) (n 0.19%) (h 0.12%) (q 0.11%) (aiiin 0.1%) (cph 0.08%) (aiir 0.07%) (of 0.06%) (c 0.06%) (ockh 0.05%) (octh 0.05%) (yp 0.05%) (* 0.04%) (g 0.03%) (x 0.03%) (cfh 0.02%) (yf 0.01%) (ocph 0.01%) (ocfh 0.01%) (eeee 0%)

    Marke: why don’t you try to nail the major changes between A and B yourself? Then we can start worrying about whether that overall evolution is smooth or punctuated, and in what direction it flows. 🙂
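    For reference, a minimal sketch of the pair-parsing step behind the two lists above: greedy, longest-token-first tokenisation, then percentages. The token inventory here is a small assumed subset of the full one, so the numbers it produces are illustrative only:

```python
# Greedy longest-match tokeniser over an assumed (partial) token set,
# returning percentage frequencies -- a sketch of "pair-parsed analysis".
from collections import Counter

TOKENS = sorted(
    ["aiin", "ain", "qo", "ol", "or", "al", "ar", "ch", "sh", "ee",
     "dy", "ok", "ot", "k", "t", "d", "y", "e", "o", "a", "l", "r",
     "s", "i", "n"],
    key=len, reverse=True)   # longest tokens tried first

def pair_parse(text: str) -> dict:
    counts, i = Counter(), 0
    while i < len(text):
        for tok in TOKENS:
            if text.startswith(tok, i):
                counts[tok] += 1
                i += len(tok)
                break
        else:
            i += 1           # skip characters outside the token set
    total = sum(counts.values())
    return {tok: 100.0 * n / total for tok, n in counts.items()}

print(pair_parse("qokeedyqokaiin"))
```

    (On real transcriptions you would also need to handle spaces, line breaks and ambiguous glyphs, which this sketch ignores.)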

  6. Marke Fincher on March 24, 2010 at 11:27 am said:

    Dunno if that was a serious comment or not, but I’ve gone way beyond that! (And you need to consider more than just ‘A’ and ‘B’, which is like trying to paint a rainbow in black and white)

  7. Marke: I know you’ve been looking at A, B, and all the shades in between – but I don’t recall ever hearing that you’d posted what you’d found anywhere. Is it time, or am I late to the party? 🙂

    Also, I have a particularly strong bias in how useful different stats are, in particular because I am sure that “qo” is a free-standing token that attaches itself to the front of other words. Evidence: the in-page correlation between “qol” and freestanding “l-” words, but nothing similar for “qor”. Hence, if you don’t pre-parse out “qo”, you’ll get quite different stats for “ol”, “ot”, “of”, “ok”, “op”, “ockh”, “ocph”, “octh”, “ocfh”, etc. Do you think that analyses that don’t pre-tokenise “qo” will yield misleading results?

    PS: A pages: qol = 7, qor = 8, qool = 3, qoor = 6, while B pages: qol = 239, qor = 23, qool = 6, qoor = 2.
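    To illustrate why pre-parsing matters, here is a toy example (with an entirely hypothetical word list) of how counts of “ol” diverge depending on whether “qo” is split off first:

```python
# "qol" contains the letter-pair "ol", so a parser that does not split
# off "qo" first will count it towards "ol". The word list here is
# hypothetical, purely to show the mechanism.
def count_ol(words, preparse_qo: bool) -> int:
    total = 0
    for w in words:
        if preparse_qo and w.startswith("qo"):
            w = w[2:]            # treat "qo" as its own token
        total += w.count("ol")
    return total

words = ["qol", "qol", "ol", "okedy"]
print(count_ol(words, preparse_qo=False))  # "ol" inside "qol" counted too
print(count_ol(words, preparse_qo=True))   # only the freestanding "ol"
```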

  8. Rene Zandbergen on March 24, 2010 at 12:03 pm said:

    I agree with Marke there. There are dialects between A and B, and there are also different flavours of B. There is not a lot of text in the MS written in the dialects between A and B though, and this is mainly on the foldout pages. Currier did not transcribe these pages and thus never saw them.

    W.r.t. prefix qo-, I don’t think that this is a very special case. I see it as one of several examples of ‘paired’ characters. Other pairs are ch/sh, k/t, l/r. The idea is that substituting one by the other gives an equally valid word. Prefix qo- is paired with o-. The fact that these pairs do not end up with the same probability is a good sign for me. It shows that the MS text is not just random noise.

    I would hesitate to make any statement about what this pairing means, though 🙂

  9. Rene: of course, I agree wholeheartedly with your observations that there are dialects in between A and B, and that there are different types of B (Q13ese, for example). The issue is whether we can transform the myriad of statistical observations into a coherent (preferably visual) account that can start to move us towards an idea of how they morphed into each other (and perhaps in what order).

    But as for “qo gallows…” words, I think these are actually paired with “gallows…” words, and work quite differently from “o gallows…” words. As per “The Curse” p.214, the overall ratios of (k:t) and (qok:qot) are substantially higher than the ratios for both (ok:ot) and (yk:yt). I don’t know what kind of evidence would convince you of this, though: which is, I guess, why I asked Marke. 🙂 The point remains that not pre-parsing qo- will sharply affect many other stats, particularly if you try to build state machines (which is probably our next big collective step forward).

    Building on this observation, I’m now pretty convinced that “qo” is a free-standing common word pushed up to the front of the following word to disguise its presence, and is almost certainly “lo” / “la” / “le” (depending on which language you think is being enciphered). Naturally, given that the word “lo” is hidden inside “4o”, perhaps Italian is most probable? 😉

  10. Rene Zandbergen on March 24, 2010 at 12:48 pm said:

    Nick: that the ratios of (k:t), (qok:qot), (ok:ot) are not the same, and also vary for the different languages, I am already convinced of. This is in fact what I meant with:
    “The fact that these pairs do not end up with the same probability is a good sign for me. It shows that the MS text is not just random noise.”

  11. Rene: it seems we’re not actually that far apart on these aspects (apart from my inference that qo is a free-standing word). So where do we take all of this next? It seems we have a lot of analysis, and not a lot of synthesis. 🙁

  12. Rene Zandbergen on March 24, 2010 at 2:51 pm said:

    Where do we take this next, or ‘qo vadis’ as it were… (I know this is just about as awful as the dis-dain errrm “pun”).

    It is somewhere down my list to find that one clever explanation for the word structure. The difficulty is not in the lack of ideas. One real problem is the question of how many ‘exceptions’ (as there will necessarily be) can be considered acceptable.

  13. Rene: I suspect that looking for “that one clever explanation for the word structure” will remain just a bit premature until you can clearly grasp how that same word structure changes / evolves across the constructional life of the document. The conventional (wrong) view is to look at the VMs as if it were a single, static homogenous entity: time to upset the status qo! 😉

  14. Rene Zandbergen on March 25, 2010 at 7:32 am said:

    Well, I called it a “clever” explanation because it would have to include an explanation for the variations / evolution of qourse 🙂

  15. Rene: there’s a fine line between qlever and qoqy, I hope your explanation stays on the right side of it! 🙂

  16. Rene Zandbergen on March 25, 2010 at 1:19 pm said:

    If I ever find it 🙁

  17. Rene: you probably already have found it, but rejected it for some utterly sensible reason. 🙂
