I’ve had a few recent emails from historical code-breaker Tony Gaffney concerning the Voynich Manuscript, to say that he has been hard at work examining whether Voynichese might in fact be an example of an early Baconian biliteral cipher.

This is a method Francis Bacon invented for hiding messages inside other messages, by (say) choosing between two typefaces on a letter-by-letter basis – that is, steganographically hiding a binary message inside another message, one binary digit at a time. To squeeze in a 24-letter cryptographic alphabet, you’d need 5 bits per letter (2^4 = 16 < 24 ≤ 32 = 2^5), a bit like a fixed-length Morse code. Bacon proposed the following basic mapping:-

a   AAAAA   g     AABBA   n    ABBAA   t     BAABA
b   AAAAB   h     AABBB   o    ABBAB   u/v   BAABB
c   AAABA   i/j   ABAAA   p    ABBBA   w     BABAA
d   AAABB   k     ABAAB   q    ABBBB   x     BABAB
e   AABAA   l     ABABA   r    BAAAA   y     BABBA
f   AABAB   m     ABABB   s    BAAAB   z     BABBB
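Incidentally, the table above is entirely mechanical: each letter’s group is just its index in the 24-letter alphabet written as five binary digits, with 0 → A and 1 → B. Here’s a minimal Python sketch of the mapping and encipherment (my own modern code, obviously, not anything from the period):

```python
# Bacon's 24-letter biliteral alphabet (i/j share a code, as do u/v).
# Each letter's group is its alphabet index in 5 binary digits, 0->A, 1->B.
ALPHABET = "abcdefghiklmnopqrstuwxyz"  # 24 letters: no j, no v

BACON = {ch: format(n, "05b").replace("0", "A").replace("1", "B")
         for n, ch in enumerate(ALPHABET)}

def encipher(plaintext: str) -> str:
    """Turn plaintext letters into a stream of 5-symbol A/B groups."""
    out = []
    for ch in plaintext.lower():
        ch = {"j": "i", "v": "u"}.get(ch, ch)  # fold j->i; u/v share a code
        if ch in BACON:
            out.append(BACON[ch])
    return " ".join(out)

print(encipher("meet"))  # ABABB AABAA AABAA BAABA
```

You can check this against the table: a = AAAAA (index 0), z = BABBB (index 23).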

Immediately, it should be obvious that this is (a) boring to encipher, (b) awkward to typeset and proof, (c) boring to decipher, and (d) it requires a printed covertext five times the size of the ciphertext. So… while this would be just about OK for someone publishing prolix prose into which they would like to add some kind of hidden message for posterity, it’s not honestly very practical for “MEET ME BY THE RIVER AT MIDNIGHT”. Here’s a simple example of what it would look like in action (though using cAmElCaSe rather than Times/Arial, I’m not that sadistic):-

to Be, OR noT To be: ThaT is ThE quesTIon:
whETher ‘tiS nOBleR in the Mind tO SuffeR
The slings and arrows of outrageous fortune,
[…]
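Assuming the covertext above follows the obvious convention (lowercase letter = A, capital = B, non-letters skipped), pulling the hidden message back out is mechanical. A sketch decoder – what it actually reveals in the passage above is left as an exercise:

```python
# Sketch of decoding a cAmElCaSe biliteral covertext, assuming the
# convention lowercase = A, uppercase = B (non-letters are skipped).
ALPHABET = "abcdefghiklmnopqrstuwxyz"  # Bacon's 24-letter alphabet
DECODE = {format(n, "05b").replace("0", "A").replace("1", "B"): ch
          for n, ch in enumerate(ALPHABET)}

def decipher(covertext: str) -> str:
    bits = ["B" if ch.isupper() else "A" for ch in covertext if ch.isalpha()]
    # Take complete 5-symbol groups only; trailing symbols are discarded.
    groups = ["".join(bits[i:i + 5])
              for i in range(0, len(bits) - len(bits) % 5, 5)]
    return "".join(DECODE.get(g, "?") for g in groups)

print(decipher("to Be, OR noT To be: ThaT is ThE quesTIon:"))
```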

Famously, the giants of ‘enigmatology’ (David Kahn’s somewhat derisive term for hallucinative Baconian Shakespeare-ology) Ignatius Donnelly and Elizabeth Wells Gallup hunted hard for biliteral ciphers in the earliest printed editions of Shakespeare, but I’m pretty sure there’s more in the preceding paragraph than they ever found. 🙂

Historically, Bacon claimed to have invented this technique as a youth in Paris (which would have been circa 1576), so it is just about possible (if you half-close your eyes when you look at, say, the fifteenth century marginalia, and squint like mad) that he (or someone to whom he showed his biliteral cipher) might have used it to encipher the Voynich Manuscript around that time. But that leads on to two questions:

  • How might the stream of enciphered bits be hidden inside Voynichese?
  • How could we decipher it reliably?

Tony’s suggestion is that Voynichese might be hiding “dots” and “dashes” (basically, binary zeroes and ones) in the form of ‘c’-like and ‘\’-like strokes (and where gallows are nulls and/or word delimiters), something along the lines of this:-

[Image: Tony Gaffney’s biliteral cipher demo]

Spookily, back in 1992, Jim Reeds tried converting all the letters (apart from gallows) to c’s and i’s, to see if anything interesting emerged:-

Starting with the original D’Imperio transcription, I converted some characters to ‘c’ and some others to ‘i’, and then counted letter pairs (for pairs of adjacent non space chars, viz, in the same word).

letters mapped to c: QWXY9CSZ826
letters mapped to i: DINMEGHRJK

The results, sorted by decreasing frequency:
15481 cc
4774 Ai
4375 Oi *** O like A on right
3612 cO *** O like A on left
2591 cA
2528 OF
2482 4O
2449 Fc
1496 Pc
1427 OP
1390 ic *** rule breakers
1313 Oc
1212 FA
690 cF
495 PA
455 iO
452 cP
362 Bc
359 FO
354 PO
330 iF
275 iA *** rule breakers
168 OB
164 ci *** a few more rule breakers
124 AT
102 Vc
89 cB
88 OA
87 BO
71 Ac
68 ii
54 OV

From which one sees that O is as much c-like on the left and I-like on the right as A is.

Also notice that ic and ci do occur. In the B corpus, I-like letters seem to occur only at the ends of words. Typically a word starts out C-like and ends up I-like.

Can this I-like, C-like, and neutral stuff be a cryptological not linguistic phenomenon? Maybe the author has a basic alphabet where each letter has both a C-form and an I-form. He writes out the text in basic letters, and then writes the Voynich MS, drifting in and out of the C and I forms, just to amuse us. If this were the case, we should treat Currier <2> and Currier <R> as the same, etc, etc.

Or the author could be putting all the info in the choice of C-form versus I-form: C-form could be ‘dot’, and I-form could be ‘dash’, and choice of ‘base letter’ is noise. (Say, only the C/I value of a letter following a gallows counted, or maybe that and plume-presence of letter following a gallows.) That gives you a sequence of bits or of ‘dibits’, which is used in a Baconian biliteral or Trithemian triliteral cipher, say.

Or if you figure each word starts C-like and ends I-like, maybe the only significant thing is what happens at the transition, which will take the form cAi or cOi. The significant thing is the pair of ci letters.

On rereading this all, it seems unlikely.
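Reeds’ folding experiment is easy to reproduce in outline. Here’s a sketch using his published c/i mappings over D’Imperio transcription symbols; the sample line fed to it below is invented for illustration, not real Voynich data:

```python
from collections import Counter

# Sketch of Jim Reeds' experiment: fold D'Imperio transcription symbols
# into 'c' and 'i' classes, then count adjacent pairs within each word.
TO_C = set("QWXY9CSZ826")
TO_I = set("DINMEGHRJK")

def fold(ch: str) -> str:
    if ch in TO_C:
        return "c"
    if ch in TO_I:
        return "i"
    return ch  # gallows and other symbols are left as-is

def pair_counts(transcription: str) -> Counter:
    counts = Counter()
    for word in transcription.split():
        folded = [fold(ch) for ch in word]
        counts.update("".join(p) for p in zip(folded, folded[1:]))
    return counts

# Toy usage on a made-up transcription line (not real Voynich data):
print(pair_counts("QD 4OQC8 SDG").most_common(3))
```

Running this over a full transcription would reproduce the kind of frequency table Reeds quotes above.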

Could the VMs really be built on some kind of c-and-\ biliteral cipher? Cryptologically, I’d say that the answer is almost certainly no: the problem is simply that the ‘\’ strokes are far too structured. Though Tony’s “abandon all hope” demo shows how this might possibly work, his example is already both too nuanced (with different length cipher tokens, somewhat like Morse code but several centuries too early) and too far away from Voynichese to be practical.

While I would definitely agree that Voynichese is based in part around a verbose cipher (as opposed to what Wilkins [below] calls “secret writing by equall letters”), I really do doubt that it is as flabbily verbose as the biliteral cipher (and with lots of delimiters / nulls thrown in, too). I’d guess that a typical Herbal A page would contain roughly 30-40 characters’ worth of biliteral information – and what kind of secret would be that small?

As an historical sidenote, Glen Claston discussed the biliteral cipher on-list back in 2005:-

[…] I’ll clue you in to Bacon research – only two books are of interest, both post-fall for Lord Bacon. (I own originals of all of them, so I’m positive about this). The biliteral cipher exists in only two books, the first of which is the Latin version “De Augmentis”, London edition, 1623. This will lead you to the second, published “overseas”. (No real ground-breaking secrets there however). It raises its head only one other time, in a book entitled “Mercury, the Swift and Secret Messenger”. (Only two pages here, at the beginning, a simple exercise). [A brief use in a Rosicrucian manuscript, but Bacon was not a Rosicrucian, so this is simple plagiarism].

To be accurate, John Wilkins’ 1641 “Mercury, the Swift and Secret Messenger” does actually devote most of its Chapter IX to triliteral and biliteral ciphers (which he also calls “writing by a double alphabet”), with a reference in the margin specifically pointing to Bacon’s “De Augmentis” as its source. Personally, I suspect that Wilkins was having more fun with…

Fildy, fagodur wyndeeldrare discogure rantibrad

…though I suspect a “purer” version of the same would be…

Fildy, fagodur wyndeldra rogered ifsec ogure rantebrad

Read, decipher, enjoy! 🙂

11 thoughts on “Voynichese = Biliteral Cipher?”

  1. Considering that my theories suppose some connection, or at least influence, of Bacon, I had spent some time with the Biliteral Cipher and the Voynich. In fact I think it was “way back when” you were still a member of the VMs mailing list… but it may have been just after you left.

    The first thing to remember is that this system offers an almost infinite range of choices to the encipherer, when it comes to deciding what comprises an “a” and a “b”. For your example, Tony chose “c” and “\”-like strokes… I tried several things, the most promising being the distinction between high and low characters. I felt that there was such a differentiation between the heights of characters that this would be a logical choice for ease of enciphering and deciphering. In fact I could pull the (speculative) a’s and b’s very easily.

    I tried the highs as a’s and the lows as b’s, and then tried the opposite. I also used several different character breaks… strict breaks at first, and then using any string of characters as one… the usual problem with any attempts: i.e., what comprises a character. For instance, I would use a string of VMs “c’s” both as one low character, and then alternately (in separate attempts, of course) as individual low characters… two or three as would be the case.

    Interestingly, I found some compelling strings of Latin syllables in one of the half dozen versions of factors I tried… and none in the others. Also, I found that one alternation of high/low choices had all the a/b strings fall within the alphabet, and the other had impossible strings. I found this interesting and encouraging enough to go back and try various blocks of text over time.

    As you point out, a danger of these attempts is a great susceptibility to subjective error. I would recommend the Friedmans’ book on this subject, “The Shakespearean Ciphers Examined”, as a great indication of what can go wrong when trying such an attempt on any text:
    http://www.questia.com/PM.qst?a=o&d=11853542
    This book puts to rest any possibility that the Biliteral has anything to do with the writings of Shakespeare… and I won’t even add IMO in this case. The Friedmans make an airtight argument.

    But nonetheless, I feel it needs more time and effort, and I, too, have seen others mention it, and wonder at it. The idea itself fits well within my time frame and circle of influence… and I find it interesting that you did not dismiss it on those grounds, considering that you do not agree that such a time and influence are possible… But I would not yet dismiss it on the other grounds you give: that is, on the results from the characters in the example you cite, or the fact that it would only yield a 1-to-5 plaintext. First, because so many alternate character distinctions may have been chosen, and second, because we cannot assume any level of content in the Voynich. There is no reason to pre-suppose it contains any more than one fifth the characters which we see…

    All that being said, the Biliteral is one of my three favorite candidates, and holds promise in my opinion, for many reasons even beyond what I’ve touched on in this (overly long) response…
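The high/low decomposition Rich describes can be sketched in a few lines. Note that which glyphs count as “high” or “low” below is purely illustrative (I’ve used EVA-style letters as stand-ins), not his actual assignments:

```python
# Sketch of the high/low decomposition described above. The glyph classes
# here are entirely hypothetical - this only shows the mechanics of pulling
# a/b strings out of a transcription once such classes have been chosen.
HIGH = set("ktpf")     # e.g. gallows characters (assumed "high")
LOW = set("aoeicdn")   # e.g. low-lying characters (assumed "low")

def extract_ab(transcription: str, high_is: str = "a") -> str:
    low_is = "b" if high_is == "a" else "a"
    out = []
    for ch in transcription.lower():
        if ch in HIGH:
            out.append(high_is)
        elif ch in LOW:
            out.append(low_is)
        # unclassified glyphs are skipped
    return "".join(out)

# Try both polarities, as Rich describes doing:
line = "kodaiin tchedy"
print(extract_ab(line, high_is="a"), extract_ab(line, high_is="b"))
```

One polarity might then be fed to a Baconian decoder, the other discarded if it produces impossible groups.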

  2. Hi Rich,

    Thanks for the detailed comment, much appreciated! 🙂

    From my own cryptological perspective, one of the things I’d expect to see emerging from any genuine historical use of a biliteral cipher is a fairly even spread of a’s and b’s. Yet every decomposition I’ve seen of Voynichese yields a very uneven (“clumpy”) and heavily-structured spread of symbols / tokens / strokes / bits / etc. Basically… at whatever level you look, you find a great deal of internal structure.

    And so the question to consider is whether that structure arises as a result of the particular biliteral cipher alphabet that was chosen. Well… while a modern code-maker could construct a cipher to match just about any set of constraints, I’m not so convinced that anyone prior to the 20th century would have been able to do so. While it’s true that people knew that adding additional enciphering stages makes ciphers much harder to crack (John Wilkins explicitly says as much at the end of his chapter IX, as I recall), I’m far from convinced that someone prior to (say) the 20th century would have had a sufficiently masterly grasp of statistics to consciously design a biliteral alphabet that is simultaneously biliteral and clumpy.

    One might call this a “Rugg-ish historical fallacy”: that just because someone with a PC could design such a thing today, then someone could (theoretically, at least) have designed the same thing 400+ years ago. Yes, it’s a possibility… but that doesn’t begin to explain how they managed to get to that end point (if they were even aware of what that end point might look like).

    Which is not to say that (say) Francis Bacon wouldn’t have been clever enough to follow just about any intellectual procedure through to its logical conclusion: but, rather, that I strongly doubt that he or anyone else had a sufficiently abstracted view of statistics to construct something specifically to simulate the statistical model of something else.

    Bacon’s biliteral cipher was just a way of hiding a message surreptitiously – it was a steganographic notion, not an overtly cryptographic one. When I look at “aiiv” glyph-blocks, I see something else masquerading as a medieval page reference: so there’s definitely steganography there. And when I look at the low entropy / highly-structured glyph pairs, I also see verbose cipher tricks at play. Hence I’d agree that the biliteral cipher, with its steganography and verbosity, does press the right cryptographic buttons… unfortunately not quite in the right way. 😮

    Cheers, ….Nick Pelling….

    Hi Nick: For your point, “every decomposition I’ve seen of Voynichese yields a very uneven (‘clumpy’) and heavily-structured spread of symbols / tokens / strokes / bits / etc. Basically… at whatever level you look, you find a great deal of internal structure.”

    It’s true there is such a spread of symbols and so on, when Voynichese is viewed as distinct symbols. But my point was, and why the Biliteral could apply, that it is only two characteristics of these symbols which are needed, or would be used. Ten different symbols could be a’s, ten could be b’s, and you now have twenty symbols. But it does not matter: only one characteristic of each counts… you really, still, only have two symbols, easily read. In my example, height: high and low. Now it does not matter what the character is or looks like, or the strokes it contains… you only need to know high or low. It could be something else. All characters may contain some distinctive characteristic, which differentiates them into a’s and b’s, which is simple and readily apparent to the code’s users. The complexity of their structure could be moot, and also then, a red herring. While we are looking at all the distinctions of gallows, for instance, it would not, in this case, matter one whit: they would all simply be highs.

    I think that such a system… like this, or some form of it… would in fact account for many of the problems of Voynichese… the seeming complexity, which the Biliteral would not need; the seemingly bizarre word and character counts, which the Biliteral would not affect, or use… and so on. The Biliteral is a sort of “overlay” of what can look like a plain text language (Bacon suggested, as you know, two distinct typefaces), or look like a complex cipher, or even an illustration… so that what is seen seems like one thing (and can seem very complex), but then the true code, obvious to those who know it, is actually very, very simple to put in and pull out. I could, in fact, quickly write out a fake Voynichese line using Biliteral, which would be easily read by you in minutes. When I make my page I will give an illustrated example.

    And you know, in my opinion, the code/cipher used is probably something extremely simple, just so far, unseen by us. There are several systems which would allow for this… biliteral is one.

  4. Hi Rich,

    I think this all amounts to a modern projection onto an historical situation. Unless you have any evidence to the contrary, the clumpiness is very probably there for a reason (i.e. as a direct result of the encipherment system) – not just to entertain us.

    Cheers, ….Nick Pelling….

  5. Hi Nick!

    No… no evidence of course. I agree that the usual opinion on this may be correct… that the clumpiness, the seeming complexity, the odd and ordinary counts and so on and so forth, are due to some intricate… maybe even probably so, as you say… cipher or code.

    And I know you are being a bit “tongue in cheek” when you say it wouldn’t be there “just to entertain us”. But I would say it could be “there to confuse us”… useless complexity would be a fantastic, brilliantly diabolical cover for a simple cipher, and really very easy to do (not to see, however).

    Rich.

  6. Pingback: Biliteral: A Cipher in Plain Site? « The Voynich-New Atlantis Theory

  7. Chris.W on January 27, 2010 at 5:15 pm said:

    look for the pictures hidden inside
    the writing forms pictures: I’ve found a row of baby elephants, I’ve found a swan which is above an elephant and a bear, and these two are on the back of a lion, etc etc. I know I’m good but not that good… I’m still trying to find the telephone numbers for all those naked girls.

  8. Don’t worry, if the naked girls work out that you can read their secret diary, they’re bound to call you. 😉

  9. John Willemse on November 16, 2011 at 4:01 pm said:

    To address your problem of the 1-to-5 ratio of plaintext to ciphertext: the encipherer might have realised this and solved it by assigning certain groups an abbreviation. For example, AAA is present in 7 of the 24 mappings, AA in 16 of the 24, &c. He could have assigned a single Voynichese character to such common occurrences.

    On another note, he could also have assigned a single character to larger groups of As and Bs. You would then encode a plaintext to the mapping first in its entirety and then fractionate it by converting groups to characters.

    Example:

    TEST = BAABA AABAA BAAAB BAABA

    Split this into groups of 8 (2^8 = 256 possibilities/distinct characters):

    BAABAAAB AABAAABB AABA…

    …and assign each group of 8 to a single character.

    As you can see, this actually compresses the plaintext information instead of inflating it.

    It’s all theoretically possible, but how do you confirm such a method?
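The fractionation scheme sketched in the comment above is easy to make concrete (the code and helper names below are mine, purely illustrative): encode to Baconian groups, concatenate the symbol stream, then regroup into 8-symbol blocks, each of which could be assigned one of 2^8 = 256 distinct cipher characters:

```python
# Sketch of the fractionation idea: Baconian-encode, concatenate the A/B
# stream, then cut it into fixed-size blocks (8 symbols -> 256 possible
# blocks, each assignable to its own cipher character).
ALPHABET = "abcdefghiklmnopqrstuwxyz"  # Bacon's 24-letter alphabet
BACON = {ch: format(n, "05b").replace("0", "A").replace("1", "B")
         for n, ch in enumerate(ALPHABET)}

def fractionate(plaintext: str, block: int = 8) -> list[str]:
    stream = "".join(BACON[ch] for ch in plaintext.lower() if ch in BACON)
    return [stream[i:i + block] for i in range(0, len(stream), block)]

print(fractionate("test"))  # ['BAABAAAB', 'AABAAABB', 'AABA']
```

This reproduces the worked TEST example above: four plaintext letters become 20 A/B symbols, which regroup into two full blocks plus a four-symbol remainder.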

  10. John: it’s a bit of a mess, at least a century too late, and doesn’t really help explain all the secondary statistical features that Voynichese so prominently presents. I’m with Brigadier John Tiltman on this: that Voynichese is basically a mash-up of several simple substitution cipher tricks, arranged cleverly… but not the biliteral cipher. Having said that, I’d agree (as Mark Perakh argued several years ago, at some length) that there’s good evidence of abbreviation, though!

  11. John Willemse on November 16, 2011 at 5:50 pm said:

    Nick, I was actually wondering after I posted that message: how about the statistics? Whether useful or not, tomorrow I’m going to encrypt several large texts (some Gutenberg project books in several languages) this way and “compress” them as described above, maybe varying the parameters over a few runs. I’m just curious to see what will happen to the language statistics, specifically the resulting n-graphs and their distribution. Maybe it can’t be applied directly to the VMS, but it might nevertheless be interesting to see if there are similarities in letter touching statistics.
