The NSA’s 2011 Cryptologic History Symposium (held at Johns Hopkins) ran yesterday and today, and had plenty of names long-suffering Cipher Mysteries readers will doubtless recognize in a flash:-

* Dr. Jim Reeds, Institute for Defense Analyses: “Editing the ‘General Report on TUNNY’”
* Dr. Benedek Lang, Budapest University of Technology and Economics: “Towards a Social History of Early Modern Cryptography”
* Elonka Dunin, Independent Scholar: “Kryptos–The Decades-Old Enigma at Langley”

(Personally, I’d also love to have heard this presentation:-
* Erin Higgins, Department of Defense: “Humanism, Magic, and Cryptology in the Renaissance”)

However, arguably the big cipher mystery story of the conference was that Panel session 4B, moderated by David C. Cooley from the NSA/CSS Center for Cryptologic History, was devoted to “Investigating the Voynich Manuscript”, with two Voynich speakers well known from recent talks and results (respectively):-
* Klaus Schmeh, Independent Scholar: “New Research on the Voynich Manuscript”
* Dr. Greg Hodgins, University of Arizona: “Radiocarbon Dating and the Voynich Manuscript”

Could I perhaps tempt any attendee to email me a short description of the conference that I can put up here as a guest post? Cheers!

London, UK, 11 Nov 2010. In a surprising twist worthy of Voldemort himself, A-list children’s author and philanthropist J.K.Rowling has stepped forward to claim responsibility for the popular Internet cipher mystery meme “The Voynich Manuscript”.

She now says it was all a 1990 publicity stunt for an early release of “Harry Potter and the Philosopher’s Stone”, which was – much like Norwegian band a-ha’s 1985 hit single “Take On Me” – released multiple times before gaining market acceptance from young readers. Rowling’s first version (“Harry Otter and the Voynich Manuscript”) was set in “Hogshead School of Wizardry” and introduced many of the timeless elements of her story that toy conglomerates have since stripmined so mercilessly, but where all the characters were animals – for example, Ron Weasel, Hermione Echidna, and the ancient Albus Iguanodon (though note that Rubeus Hagfish played only a minor role).

In an attempt to promote her book to publishers, Rowling assembled her own ‘Voynich Manuscript’ on cafe tables in Edinburgh on old vellum she’d bought in a jumble sale, and added a threadbare cover story linking it to Holy Roman Emperor Rudolph II that ought to make any sensible historian shake his or her head in appalled disbelief: the fake manuscript then somehow ended up in the Beinecke Rare Book and Manuscript Library at Yale University (much to its curators’ embarrassment). But once some early Internet chat group participants got hold of low-quality “CopyFlo” scans of it and decided to try to ‘crack’ its cipher, the rest is cryptographic and cultural history.

“To all the codebreakers who have fruitlessly spent decades on this, I can only apologize for my viral marketing prank”, says Rowling. “Honestly, I tried to flag it was a fake on the first page but perhaps the clue was simply too obvious:-”

As a postscript, Rowling did subsequently manage to get all copies of “Harry Otter and the Voynich Manuscript” pulped: however, copies of her intermediate version (“Harry Snotter and the Handkerchief of Doom”) do still occasionally come up at auction. Jim Reeds was unavailable for comment yesterday.

I’ve had a few recent emails from historical code-breaker Tony Gaffney concerning the Voynich Manuscript, to say that he has been hard at work examining whether Voynichese might in fact be an example of an early Baconian biliteral cipher.

This is a method Francis Bacon invented of hiding messages inside other messages, by (say) choosing between two typefaces on a letter-by-letter basis – that is, steganographically hiding a binary message inside another message, one binary digit at a time. To squeeze in a 24-letter cryptographic alphabet, you’d need 5 bits (2^5 = 32), a bit like a fixed-length Morse code. Bacon proposed the following basic mapping:-

a   AAAAA   g     AABBA   n    ABBAA   t     BAABA
b   AAAAB   h     AABBB   o    ABBAB   u/v   BAABB
c   AAABA   i/j   ABAAA   p    ABBBA   w     BABAA
d   AAABB   k     ABAAB   q    ABBBB   x     BABAB
e   AABAA   l     ABABA   r    BAAAA   y     BABBA
f   AABAB   m     ABABB   s    BAAAB   z     BABBB

Immediately, it should be obvious that this is (a) boring to encipher, (b) awkward to typeset and proof, (c) boring to decipher, and (d) greedy, in that it requires a printed covertext five times the length of the hidden message. So… while this would be just about OK for someone publishing prolix prose into which they would like to add some kind of hidden message for posterity, it’s not honestly very practical for “MEET ME BY THE RIVER AT MIDNIGHT”. Here’s a simple example of what it would look like in action (though using cAmElCaSe rather than Times/Arial, I’m not that sadistic):-

to Be, OR noT To be: ThaT is ThE quesTIon:
whETher ‘tiS nOBleR in the Mind tO SuffeR
The slings and arrows of outrageous fortune,
[…]

Famously, the giants of ‘enigmatology’ (David Kahn’s somewhat derisive term for hallucinative Baconian Shakespeare-ology) Ignatius Donnelly and Elizabeth Wells Gallup hunted hard for biliteral ciphers in the earliest printed editions of Shakespeare, but I’m pretty sure there’s more in the preceding paragraph than they ever found. 🙂
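For anyone who would rather let a machine do the boring bits, here is a minimal Python sketch of the biliteral mapping in the table above, together with the cAmElCaSe trick (lowercase = A, uppercase = B). The function names and the pangram covertext are my own illustrative choices, and the sketch assumes that any cover letters left over after the message are simply lowercased (so they decode as trailing ‘a’s).

```python
# Bacon's 24-letter alphabet: i/j share one code, as do u/v.
BACON_ALPHABET = "abcdefghiklmnopqrstuwxyz"
MERGED = {"j": "i", "v": "u"}


def bacon_encode(message):
    """Turn the letters of `message` into a string of A/B groups (non-letters dropped)."""
    groups = []
    for ch in message.lower():
        ch = MERGED.get(ch, ch)
        if ch in BACON_ALPHABET:
            index = BACON_ALPHABET.index(ch)
            # Five binary digits per letter: 0 -> A, 1 -> B.
            groups.append(format(index, "05b").replace("0", "A").replace("1", "B"))
    return "".join(groups)


def bacon_decode(bits):
    """Decode complete five-symbol A/B groups; a trailing partial group is ignored."""
    letters = []
    for i in range(0, len(bits) - len(bits) % 5, 5):
        index = int(bits[i:i + 5].replace("A", "0").replace("B", "1"), 2)
        # Codes 24..31 are unused in Bacon's table, so flag them.
        letters.append(BACON_ALPHABET[index] if index < len(BACON_ALPHABET) else "?")
    return "".join(letters)


def hide_in_case(cover, bits):
    """Carry one A/B symbol per cover letter: lowercase = A, uppercase = B."""
    if sum(ch.isalpha() for ch in cover) < len(bits):
        raise ValueError("covertext has too few letters for this message")
    remaining = iter(bits)
    out = []
    for ch in cover:
        if ch.isalpha():
            # Assumption: once the message runs out, pad with A (lowercase).
            out.append(ch.upper() if next(remaining, "A") == "B" else ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def reveal_from_case(stego):
    """Read the case of every letter back into A/B symbols and decode them."""
    bits = "".join("B" if ch.isupper() else "A" for ch in stego if ch.isalpha())
    return bacon_decode(bits)


cover = "the quick brown fox jumps over the lazy dog"
stego = hide_in_case(cover, bacon_encode("meet"))
print(stego)
print(reveal_from_case(stego))
```

Note how greedy it is: the four letters of “meet” eat up twenty letters of covertext.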

Historically, Bacon claimed to have invented this technique as a youth in Paris (which would have been circa 1576), so it is just about possible (if you half-close your eyes when you look at, say, the fifteenth century marginalia, and squint like mad) that he (or someone to whom he showed his biliteral cipher) might have used it to encipher the Voynich Manuscript around that time. But that leads on to two questions:

  • How might the stream of enciphered bits be hidden inside Voynichese?
  • How could we decipher it reliably?

Tony’s suggestion is that Voynichese might be hiding “dots” and “dashes” (basically, binary zeroes and ones) in the form of ‘c’-like and ‘\’-like strokes (and where gallows are nulls and/or word delimiters), something along the lines of this:-

[image: tony-gaffney-biliteral-demo]

Spookily, back in 1992, Jim Reeds tried converting all the letters (apart from gallows) to c’s and i’s, to see if anything interesting emerged:-

Starting with the original D’Imperio transcription, I converted some characters to ‘c’ and some others to ‘i’, and then counted letter pairs (for pairs of adjacent non space chars, viz, in the same word).

letters mapped to c: QWXY9CSZ826
letters mapped to i: DINMEGHRJK

The results, sorted by decreasing frequency:
15481 cc
4774 Ai
4375 Oi *** O like A on right
3612 cO *** O like A on left
2591 cA
2528 OF
2482 4O
2449 Fc
1496 Pc
1427 OP
1390 ic *** rule breakers
1313 Oc
1212 FA
690 cF
495 PA
455 iO
452 cP
362 Bc
359 FO
354 PO
330 iF
275 iA *** rule breakers
168 OB
164 ci *** a few more rule breakers
124 AT
102 Vc
89 cB
88 OA
87 BO
71 Ac
68 ii
54 OV

From which one sees that O is as much c-like on the left and I-like on the right as A is.

Also notice that ic and ci does occur. In the B corpus, I-like letters seem to occur only at the ends of words. Typically a word starts out C-like and ends up I-like.

Can this I-like, C-like, and neutral stuff be a cryptological not linguistic phenomenon? Maybe the author has a basic alphabet where each letter has both a C-form and an I-form. He writes out the text in basic letters, and then writes the Voynich MS, drifting in and out of the C and I forms, just to amuse us. If this were the case, we should treat Currier <2> and Currier <R> as the same, etc, etc.

Or the author could be putting all the info in the choice of C-form versus I-form: C-form could be ‘dot’, and I-form could be ‘dash’, and choice of ‘base letter’ is noise. (Say, only the C/I value of a letter following a gallows counted, or maybe that and plume-presence of letter following a gallows.) That gives you a sequence of bits or of ‘dibits’, which is used in a Baconian biliteral or Trithemian triliteral cipher, say.

Or if you figure each word starts C-like and ends I-like, maybe the only significant thing is what happens at the transition, which will take the form cAi or cOi. The significant thing is the pair of ci letters.

On rereading this all, it seems unlikely.
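Jim’s counting exercise is easy enough to reproduce. The Python sketch below uses the two letter classes from his message, collapses each word to c’s and i’s (leaving every other character alone), and counts adjacent pairs within words only. The sample string is made up purely for illustration (it is not a line from the D’Imperio transcription), and the function names are mine; I am also assuming that spaces are the only word delimiters.

```python
from collections import Counter

# Letter classes as listed in Jim Reeds' message.
C_LIKE = set("QWXY9CSZ826")
I_LIKE = set("DINMEGHRJK")


def collapse(word):
    """Replace c-like letters with 'c' and i-like letters with 'i'; keep the rest."""
    return "".join(
        "c" if ch in C_LIKE else "i" if ch in I_LIKE else ch
        for ch in word
    )


def count_pairs(text):
    """Count adjacent character pairs inside each space-delimited word."""
    counts = Counter()
    for word in text.split():
        collapsed = collapse(word)
        counts.update(a + b for a, b in zip(collapsed, collapsed[1:]))
    return counts


# Made-up sample in a Currier-style alphabet, for illustration only.
sample = "4OFCC89 SOE 8AM"
for pair, n in count_pairs(sample).most_common():
    print(n, pair)
```

Run against a full transcription file, the same two functions should give a frequency table in the same format as the one quoted above.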

Could the VMs really be built on some kind of c-and-\ biliteral cipher? Cryptologically, I’d say that the answer is almost certainly no: the problem is simply that the ‘\’ strokes are far too structured. Though Tony’s “abandon all hope” demo shows how this might possibly work, his example is already both too nuanced (with different length cipher tokens, somewhat like Morse code but several centuries too early) and too far away from Voynichese to be practical.

While I would definitely agree that Voynichese is based in part around a verbose cipher (as opposed to what Wilkins [below] calls “secret writing by equall letters”), I really do doubt that it is as flabbily verbose as the biliteral cipher (and with lots of delimiters / nulls thrown in, too). I’d guess that a typical Herbal A page would contain roughly 30-40 characters’ worth of biliteral information – and what kind of secret would be that small?

As an historical sidenote, Glen Claston discussed the biliteral cipher on-list back in 2005:-

[…] I’ll clue you in to Bacon research – only two books are of interest, both post-fall for Lord Bacon. (I own originals of all of them, so I’m positive about this). The biliteral cipher exists in only two books, the first of which is the Latin version “De Augmentis”, London edition, 1623. This will lead you to the second, published “overseas”. (No real ground-breaking secrets there however). It raises its head only one other time, in a book entitled “Mercury, the Swift and Secret Messenger”. (Only two pages here, at the beginning, a simple exercise). [A brief use in a Rosicrucian manuscript, but Bacon was not a Rosicrucian, so this is simple plagiarism].

To be accurate, John Wilkins’ 1641 “Mercury, the Swift and Secret Messenger” does actually devote most of its Chapter IX to triliteral and biliteral ciphers (which he also calls “writing by a double alphabet”), with a reference in the margin specifically pointing to Bacon’s “De Augmentis” as its source. Personally, I suspect that Wilkins was having more fun with…

Fildy, fagodur wyndeeldrare discogure rantibrad

…though I suspect a “purer” version of the same would be…

Fildy, fagodur wyndeldra rogered ifsec ogure rantebrad

Read, decipher, enjoy! 🙂

It’s a nice historical detective story, one kicked off by John Dee, Frances Yates’ favourite Elizabethan ‘magus’ (though I personally suspect Dee’s ‘magic’ was probably less ‘magickal’ than it might appear), when he claimed to have told an angel that his “great and long desyre hath byn to be hable to read those tables of Soyga”. Dee lost his precious copy of the “Book of Soyga” (but then managed to find it again): when subsequently Elias Ashmole owned it, he noted that its incipit (starting words) was “Aldaraia sive Soyga vocor…”.

However, since Ashmole’s day it was thought to have joined the serried, densely-stacked ranks of long-disappeared books and manuscripts, in the “blue-tinted gloom” of some mythical, subterranean library not unlike the “Cemetery of Lost Books” in Carlos Ruiz Zafon’s novel “The Shadow of the Wind” (2004)…

Fast-forward 400 years to 1994, and what do you know? Just like rush hour buses, two copies of the “Book of Soyga” turn up at once, both found by Deborah Harkness. Rather than searching for “Soyga”, she searched for its “Aldaraia…” incipit: which is, of course, what you were supposed to do (in the bad old days before the Internet).

It is a strange, transitional document, neither properly medieval (the text has few references to authority) nor properly Renaissance. There are some mysterious books referenced, such as the Liber Sipal and the Liber Munob: readers of my book “The Curse of the Voynich” may recognize these as simple back-to-front anagrams (Sipal = Lapis [stone], Munob = Bonum [Good], Retap Retson = Pater Noster [our Father]). In fact, Soyga itself is Agyos [saint] backwards.
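If you want to check those back-to-front names for yourself, a few lines of Python will do it. This is only a sketch: the function name is mine, and I am assuming each word is reversed on its own and then recapitalised (which is what the “Retap Retson” example needs).

```python
def unreverse(name):
    """Reverse each word separately and capitalise the result."""
    return " ".join(word[::-1].capitalize() for word in name.split())


for name in ["Sipal", "Munob", "Retap Retson", "Soyga"]:
    print(name, "->", unreverse(name))
```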

But what was the secret hidden behind the 36 mysterious “tables of Soyga” that had vexed John Dee so? 36×36 square grids filled with oddly patterned letters, they look like some kind of unknown cryptographic structure. Might they hold a big secret, or might they (like many of Trithemius’ concealed texts) just be nonsense, a succession of quick brown foxes endlessly jumping over lazy dogs?

  • oyoyoyoyoyoyoyoyoyoyoyoyoyoyoyoyoyoy
    rkfaqtyoyoyoyoyoyoyoyoyoyoyoyoyoyoyo
    rxxqnkoyoyoyoyoyoyoyoyoyoyoyoyoyoyoy
    azzsxbqtyoyoyoyoyoyoyoyoyoyoyoyoyoyo
    sheimasddtguoyoyoyoyoyoyoyoyoyoyoyoy
    eyuaoiismspkfaqtyoyoyoyoyoyoyoyoyoyo
    enlxflfudzrxxqnkoyoyoyoyoyoyoyoyoyoy
    sxcahqczfbtfzsxbqtyoyoyoyoyoyoyoyoyo
    azepxhheurgmyknqnkoyoyoyoyoyoyoyoyoy
    rlbriyzycuyddpotxbqtyoyoyoyoyoyoyoyo
    ryrezabirhdiszeknqnkoyoyoyoyoyoyoyoy
    ogzgfceztqalpntsxhssyoyoyoyoyoyoyoyo
    opnxxsnodxqhuekknykkoyoyoyoyoyoyoyoy
    rcqsfueesfsqrqgqrossyoyoyoyoyoyoyoyo
    roauxmdkkxkhyhmpzqphdtgtguoyoyoyoyoy
    aqxmudiamubkoqifbszktdmspkfaqtyoyoyo
    sazoesrmlrnaqnzhgabmsmlpeahfsddtguoy
    ………………………………
    (etc)

Jim Reeds, one of the great historical code-breakers of modern times, stepped forward unto the breach to see what he could make of these strange tables: he transcribed them, ran a few tests, and (thank heavens) worked out the three-stage algorithm with which they were generated.

Stage 1: fill in the 36-high left-hand column (the first letter of each row in the table above) with a six-letter codeword (such as ‘orrase’ for table #5, ‘Leo’) followed by its reverse anagram (‘esarro’), and then repeat them both two more times.

Stage 2: fill each of the 35 remaining elements in the top line in turn with ((W + f(W)) modulo 23), where W = the element to the West, ie the preceding element. The basic letter numbering is straightforward (a = 1, b = 2, c = 3, … u = 20, x = 21, y = 22, and z = 23), but the funny f(W) function is a bit arbitrary and strange:-

    x  f(x)     x  f(x)     x  f(x)     x  f(x)
    a    2      g    6      n   14      t    8
    b    2      h    5      o    8      u   15
    c    3      i   14      p   13      x   15
    d    5      k   15      q   20      y   15
    e   14      l   20      r   11      z    2
    f    2      m   22      s    8

Stage 3: fill each row in turn with ((N + f(W)) modulo 23), where N = the element to the North, ie the element above the current element.

For example, if you try Stage 2 out on ‘o’, (W + f(W)) modulo 23 = (14 + 8) modulo 23 = 22 = ‘y’, while (22 + 15) modulo 23 = 14 = ‘o’, which is why you get all the “yoyo”s in the table above.

And there (bar the inevitable miscalculations of something so darn fiddly, as well as all the inevitable scribal copying mistakes) you have it: the information in the Soyga tables is no more than the repeated left-hand column keyword, plus a rather wonky algorithm.
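To make the three stages concrete, here is a short Python sketch that regenerates a table from its codeword. The letter numbering and the f values are taken straight from the description above; the function names are mine, and I am assuming that a result of 0 modulo 23 wraps round to 23 (‘z’), which is what the fourth row of the table above seems to require.

```python
# The 23-letter alphabet used for numbering: a = 1 ... u = 20, x = 21, y = 22, z = 23.
ALPHABET = "abcdefghiklmnopqrstuxyz"
F_VALUES = [2, 2, 3, 5, 14, 2, 6, 5, 14, 15, 20, 22,
            14, 8, 13, 20, 11, 8, 8, 15, 15, 15, 2]

NUM = {ch: i for i, ch in enumerate(ALPHABET, start=1)}
F = dict(zip(ALPHABET, F_VALUES))


def to_letter(n):
    """Map a number back to a letter, modulo 23 (assumption: 0 wraps to 23 = 'z')."""
    return ALPHABET[(n - 1) % 23]


def soyga_table(codeword, size=36):
    """Build a size x size table as a list of row strings."""
    # Stage 1: left-hand column = codeword + its reverse, repeated to fill the column.
    seed = codeword + codeword[::-1]
    left = (seed * (size // len(seed) + 1))[:size]

    rows = []
    for r, first in enumerate(left):
        row = [first]
        for c in range(1, size):
            west = row[-1]
            if r == 0:
                # Stage 2: top row uses W + f(W).
                base = NUM[west]
            else:
                # Stage 3: every other row uses N + f(W).
                base = NUM[rows[r - 1][c]]
            row.append(to_letter(base + F[west]))
        rows.append("".join(row))
    return rows


table = soyga_table("orrase")
for line in table[:4]:
    print(line)
```

The first four rows this prints agree with the first four rows of the table above; given the scribal slips already mentioned, I would not expect every later row of the manuscript tables to match a clean run of the algorithm.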

You can read Jim Reeds’ paper here: a full version (with diagrams) appeared in the pricy (but interesting) book John Dee: Interdisciplinary essays in English Renaissance Thought (2006). The End.

Except… where exactly did that funny f(x) table come from? Was that just, errrm, magicked out of the air? Jim Reeds never comments, never remarks, never speculates: effectively, he just says ‘here it is, this is how it is’. But perhaps this f(x) sequence is in itself some kind of monoalphabetic or offsetting cipher to hide the originator’s name: Jim is bound to have thought of this, so let’s look at it ourselves:-

  • 1.2.3.4..5.6.7.8..9.10.11.12.13.14.15.16.17.18.19.20.21.22.23
    2.2.3.5.14.2.6.5.14.15.20.22.14..8.13.20.11..8..8.15.15.15..2

If we discount the repeated “2” at the start and the “8 15 15 15 2” at the end as probable padding, we can see that “14” appears three times in what is left, and “5 14” twice. Hmm: might “14” be a vowel?

  • 2 3 5 14 2 6 5 14 15 20 22 14 8 13 20 11 8
  • a b d n a e d n o t x n g m t k g
  • b c e o b f e o p u y o h n u l h
  • c d f p c g f p q x z p i o x m i
  • d e g q d h g q r y a q k p y n k
  • e f h r e i h r s z b r l q z o l
  • f g i s f k i s t a c s m r a p m
  • g h k t g l k t u b d t n s b q n
  • h i l u h m l u x c e u o t c r o
  • i k m x i n m x y d f x p u d s p
  • k l n y k o n y z e g y q x e t q
  • l m o z l p o z a f h z r y f u r
  • m n p a m q p a b g i a s z g x s
  • n o q b n r q b c h k b t a h y t
  • o p r c o s r c d i l c u b i z u
  • p q s d p t s d e k m d x c k a x
  • q r t e q u t e f l n e y d l b y
  • r s u f r x u f g m o f z e m c z
  • s t x g s y x g h n p g a f n d a
  • t u y h t z y h i o q h b g o e b
  • u x z i u a z i k p r i c h p f c
  • x y a k x b a k l q s k d i q g d
  • y z b l y c b l m r t l e k r h e
  • z a c m z d c m n s u m f l s i f

Nope, sorry: the only word-like entities here are “tondean”, “catsik”, and “zikprich”, none of which look particularly promising. This looks like a dead end… unless you happen to know better? 😉
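For anyone who wants to try their own variations (a different trimming of the sequence, say), the 23 shifted rows above can be regenerated with a few lines of Python. This is a sketch only: the names are mine, and the starting offset is simply chosen so that the first row begins with ‘a’, as in the table above.

```python
# Same 23-letter alphabet as the Soyga tables (no j, v or w).
ALPHABET = "abcdefghiklmnopqrstuxyz"

# The f(x) values with the suspected padding trimmed, as in the top row of the table.
CORE = [2, 3, 5, 14, 2, 6, 5, 14, 15, 20, 22, 14, 8, 13, 20, 11, 8]


def shifted_rows(values):
    """Return one candidate decryption per alphabet shift (23 rows in all)."""
    rows = []
    for shift in range(len(ALPHABET)):
        # The "- 2" makes the value 2 come out as 'a' in the first row.
        rows.append("".join(ALPHABET[(v - 2 + shift) % len(ALPHABET)] for v in values))
    return rows


for row in shifted_rows(CORE):
    print(" ".join(row))
```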

A final note. Jim remarks that one of the manuscripts has apparently been proofread, with “f[letter]” marks (ie fa, fb, fc, etc); and surmises that the “f” stands for “falso” (meaning false), with the second letter the suggested correction. What is interesting (and may not have been noted before) is that in the Voynich Manuscript, there’s a piece of marginalia that follows this same pattern. On f2v, just above the second paragraph (which starts “kchor…”) there’s a “fa” note in a darker ink. Was this a proof-reading mark by the original author (it’s in a different ink, so this is perhaps unlikely): or possibly a comment by a later code-breaker that the word / paragraph somehow seems “falso” or inconsistent? “kchor” appears quite a few times (20 or so), so both attempted explanations seem a bit odd. Something to think about, anyway…