Though (as was apparent from the rapid social media take-up of yesterday’s XKCD webcomic) the Voynich Manuscript is now firmly wedged in the cultural mind, sadly the level of debate on it is still stuck circa 1977 – and if anything, Gordon Rugg’s foolish “hoax” claims have helped to keep it there.

But it is demonstrably written in cipher: and so this post tells you why I’m certain it’s a cipher, how that cipher works, and what you can do to try to break it. I’m happy to debate this with people who disagree: but you’ll have to bear in mind that as far as this goes, I’m just plain right and you’re just plain wrong. 🙂 

1. What does the Voynich Manuscript resemble?

Firstly, the overwhelming majority of the Voynich Manuscript is written using only 22 or so letter-shapes: generally speaking, this is the size of a basic European alphabet. Voynichese therefore visually resembles an ordinary European language.

Secondly, even though most of its letter shapes are unknown or unusual, four of them (“a”, “o”, “i”, and “e”, though this last one is styled as “c”) closely resemble vowels in European languages – not only in shape, but also because if you read these as vowels (precisely as the main EVA transcription does), you end up with many CVCVCV (consonant-vowel) patterned words that seem vaguely pronounceable.

Thirdly, dotted through the Voynich Manuscript is a family of letter-groups that look like “aiv”, “aiiiv”, “aiir”, etc. To most contemporary eyes, this looks like some kind of curious language-pattern: but to European people in the 13th to 16th centuries, this denoted one thing only: page references.

  • The “a” denotes the first quire (bound set of folded vellum or paper leaves), “quire a”.
  • The “i” / “ii” / “iii” / “iiii” denotes the folio (leaf) number within the quire (in Roman numbers).
  • The “r” / “v” denotes “recto” / “verso”, the front-side or rear-side of the leaf.

Circa 1250-1550, this “mini-language” of page references was universally known and recognized across Europe: and hence “aiiv” denotes “quire a, folio ii, verso side” and nothing else.
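To make that reading rule concrete, here is a minimal Python sketch of it. The function name, the regular expression and the returned dictionary are illustrative assumptions of mine, not part of any existing transcription tool; it handles only the quire-letter / folio / side pattern described above.

```python
import re

# Hypothetical helper: one quire letter, one to four "i" strokes for the
# folio number, then "r" (recto) or "v" (verso).
PAGE_REF = re.compile(r"^([a-z])(i{1,4})([rv])$")

def parse_page_reference(ref):
    """Split a signature-style page reference such as 'aiiv' into its parts."""
    match = PAGE_REF.match(ref)
    if match is None:
        return None  # not shaped like a page reference
    quire, folio, side = match.groups()
    return {
        "quire": quire,
        "folio": len(folio),  # "i".."iiii" read as additive Roman numerals 1-4
        "side": "recto" if side == "r" else "verso",
    }

print(parse_page_reference("aiiv"))
# {'quire': 'a', 'folio': 2, 'side': 'verso'}
```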

Therefore, the Voynich Manuscript resembles a document written in a 22-letter European language, contains obvious-looking vowel-shapes that are shared with existing European languages, and scattered throughout apparently has copious page-references to pages within its first quire.

However, what even very clever people continue to fail to notice is that these three precise things (the compact alphabet, the obvious-looking vowels, and the page references) have an exact corollary: that this does not resemble ciphertext – for even by 1440, most European cipher-makers knew enough about the vulnerabilities of vowels to disguise them by use of homophones (i.e. using multiple cipher symbols for the vowels). A ciphertext would contain neither unenciphered vowels nor unenciphered page references.

The correct answer to the question is therefore not only that the Voynich Manuscript does resemble an unknown (but CVCVCV-based) European language studded with conventional Roman number page references, but also that it simultaneously does not resemble a ciphertext.

2. Why is the Voynich Manuscript not what it resembles?

I think the big clue is the fact that the page references don’t make any sense as page references.

For a start, even though the Voynich Manuscript probably consisted of fifteen or more quires, the page references that appear throughout its text only ever appear to refer to quire “a” (the first quire). What’s more, the first quire appears not to be marked with any form of “a” marking, which is curious because the whole point of quire signatures was to make sure that the binder bound them together in the correct order. Another odd thing is that there appear to be references only to the first six pages of the first quire.

All very strange: but the biggest giveaway comes from the statistics. Counting the number of instances of the different page references, you’ll see that page references to verso pages apparently outnumber page references to recto pages by roughly eight to one. Here are the raw counts (from the Takahashi transcription):-

air ( 564)   aiir ( 112)  aiiir (  1)
aiv (1675)   aiiv (3742)  aiiiv (106)
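The imbalance can be checked directly from these counts. The short Python sketch below uses only the six figures quoted above; the variable names are mine.

```python
# Counts as quoted above (Takahashi transcription).
counts = {
    "air": 564, "aiir": 112, "aiiir": 1,
    "aiv": 1675, "aiiv": 3742, "aiiiv": 106,
}

# Group by the final letter: "r" = apparent recto, "v" = apparent verso.
recto = sum(n for ref, n in counts.items() if ref.endswith("r"))
verso = sum(n for ref, n in counts.items() if ref.endswith("v"))

print(recto, verso, round(verso / recto, 2))
# 677 5523 8.16
```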

So, even though these superficially resemble page references, there is absolutely no evidence to suggest that this is what they actually are. In fact, the statistics imply the opposite – that despite their visual resemblance to page references, these are not actually page references.

And if it is correct that these are actually something else masquerading as page references, the entire visual-resemblance house of cards collapses – that is, if things are not what they seem, the other visual presumption (that this is a simple CVCVCV-based European language) necessarily falls down with it.

3. If the page references aren’t page references, what are they?

This is precisely the right question to ask: and so, when I visited the Beinecke Library in early 2006, I decided to spend some time looking at a single page containing plenty of clearly-written page references (as described in The Curse of the Voynich, pp.164-168) to try to answer it.

I chose page f38v, from which here are all the page reference letter clusters – can you now see what it took me hours and hours to notice?

f38v-page-reference-groups

The first thing I (eventually) noticed was that there was something a little odd about the top part of the “v” letter (which EVA wrongly transcribes as “n”, incidentally). Specifically, that many of the clusters appear to have been written using two inks – one for the main “aiiv” part, and another (often slightly darker) one for the scribal “flourish” at the top.

But then… once you start looking specifically at the “v flourishes”, the next thing you might notice is that some appear to have a dot at the (top-left) end of the v-flourish.

The final thing you might notice is that these dots tend to appear in different places relative to the “aiiv” frame.

My conclusion is that what is happening here is steganography – that the position of the dot at the end of the v-flourish is what (possibly together with the choice of cluster) is enciphering the information here.

But what information is being enciphered in this way? I strongly suspect that it is enciphering Arabic numbers 1-5 (probably with longer flourishes denoting larger numbers), and with “oiiv” clusters perhaps denoting 6-10. This might explain why we see so many of these “page references” immediately following each other (the famous “daiin daiin” pattern): each “page reference” therefore represents a digit within a multi-digit Arabic number.

However, what is strange is that this is only basically true for “Currier A-language” pages (Prescott Currier noted that, to a large degree, the text in Voynich Manuscript pages behaves in one of only two different ways): for Currier B pages, what seems to happen is that the information is enciphered by using different shaped flourishes for the final “v” character, and no dot.

From all this, I think I can reconstruct how the Voynich Manuscript’s cipher system evolved during its writing. In the early (Currier A) phase, some kind of data (probably Arabic numbers) was steganographically hidden by writing page-reference-like “aiiv” groups and placing a single dot above them. At a later date, however, the author decided (rightly, I think) that this was too obvious, and so went through the text hiding the dots by converting them into flourishes. In the later (Currier B) phase, by contrast, the author evolved the writing system to encipher the same data in a subtly different way (though still relying on the basic “page-reference” shape as a starting point).

And so the correct answer to the section’s question is: even though the “page reference” groups resemble page references, I think that they are cryptographic nulls designed to give the author sufficient visual space on the page to steganographically hide something completely different – probably Arabic numbers.

Of course, existing EVA transcriptions capture only the covertext (the nulls), while the actual data is enciphered in the dots hidden by the flourishes. But you have to start somewhere, right? 🙂

4. What, then, is Voynichese’s CVCVCV structure concealing?

I am certain that the Voynich Manuscript’s apparent “consonant-vowel”-like structure is another visual trap into which the existing EVA transcription (unfortunately) helps to push people. By making Voynichese seem vaguely pronounceable (“otolal”, “qochey”, “qokeedy”, etc), EVA discourages us from looking at what is actually going on with the letters, while also falsely bolstering the confidence of those sufficiently deceived into believing (wrongly) that Voynichese is written in a real language. Basically, anyone who tells you it’s written in an archaic language has fallen into a gigantic intellectual trap first set five centuries ago.

But what of the CVCVCV structure? Where does that come from?

For the most part, I think that it arises from a late cipher stage known as “verbose cipher” (i.e. enciphering a single plaintext letter as two ciphertext letters). Though not all letters behave in this way, it certainly goes a very long way to explain the behaviour of common groups such as: qo, ol, al, or, ar, ot, ok, of, op, yt, yk, yp, yf, cth, ckh, cfh, cph, ch, sh, air, aiir, od, eo, ee, and eee. If you decompose the text into these subgroups (i.e. that these groups encipher individual tokens in the plaintext) while remembering to parse the “qo” group first, all the superficial CVCVCV behaviour disappears – and (I contend) you will find yourself very much closer to a kind of raw ciphertext stream that is more easily broken.
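As a rough illustration of that decomposition, here is a Python sketch that splits an EVA word into the groups listed above. It is only one possible reading of the procedure: I assume a left-to-right scan that tries “qo” first, then the remaining groups longest-first, and falls back to single letters. The names `GROUPS`, `TOKEN` and `decompose` are my own.

```python
import re

# The verbose-cipher groups listed above.
GROUPS = [
    "qo", "ol", "al", "or", "ar", "ot", "ok", "of", "op",
    "yt", "yk", "yp", "yf", "cth", "ckh", "cfh", "cph", "ch", "sh",
    "air", "aiir", "od", "eo", "ee", "eee",
]

# "qo" is tried first; the rest are ordered longest-first so that "eee"
# wins over "ee" and "aiir" is tried before "air". The final "." catches
# any letter that belongs to no group.
ordered = ["qo"] + sorted((g for g in GROUPS if g != "qo"), key=len, reverse=True)
TOKEN = re.compile("|".join(re.escape(g) for g in ordered) + "|.")

def decompose(word):
    """Split one EVA word into candidate cipher tokens, scanning left to right."""
    return TOKEN.findall(word)

print(decompose("qokeedy"))  # ['qo', 'k', 'ee', 'd', 'y']
print(decompose("otolal"))   # ['ot', 'ol', 'al']
```

Seen this way, “otolal” is three tokens rather than six letters, which is the sense in which the CVCVCV pattern dissolves.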

As supporting evidence, I point to those few places where the author has “twiddled” with the final code-stream to try to disguise obvious repeated patterns, arising from repeated letters in the plaintext (code-makers hate repeated patterns in their ciphertext). Perhaps the most notable of these is on f15v, where the “or” pattern appears three times in succession on line 1, and four times in a row on line 2:-

f15v-space-transposition

I think that the author has added spaces in here to try to disguise the repeated “or” group: in line 1, he has inserted a space to turn “ororor” into “oror or“, while in line 2 he has inserted three spaces to turn “orororor” into “or or oro r“. I’m not fooled by this – are you?
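Undoing the suspected space insertion is easy to sketch in Python: strip the spaces and measure the longest unbroken run of a group. The function name is mine, and the two inputs are just the fragments quoted above, not full transcription lines.

```python
import re

def longest_run(text, group="or"):
    """Remove spaces, then count the longest run of consecutive `group` repeats."""
    joined = text.replace(" ", "")
    runs = re.findall(f"(?:{re.escape(group)})+", joined)
    return max((len(run) // len(group) for run in runs), default=0)

print(longest_run("oror or"))      # 3
print(longest_run("or or oro r"))  # 4
```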

I predict here that “or” is enciphering “c” or “x” (probably “c”), and that the plaintext reads “ccc … cccc”: but you guessed that already, right?

5. Even if this is right, how does it help us break the Voynich?

I don’t believe for a moment that this explains the whole of the Voynichese cipher system: there are plenty of subtly surprising features that any proposed solution would also need to explain, such as:-

  • Precisely how (and why) Currier A and Currier B differ (for example, the whole word-initial “l” thing)
  • Why “yk / yt / yf / yp” occur more in labels than in normal paragraphs
  • Why so few non-trivial words appear more than once across the whole manuscript text
  • What “4o” codes for (I suspect a common initial-letter expansion, i.e. [qo] + ‘c’ –> ‘con’)
  • What word-initial “8” codes for (I suspect “&”)
  • What non-word-initial “8” and “9” code for (I suspect ‘contraction’ and ‘truncation’)
  • Whether the ciphering system is stateless or stateful (but that’s another story)
  • What “Neal keys” denote (but that’s another story, too)
  • etc

However, what I do believe is that all the above lays down the basic groundwork from which any sensible cipher attack would need to launch forwards. I do not share the widely-held pessimistic view that the Voynich is somehow intrinsically unbreakable – on the contrary, it is an all-too-human artefact from a specific time (between 1450 and 1500) and place (probably Northern Italy, though Germany is possible too), and the craft techniques it deftly uses to conceal its content from us are both far from invisible and far from infallible.

If you take the basic steps I describe above to look beyond the deliberate deception and the mythology, then I am certain you will find yourself on the right path towards seeing clearly both what the Voynich Manuscript actually is and how its cipher system works. Let me know when you’ve broken it! 🙂

Incidentally, there’s plenty more related stuff in my 2006 book (which is where the two diagrams above came from, p.165 and p.160 respectively)… but you knew that already, I’m sure. 🙂

44 thoughts on “The Voynich Cipher for code-breakers…

  1. Fascinating, and provocative, as usual. Your blog is always a source of inspiration, and written so very entertainingly.

    I currently like the theory that the VMs “words” are in fact part words (ngrams) of the plaintext. This theory also explains several of the unusual features of the text.

  2. Vytautas on June 8, 2009 at 5:24 am said:

    Hi, Nick,
    I have one question 🙂 May it be that marks “i” in words “aiiin”, “aiin” or so denotes changes of state ? It will be interesting to know your opinion for me…

  3. Hi Vytautas,

    It’s certainly possible – and remember that Steve Ekwall believed that ‘ch’ denoted a ‘flip’ and the ‘e / ee / eee’ characters denoted a ‘step’. 🙂

    All the same, I think we would need to amass a whole new class of raw statistical evidence before we begin to look for stateful behaviour…

    Cheers, ….Nick Pelling….

  4. Hi Julian,

    I suppose the first issue with all word-based theories is working out to what degree you believe a Voynichese “word” corresponds to a plaintext word. There are certainly many places where you can see (what I think Glen Claston termed) “half-spaces”, which you’d have to be particularly careful about when looking for any kind of word-based stats: inserted spaces would also disrupt any merry stats-gathering parties. 🙁

    And the second issue is that if Voynichese words do (for the most part) correspond to plaintext words, and you also happen to subscribe to the notion that common pairs (such as qo, ol, or, al, ar, etc) encipher single tokens, then you’re left with the problem that the words are far too short to be real words – i.e. that some kind of shorthand “shortening” stage precedes the verbose stage. My own prediction, then, is that “4o” codes for “subscriptio” (i.e. expansion of the first letter), mid-word “8” codes for “superscriptio” (i.e. syllable contraction), and word-final “9” codes for “truncatio” (i.e. shortening of the overall string).

    To be honest, I’d be a bit surprised if you could find many examples (i.e. above the level of chance) where part-words / ngrams gave a decent explanation for Voynichese’s properties – but all the same, please let me know how you get on!

    Cheers, ….Nick Pelling….

  5. Nick, thanks for the reply. To clarify what I meant: the Voynich “words” each equate to pieces of plaintext words. The pieces are concatenated together to make the complete plaintext word. Voynich word-final “9” signifies the end of the plaintext word group.

    If you group the Voynich words in this way, then the stats for the resulting plaintext word lengths look more realistic (but with a long tail, for which there could be several explanations). I am currently having fun exploring this theory with a GA, and will be sure to let you know when the manuscript has yielded its secrets to this approach 🙂

  6. Hi Julian,

    One of the places where this kind of (actually very sensible) explanation falls a bit short is in Q13, home to the famous “qokedy qokedy dal qokedy qokedy”. There’s something a bit artificial going on there, however you look at it 😮

    Cheers, ….Nick Pelling….

  7. Hi Nick,

    Take the hypothetical mapping

    qokedy => “f”
    dal => “ra”

    so “qokedy qokedy dal qokedy qokedy” could be part of the word “riffraff”. I think the GC transcription for that VMs sequence is “4ohc89 4ohc89 8ae 4ohc79 4ohc79” which allows more possibilities. I tend to work only with GC, but my feeling is that it is over zealous – probably those 7s are really 8s.

    Julian

  8. Fastercat on June 30, 2009 at 12:48 pm said:

    “qokedy qokedy dal qokedy qokedy”.
    It could also be something simple like a Bacon Cipher version of ‘e’ AABAA. Time frame is wrong, but doesn’t necessarily mean the method is incorrect.

  9. As I understand it, the whole point of the Bacon biliteral cipher was steganographic – to embed it inside natural-looking text to hide its presence. So why embed it in something as artificial as “qokedy qokedy dal qokedy qokedy”?

  10. Fastercat on June 30, 2009 at 5:35 pm said:

    I was reading the article about the Currier A/B and was thinking to myself that perhaps there are additional messages hidden. The text is a carrier for multiple streams of information.

    You do have a good point though.

  11. Diane on May 23, 2010 at 7:08 pm said:

    I think the point about the imbalance between odds and evens is interesting – and conjured up an interesting image viz. someone with a book that was to be read in the way that Hebrew and many Asian languages are. So the ‘evens/reverses’ would actually be our Odds/recto, and the handful of complementary references might suggest commentary, or imagery, facing.

  12. Has anyone thought of it, or looked into it, as a mathematical cipher? As far as what I am reading and hearing goes, it seems that this would be a very plausible hypothesis.

  13. CRS: the reason that many linguists have claimed that Voynichese is a language (rather than a cipher) is that it exhibits many language-like features – lots of structure. At the same time, the point of mathematical ciphers is to remove structure from a message, which should (in theory) make it hard for code-breakers to get started. The paradox of the Voynich, then, is that while it looks simple and well-structured, it is apparently as hard to break as mathematical ciphers. However, that isn’t to say that it is a mathematical cipher, just that it is as hard to break as one!

  14. Has anyone allied the washerwomen’s names to flowers in the Voynich codes and then used the translations to decode the text? We have our roses and lilies, and most languages name their women after nice things. Sorry if I’m a nuisance.

  15. Stanley: people have been looking for “cribs” in all sections of the Voynich Manuscript for decades, yet with no success at all. If only it was that simple!

  16. I’ve always thought that the Voynich script was very counter-intuitive for someone using a quill pen (try it). Even with a metal nib, it is near-impossible to write fluently, from left to right, without dragging the pen contrary to the norm. So many of the forms, if you try to form them fluently, just open the ‘nib’, and it is evident that to avoid doing this, the scribe had to constantly use two or more strokes for a single letter, and sometimes as many as three or four. Though obviously accustomed to the script, he’s not writing in the way that western alphabets usually work. I’m not talking about ornamental letters. But why use letter-forms which are necessarily slower, unless you are copying some original written with a different implement?

  17. Diane: it’s a tricky one – for me, I’ve long thought that this was an alphabet that was designed less for a quill than for a biro (just kidding, I mean a stylus). As such, I’ve always seen its production as a two-stage writing process – the author/encipherer first writing it on wax tablets, and then a scribe copying out onto the page.

  18. Diane on April 7, 2013 at 4:16 am said:

    Nick
    About “Neal keys” again – I’d like to refer to one of your posts about them – have you any preference?

  19. Hi, I am searching for a person who is dedicated to studying this book and open to discussion. Can anyone here provide the contact of such a person? I would be really grateful.

    thanx.

  20. oracle: what do you want to know? I’ve been studying and writing about the Voynich Manuscript for over a decade. 🙂

  21. xplor on May 26, 2013 at 9:30 pm said:

    I see it as the lecture notes of a humanist teacher of natural science that were made into book form. Maybe they had hopes of preserving his work and were never able to read what it said.

  22. thomas spande on May 28, 2013 at 9:15 pm said:

    Nick, line 2, fourth word, has (I think) a scribal abbreviation that appears as “n-right paren” and looks in the VM like “i-tipped?”. I think that particular word could be “ean[…]” where “8”=e, “a”=a and the right paren is scribal notation for a concealed letter, maybe for example, “t” ? Cheers, Tom

  23. Where are the dots in the pictures of the women coming from? They appear to be very red… are they original? Has anyone considered invisible ink in the main text that may have deteriorated over time?

  24. James: the nymphs’ red cheeks are almost certainly original – are these what you’re talking about?

    Invisible ink: people have looked at the VMs through all kinds of cameras, filters and lenses over the years, and there is relatively little chance that something like invisible ink would have been overlooked. But you never know… 😐

  25. Is there evidence of over writing?
    Like finding a crossword puzzle in a doctors office that has had several people trying to solve it.

  26. Prescott Currier thought the Voynich had at least two authors. Author A wrote most of the Herbal pages and Author B wrote a lot of Herbal pages . Who was first A or B? Is it possible that there were two or more books cobbled together by appearance ? What would be the optimal size to decode any single section? Are the water slides section (balneological) all written by author B ? Do the gallows’ characters represent whole words ?

  27. Menno Knul on October 10, 2014 at 10:43 pm said:

    Dear xplor,

    As for the gallows I am pretty sure that they do not represent whole words, but categories K,T,P,F and , , , , given in order of frequency. So we deal with eight categories. If you list the K, T, P, F markers, you will find a great regularity with respect to the prefixes, e.g. o-, qo-. As far as I can see now, the prefixes consist of numbers, similar to the numbers in the quire indications. My analysis is close to being published.

  28. Nick,
    Just revisiting your comment of February 5th., 2012, it struck me that if an alphabet is meant to be transcribed fluently in one direction, then whether with stylus or pen, the stroke direction and order should be the easiest one. Voynichese isn’t – so while I like the idea of the stylus, it doesn’t solve the basic problem of Voynichese script, does it?

    By the way, may I offer my sincere respects for your having maintained this blog and its comments over so many years. Positively herculean effort – thank you.

  29. Dear Menno,
    Interesting, the ‘eighths’ business. I’ve recently suggested somewhere on Nick’s blog that the ‘gallows’ represent signs for a common place/direction. Eight divisions for the horizon were the norm in earlier times, but if the mark is for (say) a given route or specific place the options are broader. I’m thinking of course in the context of this work as a traders’ handbook, from the days when the trader might also own his ships and his associates or family practice a particular craft.

  30. Glen Claston counted 17, maybe 18, of these gallows’ characters, and there are 19 gallows characters in Glen Claston’s Voynich 101 transcription. There must be a history of people using them. Why would you write in this way?

  31. Menno Knul on October 11, 2014 at 6:51 pm said:

    Diane, Xplor,

    Unfortunately this web site did not render certain signs. So I try again. The eight gallows are in order of frequency:

    K – T – P – F and what I call ligatures cKh – cTh – cPh – cFh.

    VIB does not render the gallows with capitals.

    Diane, I noticed your earlier remark, that gallows without prefix might indicate geographical entities, but you must be aware that these gallows occur as well with a number of prefixes like o- and qo-.

    Xplor, if you count all of the writing variants, you may get a higher number of gallows. I rather keep the EVA transcription as presented in the VIB. Life is difficult enough.

    Greetings, Menno

  32. Don’t the gallows’ characters kind of jump out at you, sort of loud and proud? Where have you seen that before? Is there another manuscript that has gallows’ characters?

  33. Menno, I was thinking of the Neal keys, not every use of the gallows. Envisioning the series deriving from a basic eightfold system is, to my mind, more easily compatible with the system of directions, so I wondered whether the forms occurring as Neal keys – if organised into a circle rather than the usual table – might not permit a more accurate rendering of phonetic values than our current transcriptions. Not knowing whether any such compass would refer to the Mediterranean wind-rose (Levante etc.), the magnetic notation (S, SW, SE etc.) or the eastern sidereal rose(s) the process would be tricky, but at the very least could provide a new pastime for the puzzlers.

    Cheers.

  34. Xplor,

    I guess nobody has seen such gallows’ characters before. That makes the VMS unique.
    The best we can do is to find the system behind the use of gallows. Half of the VMS vocabulary consists of words with gallows K,T,P,F and cKh, cTh, cPh, cFh. So they play an important role, at least on a visual level.

    Greetings, Menno

  35. Menno,
    If you tell me that “Half of the VMS vocabulary consists of words with gallows K,T,P,F and cKh, cTh, cPh, cFh”, my first question would be “What is the role of the c-h construct?” Does it “negate” the K, T, P and F? Does it “reverse” them? Is it a tricky (polyalphabetic) way of writing K, T, P and F in a disguised manner? Is c-h employed with non-gallows characters? If so, in what way?

  36. mark on May 11, 2015 at 9:16 pm said:

    Take the EVA alphabet and make a drawing with all the characters; make a square and on each face of the square make an over sized T and capital T ,K capital K. In the four corners, put the diagonal F’s and P’s.Inside the square draw the 4 figure 8;s (NSEW?) Then inside the over sized T’s and K’s draw in the four o-e-h-a and s-i-y-v, one set under the gallows and the other set out.Finally, around the 8 points draw in r-l-n-q-u-z-b-apostrophe;
    Notice how oeha siyv looks like octa-sign(a)?.

  37. Tricia on May 12, 2015 at 1:35 am said:

    We need a Voynich cube.

  38. mark on May 12, 2015 at 4:28 pm said:

    I can post a picture, I know it sounds confusing but it’s very logical

  39. mark on May 12, 2015 at 5:00 pm said:

    If I am on the right track, then these could be astronomical sightings disguised as writing. The planets, sun and moon all lie on the same elliptical plane, but are constantly changing their order. You would need symbols for “retrograde” ” opposition”
    etc. Like Mayan glyphs, they could also be building blocks for words.

  40. D.N. O'Donovan on May 12, 2015 at 8:17 pm said:

    Mark, it’s a very exciting scenario, but I’m at a loss to imagine where it would fit in the world of the fifteenth century. We have a diagram drawn of planetary motions that is a bit off, but it was made in no later than the eleventh century, and if you check your history of astronomy,… well, why should anyone encode information as common as that?

  41. Diane: when so many other people have proposed astronomical Voynich theories, why not just point the guy at some of them instead of snarking at the whole 15th century? e.g.
    * P. Han’s theory: http://www.voynichmanuscript.co.uk
    * Robert Teague’s theories: https://voynichology.wordpress.com/
    * Wayne Herschel’s theory: http://www.thehiddenrecords.com/grail.php (warning: you may find the colours a bit overbright)
    * Marco Ponzi’s theory: https://stephenbax.net/?p=803 (on the Baxxxx’s site, alas)

  42. D.N. O'Donovan on May 13, 2015 at 1:31 am said:

    Good point.

  43. mark on May 13, 2015 at 10:10 pm said:

    OK, then here is some plain English!
    From page 245 of “The Voynich Manuscript” by Gerry Kennedy and Rob Churchill, they have an example provided by G Landini of folio 1r in EVA letters.
    write the text from the beginning on the first line and start from the end on the second line making lines roughly 5 words long.
    line 1
    fachys
    cfhaiin
    cTHrEs
    or-okan
    or-Y
    otEOR
    cTHAR
    cETHes
    or
    The Theory Earth…
    line 5
    chTar
    ROlOTy
    ARE
    DaraiiN
    Sheky
    Rotates around…
    I had to guess one of the “a’s” was a “u”
    I guess I like this better than Astronomic observations!

  44. I hadn’t read this before. This is a splendid post, Nick! Thanks!
