Back in 2006 when I wrote The Curse of the Voynich, I included in the book a whole lot of notes relating to the internal structure of ‘Voynichese’ (i.e. the language, dialect, or manner of writing/encipherment found in the Voynich Manuscript, whichever description you happen to feel most comfortable with).

To be clear, I didn’t claim to have deciphered so much as a single letter: rather, I wanted to communicate the high-level view of Voynichese I had built up (not too far from that of Brigadier John Tiltman) as a collection of smaller ciphers, all artfully arranged into an elegant overall system.

The mystery of EVA d and EVA y

For example, I believed (and in fact still do believe, and for a whole constellation of reasons) that EVA -d- (word-middle) and EVA -y (word-final) are probably kinds of scribal abbreviations (e.g. contraction and truncation respectively): and that to successfully read Voynichese, we will ultimately need to reconstruct how its words are abbreviated.

At the same time, I believe that EVA d- and EVA y- (both word-initial) work differently again, i.e. that the same two letter-shapes are doing ‘double duty’, meaning different things when placed in different parts of a word.

In Latin, the shorthand shape ‘9’ (the same as EVA y) behaves very similarly to this, insofar as it stands in for com-/con- when it appears word-initially, and for -us when it appears word-finally. This was still in (admittedly light) use in the mid-fifteenth century, so the idea that something could mean different things in different positions within words was still ‘in the air’, so to speak.
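As a toy illustration of that positional ‘double duty’, here is a minimal Python sketch of how a reader might expand a Latin-style ‘9’ differently depending on where it sits in a word. The function name expand_nine and its rules (word-final ‘9’ becomes -us; word-initial ‘9’ becomes con-, or com- before b, p or m) are simplifying assumptions for this sketch, not a faithful model of real scribal practice, and certainly not a claim about how EVA y works.

```python
def expand_nine(word: str) -> str:
    """Expand a Latin-style '9' according to where it appears in the word.

    Simplified, hypothetical rules for illustration only:
      - word-final '9'   -> 'us'
      - word-initial '9' -> 'com' before b, p or m, otherwise 'con'
      - a '9' anywhere else is left unchanged
    """
    if len(word) > 1 and word.endswith("9"):
        word = word[:-1] + "us"
    if word.startswith("9"):
        rest = word[1:]
        word = ("com" if rest[:1] in ("b", "p", "m") else "con") + rest
    return word


for w in ["9tra", "9pono", "domin9", "9cept9"]:
    print(w, "->", expand_nine(w))
```

The same shape yields two unrelated expansions in "9cept9" (conceptus), which is exactly the kind of position-dependent behaviour described above.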

Really, what I was trying to do was understand how the Voynichese ‘engine’ worked: to not only identify the individual cogs and pinions (i.e. Tiltman’s smaller component ciphers) but to also move towards identifying how these meshed together to form not just a collection of adjacent tricks, but a coherent (if subtly overlapping) system.

The overall metaphor that seemed most productive to me was that of architecture: that the components that made up Voynichese were laid out not haphazardly, but had a kind of consistent conceptual organization to them, yielding what appeared to be rigid use-structures and language-like rules.

Yet at the same time, attempts to produce formal Voynichese grammars to capture these have proved unfruitful: even though thousands of statistical experiments seem to back up the overwhelming intuition that there’s something there if we could only see it, we remain blind to exactly what is going on.

Yes, It’s Unpigeonholeable

Some Voynich linguists try to argue against my view by claiming that I’m describing it purely as a cipher, which (in their view) ‘of course’ it simply isn’t. But the problem is that that’s really not my position at all.

Rather, one of my overall beliefs about Voynichese is that the person who constructed it would have been able to almost entirely (though perhaps not necessarily 100% completely) read it back off the page. And so a lot of what I’m talking about isn’t so much cryptography as steganography, “hiding in plain sight”: and that in turn isn’t so very far from being a linguistic problem.

So if (as I suspect) Voynichese turns out to be equal parts cryptography, steganography, shorthand and language, decoding it will require a significant collaborative effort: but it will also require people to stop trying to pigeonhole it into a single category. Is there any real likelihood it is pure language, or pure shorthand, or pure steganography? For me, the answer is no.

What many of us moderns forget is that the Renaissance (and particularly the fifteenth century) was a time long before the borders between intellectual specializations had started to be so anxiously patrolled. Back then, there was no hard line between language and cipher, between fact and fiction, between Arts and Sciences, even between past and present: thinking was far muddier, and far less clearly defined. Or, if you want to be charitable, much more fluid and creative. 🙂

And so I think we really shouldn’t be surprised if the creator of the Voynich manuscript trampled gleefully over the flower beds of what we now think of as convention: it would be several hundred years before intellectual “Keep Off The Grass” signs would start to appear.

Vowels, Consonants, Numbers, And, The

Regardless of all the above, I think that anyone trying to make sense of Voynichese really has to start with the most basic questions. Surely the biggest ones (and these have bugged me for nearly twenty years) are the classic questions of both cryptologists and linguists alike:

  • Where are the vowels?
  • Where are the consonants?
  • Where are the numbers?
  • Where are the ‘and’ words?
  • Where are the ‘the’ words?

Unfortunately, many people who go hunting for vowels in Voynichese take its letter shapes completely at face value: and by that token, EVA a / i / o would ‘surely’ be standing in for (plaintext) A / I / O. Even though this at first seems to move you forward, what immediately happens next is that you find yourself utterly, ineffably stuck: that even though “vowel = VOWEL” may (briefly) feel like a plausible starting point, Voynichese doesn’t actually work like that at all.

And so the better-organized vowel hunters move on to applying linguistic algorithms (such as Sukhotin’s) to determine which letters are vowels and which are consonants. This normally yields much the same kind of result (with minor variations depending on which transcription you are using, how you parse EVA letters into glyphs, etc): a result which also gets you basically nowhere.
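For anyone who hasn’t tried it, Sukhotin’s algorithm is short enough to sketch. The Python below is a minimal version, assuming that adjacency is counted only within words and that doubled letters are ignored (the zero-diagonal convention); the function name sukhotin_vowels is just illustrative. Because it accepts either plain strings or lists of glyph tokens, it lets you see how changing the way EVA is parsed into glyphs changes the answer.

```python
from collections import defaultdict
from collections.abc import Iterable, Sequence


def sukhotin_vowels(words: Iterable[Sequence[str]]) -> list[str]:
    """Return the symbols Sukhotin's algorithm classifies as vowels.

    Each word may be a string (one character per symbol) or a list of glyph
    tokens. Adjacency is counted only inside words, and a symbol's adjacency
    with itself (doubled letters) is ignored, so the matrix diagonal is zero.
    Vowels are returned in the order in which they were selected.
    """
    adj: dict = defaultdict(lambda: defaultdict(int))
    symbols: set = set()
    for word in words:
        symbols.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1  # keep the adjacency matrix symmetric
                adj[b][a] += 1

    # Start with every symbol classed as a consonant, scored by its row sum.
    score = {s: sum(adj[s].values()) for s in symbols}
    consonants = set(symbols)
    vowels: list[str] = []

    while consonants:
        # Highest-scoring consonant (ties broken alphabetically, for repeatability).
        best = max(consonants, key=lambda s: (score[s], s))
        if score[best] <= 0:
            break
        vowels.append(best)
        consonants.remove(best)
        # Co-occurring with the new vowel counts against each remaining consonant.
        for s in consonants:
            score[s] -= 2 * adj[s][best]
    return vowels


print(sukhotin_vowels(["banana", "cat", "bat"]))  # ['a']
```

On a real transcription you would pass either EVA strings or pre-tokenized glyph lists (e.g. treating ‘ch’ or ‘aiin’ as single units) and compare the two outputs.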

This also doesn’t even begin to attempt to answer the question of where the numbers are (for in a manuscript that size, there must surely be numbers aplenty in there, right?); where the ‘and’ words are hiding; and just as much where all the ‘the’ definite articles are to be found.

Honestly, how is it that researchers can collectively invest so much time staring at Voynichese and yet almost none of them ever try to formulate answers (however hypothetical or speculative) to such basic questions?

Shape Families

Despite our continuing inability to read Voynichese, I think we can identify – purely from their shapes and the similar ways they appear – a number of distinct groups of letters:

  • EVA e, ee, eee, ch, sh  (the ‘c-family’)
  • EVA t, k, f, p (the ‘gallows family’)
  • EVA or, ar, ol, al
  • EVA an, ain, aiin, aiiin
  • EVA air, aiir, am, aim
  • EVA d, y
  • EVA qo
  • EVA s

Oddly, many of the shapes inside each of these groups can often be substituted for one another (e.g. gallows can normally be substituted one for the other to form similar words): and this alone forms a kind of skeletal “shape-grammar” for Voynichese. (Though quite why this should be the case remains a mystery.)
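One crude way to start mapping these substitution families is to collapse every gallows to a placeholder and see which distinct words then coincide. The Python sketch below does this for the gallows family only; the regular expression, the placeholder ‘G’ and the function names are assumptions made for illustration, and the sample words are a handful of familiar EVA strings rather than anything drawn from a transcription.

```python
import re
from collections import defaultdict

# Assumed EVA conventions: the simple gallows are t, k, f, p, and the
# strike-through gallows are written cth, ckh, cfh, cph. Because the scan runs
# left to right and reaches the 'c' first, a strike-through gallows is always
# matched as a whole unit.
GALLOWS = re.compile(r"c[tkfp]h|[tkfp]")


def gallows_skeleton(word: str) -> str:
    """Replace each plain gallows with 'G' and each strike-through one with 'cGh'."""
    return GALLOWS.sub(lambda m: "cGh" if len(m.group()) == 3 else "G", word)


def gallows_families(words):
    """Group distinct words that differ only in which gallows they use."""
    groups = defaultdict(set)
    for w in words:
        groups[gallows_skeleton(w)].add(w)
    return {skel: sorted(ws) for skel, ws in groups.items() if len(ws) > 1}


sample = ["qokedy", "qotedy", "qokeey", "chckhy", "chcthy", "daiin"]
print(gallows_families(sample))
```

The same skeleton trick could be extended to the other families (e.g. collapsing an/ain/aiin/aiiin), though which collapses are legitimate is precisely the open question.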

One of the things I have long wondered about these shape families (which, once again, wasn’t far at all from what Brigadier Tiltman had suggested) was whether each of them might have previously expressed some kind of individual cipher-like trick: for example, I wondered whether the ololol-like repeats of the or/ar/ol/al group might have originally been specifically used to disguise Roman numerals.

In which case Voynichese wasn’t itself a work of invention so much as one of careful assembly, its creator stitching (and adapting) a set of pre-existing tricks together to form the illusion of a coherent whole.

If so, the intriguing question then arises as to whether we might be able to reconstruct what each of these families is trying to conceal. Might we be able to work out the secret history of each of these sub-tricks?

On The Vowel Trail

All the same, the question of the day comes down to this: which of these distinct families might be hiding the vowels?

Back when I was writing Curse, I speculated whether the series of ‘c’-like shapes in Voynichese (EVA e, ee, eee, ch, sh) might somehow be standing in for vowels. After all, the members of this set do seem to share some kind of visual ‘family connection’ as far as their shapes go (i.e. they’re all formed of right-facing semicircles); and there are (superficially, at least) as many of them as the number of vowels you might expect to find in a typical European text, i.e. five.

A famous medieval monastic cipher also replaced vowels with clusters of dots (e.g. one dot for a, two dots for e, etc), so the idea that a cipher and/or alphabet might ‘thematically obfuscate’ a connected group of letters in the same way is visually (and indeed historically) quite appealing.

At the same time, I think that while this may well prove to be true (or even largely true) for Currier A pages, something odd is going on with Voynichese Currier B pages that this isn’t capturing. So Voynichese as a whole remains subtler and more awkward than this idea can completely account for.

Strike-Through Gallows

What I also find hugely intriguing is not just that there are families of shapes, but that there are also mysterious areas of overlap between those families.

These are the places where I think the creator of Voynichese used his cunning to ‘hybridize’ them, i.e. to adapt the area between a pair of families, to turn the overall set of families into a complete system.

Nowhere is this kind of overlapping clearer than with the strike-through gallows. These are instances where shapes in the gallows family (EVA t, k, f, p) are kind of ‘struck-through’ by a ‘ch’ shape. The difficulty of rendering these struck-through gallows as text led to a lot of debate between people proposing various Voynich transcription alphabets.

In the end, the EVA transcription rendered the ‘ch’ shape as two half-letters, so that a struck-through gallows could be written with a ‘c’ and an ‘h’ either side of it, e.g. EVA k -> ckh, t -> cth, f -> cfh, p -> cph. But remember that this is no more than a handy transcription convention, and really shouldn’t be interpreted as endorsing any particular view of what is actually going on ‘under the hood’.

Because that’s another big question researchers have been all too content to avoid ever since EVA arrived: in short, what on earth is going on with these struck-through gallows?

Back when I wrote Curse, I pointed to a 1455 Milanese cipher where, very unusually, ‘subscriptio’ was rendered in a very similar strike-through way: and so proposed that this might well be what we are looking at with strike-through gallows. While this made good hypothetical sense at the time, I have to say it also didn’t really sit well with the idea that EVA ‘ch’ might be in some way part of a vowel family. And so I was left not seeing how these two families and their overlap might have been meshed together.

But a couple of years ago, I had an idea as to how all these different pieces could have been reconciled into a single system…

Cicco Simonetta and Q

Philip Neal’s exemplary translation of Cicco Simonetta’s 1474 Regule (‘rules’) for codebreaking includes his translation of Simonetta’s notes on the weakness of the letter ‘Q’:

Consider if in the published writing there be any cipher which always and everywhere is followed on by one and the same cipher, for such a cipher is representative of q, and the other following is representative of u, for always after q follows u, and the cipher which follows on the cipher representative of u is a vowel always, for always after q follows u and another vowel follows after u.

What, then, are codemakers to do to avoid people using QU as a giveaway? Apart from adding in nulls, Simonetta suggests possibly “putting one sole letter in place of q and u”.

Now, what I found interesting about this is that in 1474 (actually, I strongly suspect that Simonetta was copying out a document that had been compiled some twenty years previously, so perhaps in 1454 or so), Milanese codebreakers were aware that leaving ‘q’ and ‘u’ adjacent was a crypto ‘tell’, that could be used to break their ciphers.

And yet in the Voynich Manuscript, there was apparently no sign of any mechanism or shape family being used to obfuscate a ‘qu’ pair. Or… was there?

Revisiting EVA ch

And so I finish this with the thought that struck me a couple of years ago. What if the strike-through gallows were simply formed by a ‘Q’ shape being struck through by a ‘U’ shape?

For if that were the case, we could probably conclude that not only is EVA ‘ch’ a vowel, but the letter it is standing in for is U/V.

Ah, some might say, but there are 18 instances of EVA ‘chch’ in the Voynich Manuscript. However, I would point out that many/all of these could very easily have been copying errors for the (almost microscopically different) EVA ‘chee’ (e.g. ‘dchchy’ could instead have been ‘dcheey’, etc).

Similarly, even though there are 755 instances of EVA ‘chee’ in the VMs, there are only 33 instances of EVA ‘eech’. Perhaps this is representative of words beginning ‘V’+vowel, or of specific diphthongs, I don’t know. There are 4989 ‘che’ instances, but only 180 ‘ech’ instances: maybe this is something that can be mined for more information and insight.
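Counts like these are easy to reproduce (or challenge) once you have a transcription as a list of EVA words. The minimal Python sketch below counts possibly-overlapping occurrences of a pattern across words; the function name count_pattern is hypothetical, and the word list is a tiny invented sample. The figures you get from a real transcription will depend on which transcription you use and on whether overlaps are counted, so don’t expect this to reproduce the numbers above exactly.

```python
def count_pattern(words, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern across a list of EVA words."""
    total = 0
    for word in words:
        start = word.find(pattern)
        while start != -1:
            total += 1
            start = word.find(pattern, start + 1)
    return total


# A tiny invented sample; in practice, load the words of a full transcription.
words = ["chedy", "cheey", "qokeech", "shey", "dchchy"]
for a, b in [("chee", "eech"), ("che", "ech"), ("chch", "chee")]:
    print(f"{a}: {count_pattern(words, a)}  {b}: {count_pattern(words, b)}")
```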

Of course, I don’t know that I’ve got this right: but the suggestion that EVA ‘ch’ is ‘U/V’ is a hypothesis that’s based on good observation and good crypto history, and offers plenty of space to explore and to work with.

For example, it would suggest that ckh is actually the same as (k)(ch), which may help normalize a lot of the text (and please don’t try to argue back to me that k ‘can only’ map to a single plaintext letter: Voynichese is much too subtle for that, or else we wouldn’t get qokedy qokedy etc).
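If you want to test that normalization, a single substitution is enough. The Python sketch below rewrites cth/ckh/cfh/cph as the plain gallows followed by ‘ch’, i.e. it assumes the (k)(ch) ordering suggested above; that ordering, like the hypothesis itself, is an assumption, and the function name split_strike_through is hypothetical.

```python
import re

# Hypothesis being tested (not an established reading): each strike-through
# gallows is a plain gallows combined with EVA 'ch', in the order (gallows)(ch).
STRIKE_THROUGH = re.compile(r"c([tkfp])h")


def split_strike_through(word: str) -> str:
    """Rewrite EVA cth/ckh/cfh/cph as t/k/f/p followed by 'ch'."""
    return STRIKE_THROUGH.sub(r"\1ch", word)


for w in ["qockhey", "cthy", "chckhy", "chedy"]:
    print(w, "->", split_strike_through(w))
```

Comparing the number of distinct words before and after applying this to a whole transcription would be one simple way to see whether it really does normalize the text.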

Lots to think about, anyway.

13 thoughts on “The Voynich Manuscript: trying to decipher EVA ‘ch’…

  1. This blog post is covering a rather vast area, and the best I can do is add just a few assorted comments.

    The most fundamental point to me seems certainly right: one has to forget the notion that a single symbol in the MS text should represent a single character of plain text, regardless whether one uses Currier, Eva, or anything in between.

    While it is good to know what *not* to do, we still don’t know what the right thing to do is. What’s perhaps even worse is that just about everyone who proposes solutions is doing exactly the thing that we should not be doing.

    I don’t believe that we should be expanding Eva d’s and y’s into something longer. *If* the Voynich MS text is a rendition (cipher, steganography, what have you) of some plain text, I rather expect that the “expanding” took place in that process, and we should be “compressing”. This would mean that:
    – not all word spaces are real spaces
    – or perhaps there is no meaning to be recovered

    More later, most probably.

  2. Rene: as I’m sure you remember me writing, the whole idea of inserting / removing spaces to disrupt the sequencing of words in a ciphertext was first introduced early in the sixteenth century. As a result, the overwhelming probability here is that a Voynich word, whether we like it or not, maps to a plaintext word: and yet, problematically, the information content of a Voynichese word is very typically far too low to be a real word taken in isolation. For me, the only way I can see to resolve that paradox is by having abbreviation / truncation / contraction reduce the information content in words: and that’s the heart of what I think is going on.

  3. Nick: per Rene’s comment, abbreviation will increase the entropy per glyph (all the bits in “con” get packed into “9”), not decrease it. Not sure about truncation/contraction — unless they add more ambiguity than they remove redundancy, they may up entropy per glyph as well. As I’ve said before, one of the tall poles is lack of machine-readable corpora (in this case, transcribed mss preserving scribal abbreviations/contractions/etc.) to run tests on.

    Rene: My recollection is that previously you had held to the spaces-as-word-separator position — what changed your mind?

  4. Karl: adding extra glyphs to the alphabet will add a small amount of information per letter, sure. But I was talking about the amount of information per word, which will decrease as you shorten the length of words in all but the most artificial of scenarios.

  5. Hello Nick,
    fascinating post!
    To your comment above, however: “the whole idea of inserting / removing spaces to disrupt the sequencing of words in a ciphertext was first introduced early in the sixteenth century.”
    As I showed in my post about the scinderatio fonorum, the idea of splitting words (and much more) actually goes back to the 7th Century.
    Admittedly, this is not properly “in ciphertexts”, but as part of the eccentric grammar devised by Virgilius Maro Grammaticus… but then as you say in your post, the boundaries may have overlapped between areas we now see as distinct.

  6. VViews: I think it’s reasonably safe to say that Virgilius was a clever nutter, who would surely have felt very much at home in the sprawling nonsense world of imagined Voynich theories – in some ways, he was the Jorge Luis Borges of his day. I’m not sure I’ve yet seen all twelve of his supposed types of Latin used in Voynich theories, but there can’t be many of them left. 😉

    Even though it’s true that poems hiding messages at the start of or end of lines came back into vogue in the 16th century, I don’t honestly think it’s particularly likely that Virgilius was much of an influence on the person/people who made the Voynich. All the same, he was a fascinating character, thanks very much for blogging about him. 😉

  7. I’m also writing this in German, so that there are no misunderstandings.
    As Rene has already written, we need to get away from the idea of a text of single characters.
    Nick proved this recently with the postcard from Italy: even though it had very little text, it was solved in a few hours.

    Personally, I advise against EVA. Words like ( qokedy qokedy ) or ( e.g. EVA k -> ckh, t -> cth, f -> cfh, p -> cph ) are more confusing than helpful.
    Going from the symbol to EVA, then into Latin, and then into one’s own language is a few steps too many.
    And what is even worse, I end up ignoring the feeling of ( that seems familiar ) or ( I’ve seen that before ).

    Normally I wouldn’t jump ahead here before I had put it on my own site.
    Everything I write is based on logic and should be comprehensible to anyone.
    The only real clue to the VM text is provided by the zodiac sign ( Taurus = bull ).
    But Taurus is Greek. Interestingly, though, Latin also has ( Taurum = bull ).
    Taurum doesn’t mean just any bull, but precisely this one.
    If I take the first and last characters, I get ( tum ); ( tus ) would also be possible.
    I make a note of ( a, u, r ). Strangely, these letters can be used to form the word next to the pipes in the Rosette.

    Everyone knows about counting letters to determine what meaning a symbol might have. But this also works with endings. And there, ( tum ) comes fairly near the top.

    Prefix, combination, single character, ending and deception are the keys to this door.

  8. Robert Keller on February 13, 2019 at 12:35 am said:

    Nick: The glyph before EVA -y (word-final) is normally EVA -d- (word-middle). Do you believe that Voynichese are sequences of abbreviations?

  9. J.K. Petersen on February 13, 2019 at 4:05 am said:

    Nick Pelling wrote: “Rene: as I’m sure you remember me writing, the whole idea of inserting / removing spaces to disrupt the sequencing of words in a ciphertext was first introduced early in the sixteenth century. As a result, the overwhelming probability here is that a Voynich word, whether we like it or not, maps to a plaintext word: and yet, problematically, the information content of a Voynichese word is very typically far too low to be a real word taken in isolation.”

    I’ve mentioned this a few times, but perhaps it’s worth mentioning again… Cod. Pal. germ. 597 (Sammlung alchemischer) has two pages of text that have been carefully and deliberately broken into syllables.

    If it were a grammar book, this might be explained as an exercise or teaching tool, but it’s not, it’s a book of ciphered alchemical information with numerous sections that have been cut out (and some that have been crossed out). When found in this context, words with spaces added between syllables appear to me to be a step in pre-processing the text for encipherment. The content of the syllabized text includes dates and quite a bit of name-dropping and has the “feel” of something someone might want to encipher.

    The manuscript is dated 1426.

  10. It is impossible to mention even just the most relevant aspects of this wide topic. Generally, I feel certain about almost nothing. As soon as one is ‘certain’ about something, one risks becoming focused in the wrong direction.

    Indeed, I tend(ed) to consider the word spaces ‘real’. The main reason for that, and a very good one indeed, is that label words tend to occur in the text as whole words, i.e. separated by spaces. This was already argued many years ago in the original mailing list (as Karl may remember – he was already there when I joined). It was more recently confirmed with some statistics by Marco Ponzi.
    However, this may be the result of something else, for example that the labels are simply copies of ‘words’ in the running text. I’m not saying that I believe this, but it is possible. (And there are also reasons to think this).

    More generally, I think it is almost inevitable that whoever made the Voynich MS did something unusual, even original. So proposed explanations should not be rejected just because they include something unusual or original. At the same time, they should not be too far-fetched. Specifically, I consider:
    – having some spaces that are not really word spaces
    – having abbreviated words
    are equally (un)likely or mildly unusual.

    Just to name an arbitrary number, one could make a list of 25 unusual statistics about the Voynich MS text. The very low bigram entropy is clearly one of the most conspicuous ones, but it is not the only one. No credible explanation for it exists to my knowledge. It is not sufficient to find explanations for one or the other feature. One has to explain everything.

    Example: the almost binomial word length distribution fits that of some Asian languages, but that does not explain at least 20 of the other points.

    Some ‘unusual’ statistics are actually not unusual by themselves. There are some normal values, but these are odd in view of the other unusual statistics.
    One example is the fact that the ‘information per word’, or the ‘variation of words’, is quite normal for a text of its length. This is expressed numerically as the word entropy. Given that the average word length is a bit on the short side, and character combinations are very restricted indeed, this is unexpected.
    It remains an open question if truncating words of an existing text will reduce this ‘information per word’ significantly or not.

    Most importantly, if the creation of the text involved several operations (e.g. several ciphers), then all bets are off. Each step may push each statistic in different directions, and in the end the total effect for each statistic may be lower, higher or similar, compared to the original text.

  11. Nikolai on April 3, 2019 at 7:51 pm said:

    There is a key to the cipher of the Voynich manuscript.
    The key to the manuscript’s cipher is placed in the manuscript itself, throughout the text. Part of the key is hinted at on sheet 14. With its help I was able to translate a few dozen words that are entirely relevant to the themes of the sections.
    The Voynich manuscript is not written with letters. It is written in signs. The characters replace the letters of the alphabet of an ancient language. Moreover, there are 2 levels of encryption in the text. I figured out the key by which, in the first section, I could read the following words: hemp, wearing hemp; food, food (sheet 20 in the numbering used on the Internet); to clean (gut), knowledge, perhaps the desire, to drink, sweet beverage (nectar), maturation (maturity), to consider, to believe (sheet 107); to drink; six; flourishing; increasing; intense; peas; sweet drink, nectar, etc. These are just the short words, of 2-3 signs. Translating words of more than 2-3 characters requires knowledge of this ancient language. This is because some symbols represent two letters, so a word consisting of three characters can hold up to six letters, three of which are superfluous. In the end, you need six characters to determine a meaningful word of three letters. Of course, without knowledge of this language this is very difficult, even with a dictionary.
    And most importantly: the manuscript contains information about “the Holy Grail”.
    If you are interested in this topic, I am ready to provide detailed information.
    Nikolai.

  12. Peter on April 4, 2019 at 10:44 am said:

    @Nikolai
    Why do you think page 14 is so different from all the others?
    Apart from the interesting combination
    “89 89 o8a8” = tum tum a tat

  13. You know, I’m slowly coming round to the idea that [d] can indicate a contraction. At least in the context of [-dy]. It seems to explain a number of difficult points. Though it needs much more research to understand how it works.

    As for [y] though, I’m not convinced that it would be useful or fit with what we know about the glyph.
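Because several of the comments above (Karl’s point about bits per glyph, the reply about information per word, and Rene’s remarks on bigram and word entropy) turn on different entropy measures, a toy Python comparison may help keep them apart. The word sample is invented Latin, and contract() and truncate() are hypothetical stand-ins for lossless abbreviation (word-initial con- → 9) and lossy truncation, so the numbers say nothing about Voynichese itself. Treating the space as an ordinary symbol in the conditional bigram estimate (h2) is also an assumption. What the sketch does show is that a one-to-one contraction leaves per-word entropy unchanged, whereas truncation that merges distinct words can only lower it.

```python
import math
from collections import Counter


def entropy(counts: Counter) -> float:
    """Shannon entropy, in bits, of a table of counts."""
    total = sum(counts.values())
    return sum(n / total * math.log2(total / n) for n in counts.values())


def per_glyph(words) -> float:
    """Single-glyph (unigram) entropy over all letters in the words."""
    return entropy(Counter(ch for w in words for ch in w))


def per_word(words) -> float:
    """Entropy of the word-type distribution."""
    return entropy(Counter(words))


def conditional_bigram(text: str) -> float:
    """h2 = H(next char | current char), estimated as H(pair) - H(first of pair)."""
    pairs = list(zip(text, text[1:]))
    return entropy(Counter(pairs)) - entropy(Counter(a for a, _ in pairs))


def contract(word: str) -> str:
    """Hypothetical lossless abbreviation: word-initial 'con' -> '9'."""
    return "9" + word[3:] if word.startswith("con") else word


def truncate(word: str, keep: int = 3) -> str:
    """Hypothetical lossy truncation: keep only the first `keep` letters."""
    return word[:keep]


text = "contra dominus conceptus dominum contra deus deum consilium".split()
contracted = [contract(w) for w in text]
truncated = [truncate(w) for w in text]

for label, ws in [("original", text), ("contracted", contracted), ("truncated", truncated)]:
    print(f"{label:10}  bits/glyph={per_glyph(ws):.3f}  "
          f"bits/word={per_word(ws):.3f}  h2={conditional_bigram(' '.join(ws)):.3f}")
```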
