USC’s irrepressible Kevin Knight and Dartmouth College Neukom Fellow Sravana Reddy will be giving a talk at Stanford on 13th March 2013 entitled “What We Know About the Voynich Manuscript“. Errm… which does sound uncannily like the (2010/2011) paper by the same two people called, errrm, let me see now, ah yes, “What We Know About the Voynich Manuscript“.

Obviously, it’s a title they like. 🙂

As I said to Klaus Schmeh at the Voynich pub meet (more on that another time), what really annoys me when statisticians apply their box of analytical tricks to the Voynich is that they almost always assume that whatever transcription they have to hand will be good enough. However, I strongly believe that the biggest problem we face precedes cryptanalysis – in short, we can’t yet parse what we’re seeing well enough to run genuinely useful statistical tests. That is, not only am I doubtful of the transcriptions themselves, I’m also very doubtful about how people sequentially step through them, assuming that the order they see in the transcription of the ciphertext is precisely the same order used in the plaintext.

So, it’s not even as if I’m particularly critical of the fact that Knight and Reddy are relying on an unbelievably outdated and clunky transcription (which they certainly were in 2010/2011), because my point would still stand regardless of whichever transcription they were using.

In fact, I’d say that the single biggest wall of naivety I run into when trying to discuss Voynichese with people who really should know better, is that hardly anyone grasps that the presence of steganography in the cipher system mix would throw a spanner (if not a whole box of spanners) in pretty much any neatly-constructed analytical machinery. Mis-parsing the text, whether in the transcription (of the shapes) and/or in the serialization (of the order of the instances), is a mistake you may well not be able to subsequently undo, however smart you are. You’re kind of folding outer noise into the inner signal, irrevocably mixing the covertext into the ciphertext.

Doubtless plenty of clever people are reading this and thinking that they’re far too smart to fall into such a simple trap, and that the devious stats genies they’ve relied on their whole professional lives will be able to fix up any such problem. Well, perhaps if I listed a whole load of places where I’m pretty sure I can see this happening, you’ll see the extent of the challenge you face when trying to parse Voynichese. Here goes…

(1) Space transposition cipher

Knight and Reddy are far from the first people to try to analyze Voynichese word lengths. However, this assumes that all spaces are genuine – that we’re looking at what modern cryptogram solvers call an “aristocrat” cipher (i.e. with genuine word divisions) rather than a “patristocrat” (with no useful word divisions). But what if some spaces are genuine and some are not? I’ve presented a fair amount of evidence in the past that at least some Voynichese spaces are fake, and so I doubt the universal validity and usefulness of just about every aggregate word-size statistical test performed to date.

Moreover, even if most of them are genuine, how wide does a ciphertext space have to be to constitute a plaintext space? And how should you parse multiple-i blocks or multiple-e blocks, vis-a-vis word lengths? It’s a really contentious area; and so ‘just assuming’ that the transcription you have to hand will be good enough for your purposes is actually far too hopeful. Really, you need to be rather more skeptical about what you’re dealing with if you are to end up with valid results.
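To make that concrete, here’s a minimal Python sketch (using an invented toy line, not a real transcription) of how the word-length distribution shifts depending on whether you count EVA’s uncertain ‘half’ spaces (conventionally transcribed as ‘,’, versus ‘.’ for definite spaces) as genuine word breaks:

```python
from collections import Counter

# EVA transcription convention: '.' = definite space, ',' = uncertain half-space.
def word_lengths(line: str, half_space_breaks: bool) -> Counter:
    # Either promote half-spaces to real word breaks, or drop them entirely.
    line = line.replace(",", "." if half_space_breaks else "")
    return Counter(len(w) for w in line.split(".") if w)

toy_line = "qokedy,qokedy.dal.qokedy,qokedy"  # invented toy line
print(word_lengths(toy_line, half_space_breaks=True))   # Counter({6: 4, 3: 1})
print(word_lengths(toy_line, half_space_breaks=False))  # Counter({12: 2, 3: 1})
```

Exactly the same transcription yields two quite different word-length distributions: every aggregate word-size statistic downstream inherits whichever parsing decision you made here.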

(2) Deceptive first letters / vertical Neal keys

At the Voynich pub meet, Philip Neal announced an extremely neat result that I hadn’t previously noticed or heard of: that Voynichese words where the second letter is EVA ‘y’ (i.e. ‘9’) predominantly appear as the first word of a line. EVA ‘y’ occurs very often word-final, reasonably often word-initial (most notably in labels), but only rarely in the middle of a word, which makes this a troublesome result to account for in terms of straightforward ciphers.

And yet it sits extremely comfortably with the idea that the first letter of a line may be serving some other purpose – perhaps a null character, or (as both Philip and I have speculated, though admittedly he remains far less convinced than I am) a ‘vertical key’, i.e. a set of letters transposed from elsewhere in the line, paragraph or page, and moved there to remove “tells” from inside the main flow of the text.

(3) Horizontal Neal keys

Another very hard-to-explain observation that Philip Neal made some years ago is that many paragraphs contain a pair of matching gallows (typically single-leg gallows) about 2/3rds of the way across their topmost line: and that the Voynichese text between the pair often presents unusual patterns / characteristics. In fact, I’d suggest that “long” (stretched-out) single-leg gallows or “split” (extended) double-leg gallows could well be “cipher fossils”, other ways to delimit blocks of characters that were tried out in an early stage of the enciphering process, before the encipherer settled on the (far less visually obvious) trick of using pairs of single-leg gallows instead.

Incidentally, my strong suspicion remains that both horizontal and vertical Neal keys are the first “bundling-up” half of an on-page transposition cipher mechanism, and that the other “unbundling” half is formed by the double-leg gallows (EVA ‘t’ and ‘k’). That is to say, that tell-tale letters get moved from the text into horizontal and vertical key sequences, and replaced by EVA ‘t’ (probably horizontal key) or EVA ‘k’ (probably vertical key). I don’t claim to understand it 100%, but that would seem to be a pretty good stab at explaining at least some of the systematic oddness (such as “qokedy qokedy dal qokedy qokedy” etc) we do see.

Regardless of whether or not my hunch about this is right, transposition ciphers of precisely this kind of trickiness were loosely described by Alberti in his 1465 book (as part of his overall “literature review”), and I would argue that these ‘key’ sequences so closely resemble some kind of non-obvious transposition that you ignore them at your peril. Particularly if you’re running stats tests.

(4) Numbers hidden in aiv / aiiv / aiiiv scribal flourishes

This is a neat bit of Herbal-A steganography I noted in my 2006 book, which would require better scans to test properly (one day, one day). But if I’m right (and the actual value encoded in an ai[i][i]v group is entirely held in the scribal flourish of the ‘v’ (EVA ‘n’) at the end), then all the real content has been discarded during the transcription, and no amount of statistical processing will ever get that back, sorry. 🙁

(5) Continuation punctuation at end of line

As I noted last year, the use of the double-hyphen as a continuation punctuation character at the end of a line predated Gutenberg, and in fact was in use in the 13th century in France and much earlier in Hebrew manuscripts. And so there would seem to be ample reason to at least suspect that the EVA ‘am’ group we see at line-ends may well encipher such a double-hyphen. Yet even so, people continue to feed these line-ending curios into their stats, as if they were just the same as any other character. Maybe they are, but… maybe they aren’t.

Incidentally, if you analyze the average length of words in both Voynichese and printed works relative to their position on the line, you’ll find (as Elmar Vogt did) that the first word in a line is often slightly longer than the others. There is a simple explanation for this in printed books: that short words can often be squeezed onto the end of the preceding line.
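If you want to replicate Elmar’s test on a transcription of your own choosing, the core of it is only a few lines; the lines below are invented stand-ins, not real Voynichese statistics:

```python
from statistics import mean

def mean_length_by_position(lines):
    """Average word length at each position on the line (0 = first word)."""
    by_pos = {}
    for line in lines:
        for i, word in enumerate(line.split()):
            by_pos.setdefault(i, []).append(len(word))
    return {i: mean(lens) for i, lens in sorted(by_pos.items())}

toy_lines = ["daiin chedy qokeedy ol", "qokedy dal shedy", "okaiin chol dar or"]
print(mean_length_by_position(toy_lines))
```

Of course, if some line-initial glyphs are nulls or key letters rather than text, the averages this produces measure the cipher system as much as the underlying language.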

(6) Shorthand tokens – abbreviation, truncation

Personally, I’ve long suspected that several Voynichese glyphs encipher the equivalent of scribal shorthand marks: in particular, that mid-word ‘8’ enciphers contraction (‘contractio’) and word-final ‘9’ enciphers truncation (‘truncatio’) [though ‘8’ and ‘9’ in other positions very likely have other meanings]. I think it’s extraordinarily hard to account for the way that mid-word ‘8’ and word-final ‘9’ work in terms of normal letters: and so I believe the presence of shorthand to be a very pragmatic hypothesis to help explain what’s going on with these glyphs.

But if I’m even slightly right, this would be an entirely different category of plaintext from that which researchers such as Knight and Reddy have focused upon most… hence many of their working assumptions (as evidenced by the discussion in the 2010/2011 paper) would be just wrong.

(7) Verbose cipher

I’ve also long believed that many pairs of Voynichese letters (al / ol / ar / or / ee / eee / ch, plus also o+gallows and y+gallows pairs) encipher a single plaintext letter. This is a cipher hack that recurs in many 15th century ciphers I’ve seen (and so is completely in accord with the radiocarbon dating), but which would throw a very large spanner both into vowel-consonant search algorithms and into Hidden Markov Models (HMMs), both of which almost always rely on a flat (and ‘stateful’) input text to produce meaningful results. If these kinds of assumptions fail to hold, the usefulness of many such clever analytical tools falls painfully close to zero.
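For the avoidance of doubt, here’s a sketch of what I mean by parsing into composite tokens, using the candidate pair list above and a simple greedy longest-match rule (which is only one of several possible parsing strategies, not a claim about the ‘right’ one):

```python
# Candidate verbose-cipher groups from the list above: al/ol/ar/or/ee/eee/ch,
# plus o- or y- followed by a gallows (EVA t, k, p, f). Sorted longest first
# so that greedy matching prefers 'eee' over 'ee', etc.
GROUPS = sorted(
    ["eee", "ee", "al", "ol", "ar", "or", "ch"]
    + [v + g for v in "oy" for g in "tkpf"],
    key=len, reverse=True)

def tokenize(word: str) -> list[str]:
    tokens, i = [], 0
    while i < len(word):
        match = next((g for g in GROUPS if word.startswith(g, i)), word[i])
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("qokedy"))  # ['q', 'ok', 'e', 'd', 'y'] -- 5 tokens, not 6 chars
```

The point being: an HMM (or vowel-consonant solver) trained on the six-character stream is modelling a different sequence from one trained on the five-token stream, so whichever parsing you feed it silently becomes part of your conclusions.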

(8) Word-initial ‘4o’

Since writing my book, I’ve become reasonably convinced that the common ‘4o’ [EVA ‘qo’] pair may well be nothing more complex than a steganographic way of writing ‘lo’ (i.e. ‘the’ in Italian), and then concealing its (often cryptologically tell-tale) presence by eliding it with the start of the following word. Hence ‘qokedy’ would actually be an elided version of “qo kedy”.

Moreover, I’m pretty sure that the shape “4o” was used as a shorthand sign for “quaestio” in 14th century Italian legal documents, before being appropriated by a fair few 15th century northern Italian ciphers (a category into which I happen to believe the Voynich falls). If even some of this is right, then we’re facing not just substitution ciphers, but also a mix of steganography and space transposition ciphers, all of which serves to make modern pure statistical analysis far less fruitful a toolbox than it would otherwise be for straightforward ciphers.

* * * * * * *

Personally, when I give talks, I always genuinely like to get interesting questions from the audience (rather than “hey dude, do you, like, think aliens wrote the Voynich?”, yet again, *sigh*). So if anyone reading this is going along to Knight & Reddy’s talk at Stanford and feels the urge to heckle – sorry, to ask interesting questions that get to the heart of what they’ve been doing – you might consider asking them things along the general lines of:

* what transcription they are using, and how reliable they think it is?
* whether they consider spaces to be consistently reliable, and/or if they worry about how to parse half-spaces?
* whether they’ve tested different hypotheses for irregularities with the first word on each line?
* whether they believe there is any evidence for or against the presence of transposition within a page or a paragraph?
* whether they have compared it not just with abjad and vowel-less texts, but also with Quattrocento scribally abbreviated texts?
* whether they have looked for steganography, and have tried to adapt their tests around different steganographic hypotheses?
* whether they have tried to model common letter pairs as composite tokens?

I wonder how Knight and Reddy would respond if they were asked any of the above? Maybe we’ll get to find out… 😉

Or you could just ask them if aliens wrote it, I’m sure they’ve got a good answer prepared for that by now. 🙂

41 thoughts on “Knight & Reddy on the Voynich, & the limits of statistical analysis…”

  1. Hi Nick
    I’ve been looking for stats on two things – if you or anyone here can offer a link, I’d be glad.

    (i) maximum line-length for the text and also if that’s possible, any evidence of standard margin-widths.

    (ii) details of variation in spaces between lines. I’m wondering if space left between each 5th and 6th line (or 6th and 7th) might be consistently wider or narrower than the rest.


  2. Ivan Y on March 10, 2013 at 7:06 am said:

    TL;DR version of your very thorough post: garbage in, garbage out.

  3. Ivan: actually, the post was more about the opposite side of the coin – that if you don’t want garbage out, be very choosy about the garbage you bring in. 🙂

  4. Diane: maximum line length very much depends on how you join the EVA strokes together to make actual letters, so it’s a fairly subjective measure. And I don’t think anyone has measured things in a way that will give you the second datum you’re looking for – Philip Neal mentioned having added some additional metadata to the copy of the transcription he was working with, but I don’t recall those metadata including physical measurements.

  5. It would be easier if a scale appeared in the Beinecke zoom, wouldn’t it?

    Yes, when I said line-length, I did mean in mm. or inches.

    Suppose I could try a letter to the Beinecke..


  6. I would go, but I am on the other side of the country, in New York. There are a couple of Voynicheros in California, though, so I’ll mention it elsewhere. Represent.

  7. bdid1dr on March 10, 2013 at 10:06 pm said:

    Nick, I wish I could attend Knight and Reddy’s event at Stanford (I live not so far away). Howsomever, because of my dependency on reading lips (and more often than not ending up at the back of the auditorium) I’m somewhat useless at live performances/lectures. I do have an acquaintance at UC Santa Cruz who might be interested in their talk. I’ll try to contact her by phone after I finish this note.

As far as the use of various-size “4” or “9” figures as cipher goes, here is my translation of at least those figures, as well as of the “&”, as I use them in translating whole discussions:

    8 = aes
    & = aes

    9 = g or k

tiny 9 whose loop crosses to the back of the down-stroke = ex

    c = c

    smaller c (which sometimes has a bar attached) = e

    The elaborate large curlicued “P” represents “B” or “P”, capital B or P which can also imply full words such as “Pliny”, “Botticelli”, “Prescription”. One can also “tack onto” that large initial P various other curlicues which gives you such words as Especially, or “Especies”.

    The difference in the alphabet letters “R” and “S” is very small: If the letter looks like a question mark without the dot, it is the Volsci/Cyrillic sibilant capital “C”. If the letter looks like a backward capital S, it is the letter “r”.

    The latin letters “U” and “V” are represented by “eo” or “oe” in context to what word is being expressed:

    Ennyway, what I have discovered lately, is that Athanasius Kircher may have been somewhat “Latin-impaired” when he was identifying various elements of the Alban Hills and Lakes”.

    Still fun. I hope you will “dwell upon” my alpha-linguistic-cipher notes for a little while at least — and compare them with your EVA and D’Imperio’s nonsense. Oh yes, there’s a word for you: “dwell”. Several weeks ago, I tried to demonstrate how one could stretch that “ell” glyph (you called it a “bracket”) and insert it into another word in that sentence.

    The “loopy double l” phoneme has manifold uses, in that other syllables can be tacked onto either side without having to add any other vowels. Same thing goes for the “tl”.

    Example: xmpl (“ex” is represented by that tiny “9”, “m” is represented by what looks like a fish-hook with two points).

    n e oea ave fun decoding y’all !


  8. bdid1dr on March 10, 2013 at 10:36 pm said:

    In response to Diane’s query re line lengths and spacing:

    What I’ve found when it comes to the Water Lily folio and the crocus folio, so far, is that the discussion for each of the botanicals seems to first identify each specimen’s historical uses. Later discussion further down the “stem” appears to be brief references to historical or philosopher/mythologists works/legends.

    Folio 35r Crocus (saffron, and legend of Crocus & Smilax)

    Folio 11v : Possibly a mulberry fruit, which I am focusing on today because there are only 6 lines of discussion!



  9. Rich
Most people interested in this manuscript read Nick’s blog, I expect, and will share details here if they can and wish.

    Otherwise, I have friends close to the library who might spare some time if all else fails.

10. I’ve been drafting an email to send to Kevin (we know each other from grad school), and figured I’d share the current draft:


    After having it in my pile of things to read, I finally had a chance to read through “What we know about the Voynich Manuscript” in detail, and thought I’d offer some friendly feedback/suggestions (basically, a cross between what I would have written as a reviewer and some info you may find of interest). Hope you find this useful. Any new findings in the talk you’re giving this week?

    • Choice of transcription alphabet. While I’m partial to Currier myself, decisions regarding whether Currier ‘X’ is a single character or some combination of ‘S’ and ‘F’, or whether ‘G’ is a single character rather than ‘IE’, impact entropy calculations, word length distributions, HMM results, etc. While Currier justifies his choices in the paper you cite, this is a point worth mentioning.
    • Transcriptions. While I’m not a huge fan of the European Voynich Alphabet (EVA), the interlinear transcription is worth looking at for several reasons: a) it integrates multiple transcriptions, b) translation from EVA to Currier is straightforward using ‘sed’ or a similar tool, and c) the comment blocks contain accumulated data on suggested plant IDs, etc. from various sources. Another transcription worth looking at is Glen Claston’s. When it comes to recording variants while transcribing an unknown script there is a tension between “splitters” and “groupers”, and Glen is a “splitter” (partly because he was pursuing Leonell Strong’s polyalphabetic theory, which involves changing alphabets as well as a cyclical shift sequence, and Glen believed that some of the variants signaled alphabet shifts). Also, IIRC, Glen’s was the first transcription done from Yale’s hi-res color scans.
    • With regard to “word” length distribution, a number of people (myself included) have suggested that some spaces are inserted according to some rule(s), at least partially explaining the correlation between word endings and beginnings pointed out by Currier. Slightly more than a fifth of the spaces in the Biological B pages in D’Imperio’s transcription, for instance, fall between a ‘9’ and a following ‘4’, and there are a very small number of ‘94’ digraphs without spaces (possibly scribal errors). Whether inserting spaces between some specified set of letter pairs in a plaintext in this way produces something more like a binomial distribution, as seen in the Mss, is a testable hypothesis, but I haven’t tried it yet.
    • With regard to HMM models of the Voynich Biological B text, you should take a look at Mary D’Imperio’s paper “An Application of PTAH to the Voynich Manuscript”.
    • You cite Gabriel Landini’s paper in your references, but I didn’t see a discussion in Section 5.3 or elsewhere of his findings re: long-range correlations in the Mss. Also of interest in that regard is Mark Perakh’s paper applying the Letter Serial Correlation test to the Mss.
    • An analytic tool I tried but didn’t write up (although I don’t know that it produced any deep insights) was applying algorithms for learning certain classes of regular languages from only positive instances to the vocabulary of Biological B “words”. In particular, I tried one of Angluin’s algorithms for learning a k-reversible regular language from positive examples, and Garcia & Vidal’s algorithm for learning the k-Testable Language in the Strict Sense family of regular languages. I’m not sure if better algorithms for learning REs from only positive examples have been developed since.
    • You say in Section 5.1, “Notably, the text has very few repeated word bigrams or trigrams, which is surprising given that the unigram word entropy is comparable to other languages.” There is some possibility that multiple letters in Currier’s alphabet are actually the same character. For instance, if there is a “word” in the Biological B vocabulary containing an ‘S’, with high probability the same word occurs with ‘Z’ substituted for the ‘S’, suggesting they might actually be the same letter. It has been suggested that some of the “gallows” characters may be interchangeable (f57v, for instance, has both ‘B’ and ‘V’ in the same position in the repetitions of the key-like sequence). Failing to combine such interchangeable letters (if they are in fact interchangeable) will obviously reduce the number of longer repeats.
    • A cipher hypothesis you don’t mention is what folks on the Voynich mailing list generally refer to as a “verbose (monoalphabetic) cipher”, in which some plaintext letters correspond to combinations of multiple Voynich characters. For instance, if u = ‘S’, r = ‘C89’, o = ‘4OF’, b = ‘CC89’, and s = ‘CC9’, then “ZC89/4OFCC89/4OFC89/4OFCC9” on line 13 of f77r would correspond to the word “uroboros” (I’m not suggesting that as a serious crib, but as an illustration of the idea). This type of cipher would explain the low h1 and h2 values. If this is the type of cipher involved, the trick is figuring out what the breakdown into combinations is — the “words” 4OEAM, OEAM, and EAM all occur: are they (4O-E-AM, OE-AM, and E-AM) or (4OE-AM, O-E-AM, and E-AM), etc.? An attempt has been made to apply genetic search to this problem.
    • Nick Pelling offers some observations on your paper.
    • His blog also mentions a new paper called “Probing the statistical properties of unknown texts: application to the Voynich Manuscript” (which I haven’t had a chance to read yet).
    • Have you accumulated a list of on-line machine-readable corpora of 15th cent. texts in various likely languages? I’ve found a corpus for Middle English, but haven’t found a similar resource for, e.g., early 15th century Latin herbal or alchemical mss.
    Hope this is useful feedback/info — don’t decipher the Voynich before I do :->
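Karl’s “uroboros” illustration can be run mechanically; the sketch below hard-codes his toy crib (which, as he says, is an illustration rather than a serious crib), folding ‘Z’ into ‘S’ per his interchangeability observation:

```python
# Karl's illustrative crib (an illustration, not a serious decipherment):
# u = 'S' (or 'Z'), r = 'C89', o = '4OF', b = 'CC89', s = 'CC9'
CRIB = {"S": "u", "Z": "u", "C89": "r", "4OF": "o", "CC89": "b", "CC9": "s"}
KEYS = sorted(CRIB, key=len, reverse=True)  # try longest groups first

def decode(group: str) -> str:
    out, i = "", 0
    while i < len(group):
        key = next(k for k in KEYS if group.startswith(k, i))  # greedy match
        out += CRIB[key]
        i += len(key)
    return out

word = "ZC89/4OFCC89/4OFC89/4OFCC9".replace("/", "")  # line 13 of f77r
print(decode(word))  # -> uroboros
```

Greedy longest-match happens to work on this example, but the 4OEAM / OEAM / EAM ambiguity Karl mentions is exactly why finding the breakdown into combinations is the hard part.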


  11. It is somewhat simpler:
    Some weeks ago I published statistical reports based on EVA. For this I used my new N-gram software “ngraman”, where N can be an arbitrarily high number.
    If this is applied to EVA (N=25), with spaces and other special characters ignored, you get all the lexical items (words, phrases, particles).
    BTW, the differences between the Currier A and B languages were determined similarly (N = 14).

    And anyone who distrusts my tool should have a look at its application to modern literature (Joyce, Ulysses).
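The core of such a count (nothing like as featureful as Joachim’s “ngraman”, just the underlying idea) fits in a few lines of Python, with spaces and special characters ignored as he describes:

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    # Ignore spaces and any non-letter transcription characters.
    letters = "".join(ch for ch in text if ch.isalpha())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

bigrams = char_ngrams("qokedy qokedy dal qokedy qokedy", 2)
print(bigrams.most_common(4))
```

Note that whether spaces are ignored, kept, or treated as uncertain changes which n-grams cross word boundaries, so the same caveats about parsing apply here too.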

  12. Joachim: EVA was explicitly designed as a stroke-based (or, more accurately, a component-based) transcription, which means that (for example) “ch” and “sh” are each transcribed as two separate partial characters, despite their plainly being a single character when actually written.

    EVA’s authors did this not to encourage people to run statistical tests on the EVA corpus, but to encourage researchers to try out different hypotheses about what the correct alphabet should be (i.e. once you’d combined groups of strokes such as ‘ch’ into a single token).

  13. bdid1dr on March 11, 2013 at 4:04 pm said:


    When many lines of commentary end with the “8” and “9” characters (especially in the botanicals) you are looking at the sentence ending (or nomenclature ending) aes ceus. Some instances of the same phrase are represented by only the tiny “9” = “ex” or “excuse” or “exeus”.

    Another botanical I’m getting ready to solve/interpret is what even Boenicke tentatively identifies as an “artichoke”. Nope, never seen an artichoke with dried stem top. B-u-u-t — maybe a tree fruit – mulberry? Morus Alba? “Silkworm food” (the leaves, anyway)? We’ll see. I’ll keep you posted.

    One item of interest, which I didn’t find in the botanicals or pharmaceutical sections, is the fruit of the mandragore. See Boenicke 408, folio 83v. The script which appears beneath the globes tells us that the watered down fruit juice can ease pain.

So, Nick, my reason for posting all of this is so that maybe you or a “Voynichero” may be able to observe the “Knight-Reddi” presentation at Stanford with a “newer point of view”. Too bad I can’t participate (I wouldn’t be able to get close enough to the podium to read the speaker’s lips).

  14. Nick, does that mean that you despise any statistical analysis based on EVA, not only mine?

    If one glyph is expressed by several roman letters, it means for my N-gram tool only that the N is to be incremented. (The internal comparison processes always remain the same). And the determined frequencies of di- and tri-grams may indicate then one single letter to choose from.

  15. Joachim: “despise” is the wrong word, “despair” is far closer.

    It’s just that when I saw your list of n-grams headed up by “ch” (with “sh” not far behind), it did make me feel as though I hadn’t really managed to get my central point across in a 2000-word post. Which is this: that the whole point of statistically analysing an EVA stroke transcription is to work out what the best non-stroke transcription is.

    The practical problem is that almost all Voynich researchers seem to have lost sight of this. 🙁

  16. Karl: thanks for cc’ing that here, hopefully someone will attend the lecture and let us know KK’s & SR’s responses…

  17. bdid1dr on March 11, 2013 at 8:06 pm said:

    Nick & friends,

    I’m hoping that “someone” will “consideringly” read my responses on this particular discussion page, and take a “laundry-list” of questions to be directed at Knight and Reddi (if K & R will even open a Q and A dialogue).

    I THINK I understand that you would like to keep the “Voynich Manuscript” a mystery for at least a few more years. So, my next adventure will be a visit to the historical museum in San Jose California. As part of my duties as Senior Records Clerk in the City Clerk’s Office, I indexed every item of the public records. I was also responsible for overseeing the yearly microfilming of those records (for safe storage). It is one reason why UC Berkeley has microfilm copies of the earliest missionary correspondence between “headquarters” in Spain and Portugal (and maybe Rome/Frascati).

    What is most aggravating to me, is that I never set eyes on either the manuscripts themselves, nor got to read the microfilm contents! Long story! But I am now considering contacting San Jose’s Historical Museum in order to schedule a viewing of the documents which are now in “climate controlled” preservation cases for scholarly review.

I’ll do my best to see if there is any resemblance to Boenicke ms 408. (In the 1970’s the City of San Jose contracted with a professional translator, who was unable to do a meaningful translation. His reason was that the San Jose documents were written in a “clerical hand” used by the clerks of the Royal Courts.)

    It may be months before I can write. Maybe this will give you “breathing room”?

    Cheers ! 🙂

  18. Nick: I can partly understand you. You do not like ‘ch’ and ‘sh’ as eye-catchers. Therefore I have for you an N-gram analysis, starting with the longest possible terms (including word breaks):

For codebreakers a problem remains: where do we see text scrambling resulting from encryption, when the lexical items are ordered in such a natural, language-like way?

  19. Joachim: EVA is a stroke-based interim transcription, a precursor to a glyph-based transcription. So when I see frequent strings starting “h…” (i.e. partway through ch / sh / cfh / cth / ckh, cph), I feel extremely uncomfortable making any sort of inference from them.

    In my opinion, the two questions that need considering most in order to solve the Voynich Manuscript are:-

(1) How did the original author group EVA strokes together to form complete letters? (And what evidence do we have to support whatever conclusion we draw about this in preference to other possible final transcriptions?)

    (2) What is the correct order / sequence for parsing letters in a word / line / paragraph / page? (And what evidence do we have to support our conclusions on this?)

  20. bdid1dr:
    A question to K&R:
    What is their opinion (if any) about:
    “Discussion and conjectures” by Prof. J. Stolfi
    (which agree completely with my own investigations):

    “..severe constraints on cryptological explanations”

    “Semitic languages such as Arabic, Hebrew, or Ethiopian could perhaps be transliterated into Voynichese, but not by any straightforward mapping.”

    Not straightforward, I agree, as I found different meanings for EVA-k, single letters for “ii”, “ch”, indefinite word separation etc.
    I can certainly prove that the Voynich is not encrypted, by N-gram observed distribution of the lexical items.

  21. Joachim: if you think the text isn’t written straightforwardly, then surely you are saying that it *has* been encrypted (i.e. in the sense of “hidden” or “concealed”)?

    Your n-gram test results indicate (correctly) that a lot of structure is present, which is precisely the kind of thing more modern cipher systems (such as polyalphabetic substitution) disrupt and destroy.

    However, there are still a thousand other ways to hide text that predate all that kind of “data-flattening” mathematical trickery, so I can see no obvious reason to eliminate “old-fashioned” encipherment just yet. 🙂

  22. Nick: Yes, that’s right, and in my documentation (about Arabic) I myself talk of an encryption stage. And I’m still somewhat away from deciphering the VMs completely. The content seems kind of more cryptic than the language itself 🙂

    BTW. Did you see my pdf on language A and B (significant) differences?

  23. Joachim: no, I’m sorry to say that I haven’t seen your PDF on A/B language differences. Can you pass on a link to it, please?

  24. Nick: The link to a pdf where A – B differences
    are shown:

    (only the 14-grams, for A complete).

  25. Here’s a downloadable mpeg4 version of the Stanford lecture video if you don’t want to mess around with the streaming:

  26. (addendum: video file size is 233MB)

  27. Nick: after reading your article more carefully, I see an approach to clarify some things by statistical analysis.
    I speak of n-grams, i.e. language-independent analysis: identification of lexical items by their observed frequencies. So I can now submit an N-gram EVA evaluation that assesses all particles within an expression, even at all lower orders of N. Important to know: when reading EVA, all spaces and special characters are ignored.
    The result here:

  28. bdid1dr on March 16, 2013 at 11:01 pm said:

    Nick and Friends,

    While you’ve all been focusing on Knight and Reddy, I’ve been cruising “Carmina Brigiensia” and “Carmina Burana”. I have found a perfect match for that “berry” which appears on Boenicke 408’s folio 11v: Wikipedia has a very good discussion of:

    Carmina Burana : “The Forest” – is an elaborate illustration, of which one feature stands out from all the rest. See for yourself, and you may be able to understand my translation of why that “mulberry” signifies so prominently, yet just as obscurely, in our VM-ystery folio 11V. It is all about the leaves of the white mulberry tree, which were “pabulumox”/fodder/food for the silkworm larvae until they began to spin their cocoons. In this case I guess I can pun that the proof is in the “pabulumox” (pudding?) 🙂

  29. bdid1dr on March 17, 2013 at 8:07 pm said:

    Happy St. Patrick’s Day! Though I am part Scots-Irish, I can still do the Irish Step Dances and reels (Google has it for their logo today).

    Ennyway, my take on Knight-Reddy’s presentation, as far as I was able to view it online: pretty bad! (I read lips and body posture to get at least some clues to what is being discussed.)

    So y’all will just have to limp along without me — ahem!


    bdid1dr 🙂

  30. Diane on April 6, 2013 at 5:42 pm said:

    If one supposed that the text’s line length = paragraph length, then occurrences of that

    ” pair of matching gallows (typically single-leg gallows) about 2/3rds of the way across their topmost line”

    might serve the same purpose as the cartouche does in hieratic.

    Whether it enclosed the name of a deity, person, animal, thing, or something having aspects of several (as stars do) might be an additional complication, of course.


  31. Diane: the cartouche shape means “name”, whereas I suspect Neal keys are more like a general-form enciphered medieval bracket pair. But apart from that, we’re singing from the same (antiphonal) songsheet. 🙂

  32. Diane on April 7, 2013 at 3:51 am said:

    Dear Nick,

    Thank you, I’m yet to be blessed with grandchildren.


  33. Diane on April 7, 2013 at 4:46 am said:


    I’ve just done what I should have done *before* offering the ‘cartouche’ comment.

    i.e. googled “Voynich” AND “cartouche” …

    so enervating.

  34. Diane on April 14, 2013 at 6:13 am said:

    I should put this into the forum, but replies there are rare.

    Can someone explain why curious groups such as dain, dain, qokedy dain (etc.) are treated as a function of encoding, rather than an indication of original language?

    I mean, why does discussion not focus on languages which regularly show a pattern of a, a, b, a, a?

    Similarly with char-groups that appear only in certain positions?

    On the other side of the coin, why is ‘4o’ taken as a function of the language rather than a result of encipherment (is encipherment a word?), but the top-line gallows reversely (I’m sure that’s not a word!)?

    Illustration: suppose ‘4o’ translates ‘and’. Sometimes it would appear within a word (‘candy’), but its appearance as initial ‘And…’ would reflect a usage natural to some languages and not others. (In English it is found in that position regularly only in translation, mostly of Biblical texts or renderings from Aramaic.)

    What if dain, dain, qokedy daiin was equivalent ~ in one or another language ~ to e.g. ‘Verily, verily, I-say-to-thee, with VerilyPlus… [that shall such and such occur]’?

    So why does no-one seem to spend time on matching those patterns with various grammatical forms?

    This reminds me of an interesting paper written in 1995 by Clive Holes, ‘The Structure and Function of Parallelism and Repetition in Spoken Arabic: a sociolinguistic study’, Journal of Semitic Studies, Spring issue, pp. 57-82 in my p/copy, though the bibliography looks a bit short.
    On p. 79 he said:

    I have tried to illustrate the variety of functions which patterned repetition – lexical, morphological, syntactic or combinations of them – can fulfill in the speech of non-literate Arabs… almost complete lack of the same features in speech of the younger generations in the same speech-communities… some (older) speakers made such use of the repetitive devices… and with such effect, that parts of the recording sound like a species of artistic performance… discursive, paratactic, concrete, religious, committed. The ‘literate’ style… succinct, ratiocinative, abstract, distanced.

    But there are cases of interaction – I’ll cite my usual example of Majid’s treatise on navigational astronomy and method.


    An interesting paper about internal assonance and rhythm in spoken colloquial Arabic:

  35. voynichimagery on August 8, 2013 at 2:33 am said:

    a substitution cipher from Spain which some thought might prove to be Voynichese.

  36. Not my area – so no comment, except that this paper by William Pourquet may interest people working on Voynichese.

    (On that site, the paper is available as a PDF.)

  37. Mark Knowles on July 17, 2017 at 2:26 pm said:

    I wonder about the possibility that the author couldn’t read his own manuscript. Personally I doubt this is the case, but nevertheless I thought I would raise it.

    It could be that, in his fervour to produce a watertight cipher, he overdid it and produced something so complex and intricate that he couldn’t read it back himself. Perhaps, in the process of writing it, he piled new cipher rule upon new cipher rule, drowning in rules, confusing himself, and failing to follow his own rules correctly. If he made it so complicated (and possibly so confused and muddled) that he couldn’t read back his own manuscript, that would help explain why we find reading it so difficult.

    This is just a thought, and whilst it is probably not the case, it is certainly plausible, I think. Naturally I hope it is not the case.

  38. Karl K. on August 3, 2017 at 3:01 am said:

    While this is an oldish post to be commenting on, there are some methodological issues it raises that have been on my mind recently and it seemed a more apropos post than some of Nick’s more recent methodology-related items.

    * Re: Nick’s “staging point” model for solving the Voynich Manuscript, EVA 2.0, etc. — working from both ends is a useful way to build a bridge (or a tunnel, although not a tower :->), and there are some “other end” issues that it’s critical to think about if you’re interested in solving the Voynich. For instance…

    * Nick asked in this post, “whether they have compared it not just with abjad and vowel-less texts, but also with Quattrocento scribally abbreviated texts?” That’s a really good question — and to answer it, you need a machine-readable corpus of 15th Cent. scribally abbreviated texts. In general, any research program with a serious shot at solving the Voynich will need machine readable corpora of various 15th century languages. Accumulating a list of resources for that should be a priority. U. Michigan has a good site with Middle English texts (URL in an earlier comment of mine above); several years back Rich SantaColoma mentioned, although that doesn’t have much for 15th Century.

    * Which leads into what I think is a key point re: methodology — if you have a cryptanalytic hypothesis regarding the Voynich Mss., *actually trying to decipher the Mss. should be the very last step in your research program*. There are two reasons for this: 1) Whatever tools you’ve developed to assist in the decryption, you need to know the distribution of results you get when you apply the tools to texts where you know the answer, otherwise you don’t know whether the result you get using the tools on the Voynich actually “rings the bell” as it were; and 2) it helps prune dead ends up front — if you think the Voynich is some kind of polyalphabetic cipher at the glyph level, then you should probably think about how to get such a cipher to generate a ciphertext with 2nd order entropy lower than natural language plaintexts before you invest an enormous amount of time in trying to crack the Mss.

    * Let me toss out what I suspect Nick will consider an extremely questionable proposition for discussion: It’s rational to assume (at least initially) that “Neal keys”, “Grove words”, etc. are not critical to understanding the nature of the Voynich cipher (if cipher it is) — not because that is necessarily true, but because if it isn’t then the odds of figuring out how to decipher the Mss are slim to zero. Yes, there is an element of the old joke about the drunk looking for the keys he lost in the alley under a streetlight “because the light is so much better”. On the other hand there is probably value in testing and eliminating simpler hypotheses first (and doing so may shed light on what’s going on in the cipher if it is more complicated).

    * Nick writes, “I’ve also long believed that many pairs of Voynichese letters (al / ol / ar / or / ee / eee / ch, plus also o+gallows and y-gallows pairs) encipher a single plaintext letter.” — I share Nick’s views on this; examining the digram stats for Biological B suggests that the tall pole in pursuing this line of thought is figuring out how to move beyond the obvious A[ERMN] and O[ER] combos (sorry, I speak Currier) to figuring out how to break apart [FP][SZ]C*89 words.

    * In Nick’s “Voynichese Task #1: moving towards ‘EVA 2.0’…” post, he raises the problems associated with the legacy EVA interlinear transcription. He’s absolutely right that many of the transcriptions there were converted to EVA from older transcription alphabets and were based on lower quality data (the copyflo, etc) than later transcriptions, *but*…reproducibility is a key issue with transcription, and interlinears are a useful tool for estimating sections of text that may need a closer look. (An advantage more recent transcribers have over the FSG, etc., is the ability to compare a transcription rendered in a Voynichese font with the text in the images — Glen Claston did that when developing his transcription.)

    Enough semi-random musings for now…
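Karl’s second point above, that a candidate cipher model should first be shown capable of producing ciphertext with second-order entropy as low as Voynichese’s, can be made concrete. A minimal sketch of the measurement, assuming plain character streams (the sample strings are toy placeholders, not real corpora):

```python
import math
from collections import Counter

def h2(text: str) -> float:
    """Second-order entropy H(bigram) - H(unigram): the average
    uncertainty (in bits) of a character given the one before it."""
    def h(counts: Counter) -> float:
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())
    uni = Counter(text)
    bi = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return h(bi) - h(uni)

# repetitive 'Voynichese-like' toy string vs. ordinary English
print(h2("daiin daiin qokedy qokeedy okaiin"))
print(h2("the quick brown fox jumps over the lazy dog"))
```

Running a candidate cipher’s output and known plaintexts through the same measurement gives exactly the “distribution of results on texts where you know the answer” that the comment calls for.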


  39. Karl: indeed, I also don’t believe that a decryption assault needs to answer all the questions before pressing the big red Start button. But almost all of the most basic cryptanalytical tools – word length, frequency counts, contact tables, HMMs, etc. – rely directly and irrevocably upon a preceding assumption about how the text should be parsed. And I therefore strongly believe that most analyses carried out to date are close to worthless.

    Transcriptions are both a blessing and a curse, because too many people use EVA ‘raw’, i.e. without pausing to think about parsing. In many ways, we still haven’t properly begun to analyse Voynichese. 🙁

  40. The importance of the hidden Markov results is grossly underestimated.

    It is yet another tool that may help in finding the right way of parsing the text.
    Note that the results that have been described by D’Imperio and by Reddy & Knight were not based on Eva but on the far less analytical Currier alphabet.

    Doing some iterative study with this tool can lead to two possible outcomes.

    – Either it is possible to come up with a parsing method that generates a text that shows vowels and consonants.

    – Or it isn’t.

    The way forward for both cases is quite different.
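The HMM experiment described above (two hidden states over a character stream, in the spirit of the vowel/consonant results reported by D’Imperio and by Reddy & Knight) can be sketched with a minimal Baum-Welch trainer. This is only an illustrative re-implementation under simple assumptions, not the code those authors used:

```python
import numpy as np

def baum_welch(obs, n_states=2, n_iter=40, seed=0):
    """Fit a discrete-emission HMM (transitions A, emissions B) to an
    integer observation sequence via scaled forward-backward."""
    rng = np.random.default_rng(seed)
    n_sym, T = obs.max() + 1, len(obs)
    A = rng.random((n_states, n_states)); A /= A.sum(1, keepdims=True)
    B = rng.random((n_states, n_sym));    B /= B.sum(1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)
    for _ in range(n_iter):
        # forward pass, with per-step scaling factors c[t]
        alpha = np.zeros((T, n_states)); c = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        # backward pass, reusing the same scaling
        beta = np.ones((T, n_states))
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta                  # per-step state posteriors
        xi = np.zeros((n_states, n_states))   # expected transition counts
        for t in range(T - 1):
            xi += (alpha[t][:, None] * A) * (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
        A = xi / gamma[:-1].sum(0)[:, None]
        for k in range(n_sym):
            B[:, k] = gamma[obs == k].sum(0)
        B /= gamma.sum(0)[:, None]
        pi = gamma[0] / gamma[0].sum()
    return A, B

# map a plaintext sample to symbol indices, then inspect which letters
# each hidden state prefers to emit (the sample text is just an example)
text = "itispossibletoseparatevowelsfromconsonantswithatwostatemodel"
syms = sorted(set(text))
obs = np.array([syms.index(ch) for ch in text])
A, B = baum_welch(obs)
for s in range(2):
    top = [syms[k] for k in np.argsort(B[s])[::-1][:5]]
    print(f"state {s} prefers: {top}")
```

On ordinary European-language text, one state typically ends up emitting mostly vowels and the other mostly consonants; the iterative study described above amounts to re-running this under different parsings of the transcription and seeing whether such a split emerges.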

  41. Rene: I am certain that the algorithms that try to construct HMMs are easily confused by verbose cipher. It may look obvious to us that (for example) there is a high chance that EVA ol or EVA al should each function as a (parsed) token, but this is not something that would emerge naturally inside an HMM.

    The technical difficulty is that there is no distinction between verbose ciphertexts’ (unparsed) covertext properties and the same ciphertexts’ (parsed) syntactic and semantic properties. It’s a mess. 🙁
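The parsing point can be made concrete: if pairs like EVA ol / al / ar / or really are verbose tokens for single plaintext letters, then a pair-aware tokenizer produces a very different symbol stream (and hence different statistics) from a character-by-character read. A toy sketch, where the pair list and sample words are purely illustrative:

```python
def parse_eva(word: str, pairs: set[str]) -> list[str]:
    """Greedy left-to-right tokenizer: consume a known verbose pair
    if one starts at the current position, else a single glyph."""
    tokens, i = [], 0
    while i < len(word):
        if word[i:i + 2] in pairs:
            tokens.append(word[i:i + 2]); i += 2
        else:
            tokens.append(word[i]); i += 1
    return tokens

# hypothetical verbose-pair inventory (assumption, not established fact)
PAIRS = {"ol", "al", "ar", "or", "ee", "ch"}
print(parse_eva("qokalor", PAIRS))   # 5 tokens
print(list("qokalor"))               # character-by-character: 7 tokens
```

Any downstream tool (frequency counts, contact tables, HMMs) fed these two token streams would see different unigram inventories and different neighbour statistics, which is exactly why the parsing assumption precedes the cryptanalysis.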
