You might think I’d be pleased by the appearance of another Voynich statistics study (Voynich Manuscript: word vectors and t-SNE visualization of some patterns), courtesy of those well-known peer-reviewed online journals Reddit and Hacker News. [*] After all, statistical experiments are – if carefully planned and executed – beyond all reproach, surely?

But there is a big problem (arguably a meta-problem) with this: and it’s one that’s been around for a very long time.

Even back in 1962, Elizebeth Friedman – having been a top US Government code-breaker for several decades – was able to note that all attempts to decrypt the Voynich Manuscript as if it were a simple language or single-substitution alphabet were “doomed to utter frustration”. That is, if you wind the clock back half a century from the present day, it was already clear then that Voynichese’s curious lack of flatness was strongly incompatible with:
* natural languages
* exotic languages
* lost languages
* monoalphabetic (simple) substitution ciphers, and even
* straightforward hoaxes

Unfortunately, the primary assumption of flatness is precisely the starting point of a large number of statistical studies carried out on the Voynichese text ever since.

Why Is Voynichese Not Flat?

A long succession of (actually pretty good) past statistical studies has revealed that Voynichese has an abundance of mechanisms that give it internal structure, not only in terms of letter adjacency and within words generally, but also within lines, paragraphs, and pages. Yet while all natural languages do work to plenty of orthographic rules, none of them (from this far back in time, at least) has orthographic conventions that extend so far into the high-level page layout.

In Voynichese, you can see these “supra-orthographic structures” in such places as:
* Horizontal Neal sequences (stereotypically manifesting themselves as pairs of single-leg gallows placed about two-thirds of the way along the topmost line of a paragraph or page)
* Vertical Neal sequences (the first letter of each of a series of adjacent lines, forming a putative column of letters, and very probably distorting the aggregate statistics for the first character of each line)
* Vertical free-standing key-like sequences
* Substantial difference in word structure within “labels” (short pieces of free-floating text, typically inside or beside drawn features)
* Grove “titles” (small fragments of right-justified text tagged onto the end of paragraphs, e.g. on f1r)
* Small text_size:dictionary_size ratio
* Multiple repetitions of high frequency words (daiin daiin, qotedy qotedy, etc), etc

[Just about the only supra-word-level orthographic structure we can directly match is the change in frequency stats for the last letter of a line. In natural languages, we often see a hyphen placed there, while in Voynichese we often see EVA ‘m’ or ‘am’: so I would be unsurprised if these are essentially the same thing.]
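To make the line-final effect concrete, here is a minimal Python sketch that compares the final letter of the last word on each line against the final letters of all the other words. The input format (one transcribed line per string, words separated by whitespace) and the function names are assumptions made for illustration, and the sample lines are invented toy data rather than real transcription.

```python
from collections import Counter

def final_letter_profile(lines):
    """Count word-final letters, split into line-final words and all others.

    `lines` is assumed to be a list of strings, one transcribed line each,
    with words separated by whitespace.
    """
    line_final = Counter()
    other = Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip empty lines
        for word in words[:-1]:
            other[word[-1]] += 1
        line_final[words[-1][-1]] += 1
    return line_final, other

def share(counter, letter):
    """Fraction of the counted words that end in `letter`."""
    total = sum(counter.values())
    return counter[letter] / total if total else 0.0

# Toy lines, invented for illustration only (not real transcription data).
sample = [
    "daiin chedy qokedy dam",
    "ol shedy qokaiin otam",
    "chol daiin shol dy",
]
line_final, other = final_letter_profile(sample)
print("line-final share of 'm':", share(line_final, "m"))
print("elsewhere share of 'm':", share(other, "m"))
```

A large gap between the two shares on a real transcription would be the kind of position-dependent effect described above; similar shares would suggest no line-end effect for that letter.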

Each of these features (which I’ve discussed in more detail elsewhere on this site) on its own would be annoying enough to account for, if (say) you were trying to reconcile Voynichese with a conventional language. However, put them all together and you suddenly get a glimpse of what we’re really dealing with here: something arbitrary, painfully complex, and extremely unlanguage-like.

If, as per almost all natural languages and ciphertexts, Voynichese did not have these features, we would happily describe it as “flat”, and it would be utterly fair and reasonable for people to throw their home-grown statistical toolkits at it in the reasonable expectation that something might just emerge from the process.

However, Voynichese is not flat: and so this kind of simple-minded approach is 99.9% certain to reveal nothing of any genuine novelty or insight. Sorry, but that’s just the way it is.

So, What’s The Answer, Nick?

If you want to do statistical analysis on the Voynich Manuscript that genuinely stands a chance of producing insightful and helpful results, you really need to put the Voynichese text through some kind of normalization filter before analysing it: by which I mean you need to condition the worst parts out.

The best starting point is to restrict your scope to one of the two large relatively homogeneous blocks of text:
* Quire 13 (but without labels, and without vertical sequences) – though note there is a long-unresolved suggestion that Q13 may have originally been composed in two parts / phases, not coincident with the final binding order
* Quire 20 (but without f116v) – though note there is also a long-unresolved suggestion that Q20 may have originally been composed in two parts / phases, and also not coincident with the final binding order.

Doing this should sidestep the thorny issues (a) of Currier A vs Currier B, (b) of text vs labels, and (c) of space transposition ciphers (because I don’t recall Q13 and Q20 having any “oro ror”-like sequences). [Personally, Q20 would be my preferred starting point.]

I would also strongly advise filtering out any matched pairs of single-leg gallows that fall on any single line, along with the (usually shortish) text sequence that sits between them: and any ornate gallows too.
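One possible reading of this filtering step, sketched in Python. It works at the word level: if a line contains two or more words with a single-leg gallows, everything from the first such word to the last is dropped. The sketch assumes that the single-leg gallows are transcribed as EVA ‘p’ and ‘f’, and it does not check that the two gallows actually "match" or try to detect ornate gallows; both would need a richer transcription than plain text. The function name and the toy lines are hypothetical.

```python
def drop_gallows_span(line, gallows="pf"):
    """Drop the span between paired single-leg gallows on one line.

    Assumption: single-leg gallows appear as EVA 'p' and 'f'. Words are
    whitespace-separated. If fewer than two words contain a gallows,
    the line is returned unchanged.
    """
    words = line.split()
    hits = [i for i, w in enumerate(words) if any(g in w for g in gallows)]
    if len(hits) < 2:
        return line
    first, last = hits[0], hits[-1]
    # Keep only the words before the first and after the last gallows word.
    return " ".join(words[:first] + words[last + 1:])

# Toy lines, invented for illustration only.
print(drop_gallows_span("daiin opchedy qokedy ofchedy ol"))  # pair present
print(drop_gallows_span("daiin chedy ol"))                   # no gallows
```

Whether the gallows-bearing words themselves should be kept or dropped is a judgment call; this sketch drops them along with the text between them.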

All of which leaves the tricky issue of how best to normalize page-initial, paragraph-initial, and line-initial letters. The jury is still well and truly out on these: which probably means that evaluating them would be a good use of statistical analysis. Which also probably means that nobody is going to actually do it. 🙁

Finally: once you have got that far, all you’re left with is… the truly humungous issue of how best to parse Voynichese. Is EVA ‘ckh’ one letter, two letters, or three? Should EVA ‘qa-‘ and ‘qe-‘ always be interpreted as if they are copying errors for EVA ‘qo-‘? Should each of EVA ‘or’ / ‘ol’ / ‘ar’ / ‘al’ be read as a pair of letters or a single (tricky) verbose cipher glyph? Does ‘ok’ encipher a different token to ‘k’? Is ‘yk’ two letters or one composite one? And so forth… the list goes on (and it’s a very long list).
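Because the parsing question has no agreed answer, one practical approach is to make the glyph inventory a parameter and run the same statistics under several competing schemes. The Python sketch below does this with a greedy longest-match tokenizer. Both schemes shown are hypothetical examples, not recommendations, and the words are toy inputs.

```python
def tokenize(word, multigraphs):
    """Split an EVA word into tokens, trying the longest multigraph first.

    `multigraphs` lists the character sequences that a given parsing
    scheme treats as a single letter; anything else is one character.
    """
    units = sorted(multigraphs, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                tokens.append(unit)
                i += len(unit)
                break
        else:
            # No multigraph matched here: emit a single character.
            tokens.append(word[i])
            i += 1
    return tokens

# Two hypothetical parsing schemes, for comparison only.
SCHEME_SPLIT = []  # every EVA character is its own letter
SCHEME_VERBOSE = ["ckh", "ch", "sh", "qo", "or", "ol", "ar", "al"]

for w in ["qokol", "ckhar", "chol"]:
    print(w, tokenize(w, SCHEME_SPLIT), tokenize(w, SCHEME_VERBOSE))
```

If a statistical result holds under one scheme but vanishes under another, that is a sign it depends on the parsing assumption rather than on the text itself.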

But unless you can find a way to see clearly past Voynichese’s supra-orthography, you’ll probably never get even remotely close to anything that interesting with your own Voynich statistics. Just so you know! 😐

[*] Tongue planted firmly and immovably in cheek.

69 thoughts on “Voynich statistics, and why Voynichese is not flat…”

  1. Bobby D. on January 21, 2016 at 4:57 pm said:

    I’m not completely convinced of all the super-orthographic features (particularly the supposed vertical Neal sequences), but even if you leave that out I think the difficulty of parsing the text remains a substantial barrier, and I would agree that just looking at the options for statistical analysis – specifically, deciding what exactly you’re looking for, and your starting assumptions going into the analysis – are critical. I mean, if you look at the top-down options:

    Natural Language – The possibility is that it’s a natural language, either known or unknown, encrypted or unencrypted, gives four possibilities:

    Known language, unencrypted – If Voynichese is a known language – even a funky or idiosyncratic one like Latin shorthand or abbreviations – with the large sample of text available in the Voynich Manuscript, it should be susceptible to some basic frequency analysis and similar attacks. The fact that it isn’t suggests that if this is a known, unencrypted natural language, the script used to write it down is very weird indeed. While impossible to completely rule out, this is maybe the least likely option.

    Known language, encrypted – Voynichese, in its characters at least, shares some similarities to artificial alphabets used in various ciphers and codes in medieval Europe. So the possibility that it is an encrypted text from a known language like Latin, Italian, or Occitan starts to look pretty good…but if it is enciphered and/or encoded, it has some problems too. There are too many characters for it to be a simple substitution cipher, which suggests it’s at least a verbose cipher, and the weird distribution of some of the characters may even suggest a code book, or multiple ciphers or methods being employed.

    Unknown language, unencrypted – If the text is in an unknown language (whatever its origin) but is unencrypted, we’d expect it to possess some identifiable traits as an unencrypted text, i.e. no obviously deliberate efforts made to conceal word or sentence structure, grammar, etc. For Voynichese, that’s a bit of an open question – it definitely looks like there are words separated by spaces, and some regular structures or patterns of distribution to the “words” and sentences and even pages – but if this is an unknown language, even one written in an unknown alphabet, it has some very funky distribution and rules. So, qualified maybe.

    Unknown language, encrypted – Impossible to rule out, but seems unlikely. If you can’t read the plaintext, you can’t exactly know when you’ve arrived at the plaintext either. Still, if Voynichese is an unknown natural language that’s been encrypted, you’d expect it to share some formatting features with other encrypted messages, like efforts to disguise words. Of course, the “words” in the Voynich Manuscript might be a decoy with spacing arbitrary or designed to throw crackers off, so this is really difficult to evaluate.

    Artificial Language – It’s already a given that Voynichese represents a unique alphabet, albeit one where many of the characters are familiar; it’s not too far of a stretch to say it could be an artificial, rather than a natural, language that uses an artificial alphabet – like Esperanto if it had its own unique character set. This could potentially explain some of the weird features of Voynichese, like the repetitious patterns of the language. However, like the unknown language, unenciphered bit above, this isn’t much help.

    Hoax – Last and not least, the possibility exists that the Voynich Manuscript contains no information content, and was written as an elaborate (and expensive) hoax, by persons unknown for reasons unknown. Again, hard to prove, but the regular structure of Voynichese suggests against it, or at least that if it is a hoax, it’s a sufficiently clever hoax that the person who did it came up with some kind of quasi-stochastic process that put out a text that looks strongly like something information-bearing.

    Without ruling anything out, I think natural language, encrypted has the strongest case, if only because of the similarities of Voynichese to characters in some medieval European ciphers, and the circumstantial evidence (radiocarbon dating of the vellum, the Zodiac illustrations) suggesting the same general period and era.

  2. A song book would not be flat and would be difficult to decipher.

  3. xplor: words and music, or just music?

  4. Bobby D.: if you accept that most of the supra-orthographic features are present in the Voynich Manuscript, how can you comfortably reconcile the kind of manipulated page layout their simultaneous presence implies with anything apart from an encrypted text?

    I’m reasonably sure that’s the reasoning Elizebeth Friedman had in mind back in 1962, and I don’t see any obvious way around it in 2016. 🙂

  5. Matthew Opitz on January 21, 2016 at 9:48 pm said:

    Interesting thoughts, Nick! But there was one thing that I was sure you would mention that you skipped over when talking about “supra-orthographic” structures: the regularities found by Torsten Timm in this paper “How the Voynich Manuscript Was Created.”

    Namely, Torsten Timm put forward three main observations that I find very convincing. First, almost all of the words of the Voynich are a very small “edit distance” away from three core words: dain, ol, and chedy. Secondly, words within a line and words in adjacent lines are more likely to have smaller edit distances. Finally, there are a small number of predictable rules for “allowed” (observed) variations stemming from the three core words.
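As a rough illustration of the first observation, the Python sketch below measures how far a word is from the nearest of the three core words. It uses plain Levenshtein distance, which is a standard stand-in and not necessarily the distance rule used in the paper. The core-word spellings are taken as written in this comment, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Core words as spelled in the comment above (an assumption of this sketch).
CORE_WORDS = ("dain", "ol", "chedy")

def nearest_core(word, cores=CORE_WORDS):
    """Return (distance, core_word) for the closest core word."""
    return min((edit_distance(word, c), c) for c in cores)

for w in ["shedy", "daiin", "denminka"]:
    print(w, nearest_core(w))
```

Running this over every word type in a transcription, and then over a natural-language text of similar size with its own three most frequent words, would give a simple baseline for how unusual the clustering is.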

    Then there’s Brian Cham’s “Curve/Line System.”

    Then there’s Jorge Stolfi’s “Crust/Mantle” paradigm.

    What do I make of all this?

    I agree with you that the Voynich Manuscript cannot possibly be a monoalphabetic single-substitution cipher of a natural language. No natural language behaves like this. The Voynich Manuscript is too highly ordered, and there are too many possible combinations of words that DON’T appear simply because the edit-distance rules prohibit it. For example, one could write “denminka” (EVA transcription) in the Voynich alphabet. The Voynich alphabet has all the letters for it. But you will never find anything like that in the Voynich manuscript because it happens to be too many “edit-distances” away from the three core words. Why only three core words?

    One possibility I have entertained is that the Voynich Manuscript could be encoding music. That’s the only thing I could think of that would fit with the three-core-word paradigm and the highly-orderedness of the text, such that words within lines and lines tend to repeat themselves with very slight variations in between. I know that others have surmised this as well. Perhaps the three core words denote something like certain octaves of C, and the edit-distances away from the core words denote pitch distance from that octave of C, rhythm for that note (eighth, quarter, or half, etc.), and/or other special things for each note? This is the only possibility that I can think of that ISN’T a hoax that would explain the repetitiveness of the text within lines and between lines.

    Otherwise, the only other explanation I can arrive at is that the Voynich Manuscript might be a hoax. You say that the “flatness” of the text goes against this, but Torsten Timm points out in his paper how it would be very easy to produce text that is seemingly random and nonsensical but also ordered, just by starting with three core words and doing many different variations on them following certain edit-distance rules.

    Right now, I lean towards hoax. Still split between “ancient hoax” (c. 1400) of nonsense text, vs. “modern hoax” (c. 1910) of nonsense text using old vellum. Rich SantaColoma’s work is something I eagerly anticipate this coming year.

    And I would find your take on things in this new year of 2016 fascinating if you would perhaps see it worth your while to engage with the Torsten Timm paper in particular (and perhaps Brian Cham’s work linked above) and, if you disagree with their arguments, explain a little more fully what you find lacking in their arguments. Then I and a few others might be persuaded to back off from the hoax idea.

  6. >> “…there is also a long-unresolved suggestion that Q20 may have originally been composed in two parts / phases, and also not coincident with the final binding order”

    Very interesting, is it expressed anywhere in more detail? And does this suggestion take into account the circumstance that Q20 is all in Currier B?

    Btw, I’m a bit confused about Currier B in Q20. classifies it all as Currier B, however Currier himself considered it ” ‘modified B’ (i.e., containing certain ‘A’ characteristics)”

    So is it B, or “rather B”, or “both A and B”?

  7. Anton: A vs B is a bit of a sweeping generalization, a hat that doesn’t fit every head. Rene Zandbergen has been threatening to write up his take on the differences (particularly the ‘shaded’ pages where there are transitions), and that is something I’m looking forward to. 🙂

  8. Matthew Opitz: Timm, Cham and Stolfi’s opinions / hypotheses / models relate to purely (i.e. non-supra) orthographic questions, which is why I didn’t discuss them in this post.

    To me, it would only make sense to discuss them in great depth if we had worked out how to normalize Voynichese text first, otherwise the samples used to define and inform those models will be polluted by all the confounding effects of all the other (unremoved) stuff. That is, given that they had not removed all the bad stuff before making their analyses, should they still hold so tightly to their theories?

  9. Good news, looking forward to that too. If Q20 is mixed A and B, that would be very interesting.

  10. On the study referred to in this blog entry, I am quite certain that something went wrong in the analysis, more specifically when feeding in the transcribed text, and the result and conclusions are not usable as they are. In one of the discussion blogs, the author identifies the pages where the words in the two clouds come from. One cloud has all pages in the MS. The other cloud has all pages starting at folio 67 verso and ff.

    I would hope that the author will look into this, and I am still interested to see a corrected version.

    In fact, most of what we know about the text comes from statistical analysis of all types. Where it is going wrong is when one specific analysis is used to draw conclusions. This has to be done by taking into account all of them (in the appropriate manner of course, meaning that not all of them will play an equal role – for various reasons).

    On Q20, I did not look at that for many, many years. What happens there is best described as some kind of evolution. There is no specific evidence that the text is not in the right order. Also, all text seems B-like, or is firmly Currier B if one defines that by the occurrence of Eva ‘ed’.

    What all analyses show is that the text was not generated by a simple (automated) process but by a human. But that’s not exactly a break-through 🙂 Things vary in many different ways. The conclusion of ‘meaningful’ from the Montemurro and Zanette paper is not yet justified for me. I’d describe it as human and/or evolving. The main problem not addressed there is that two areas with basically the same illustrations have just about the most different text properties.

  11. Apropos of nothing, the new discussion site is creating a peer-review system for Voynich related papers.
    Authors will be able to submit their drafts in advance of publication and receive impartial peer-reviews from fellow researchers on the content of their paper.

  12. Rene: …though you’d admit that f105r looks much more likely to be the first page of a quire than f103r?

    …and if that is the case, there’s a reasonably good chance that f105r was the first page of Q20A and that f114v f116v was the last page of Q20B? 🙂
    Lots on Q20 here from 2010: – including my suggestion (which I’d forgotten) that the tail-less paragraph stars on f103r might well be fake paragraph stars. 🙂

  13. Rene: all of which, looking at it again afresh several years later, now suggests something quite different to me, i.e. that
    * the stars on f103r/f103v look to be fake paragraph stars, added to make the first page visually resemble the rest of the quire
    * f103r/f103v therefore probably contains free-standing text unconnected with the remainder of Q20 (pay no attention to the stars!)
    * f105r marks the start of the real Q20 contents, while f114r f116r was the end of the real Q20 contents
    * therefore f105/f112 f105/f114 should probably have been the second nested bifolio in the quire/gathering (not the third)

    In which case, my earlier suggestion that Q20 was instead formed codicologically of two sections (“Q20A” and “Q20B”) doesn’t hold water, though I would still argue that it seems likely that there is instead a content division – i.e. between f103r/v (which I suspect contains the end of a preceding section of text, perhaps from Q8?) and the rest of Q20 – which was masked by a set of fake stars.

    And if that is the case, the number of the remainder of the stars may well then add up to 360 or 365, but I’ll need to do the maths again… 🙂

    Hope that makes sense!

  14. Rene: also, I think it’s perhaps a little unfair to single out Montemurro and Zanette for not having any explanation as to why Herbal A and Herbal B are so different and yet both appear to be herbal. There’s only one Voynich theorist who has even attempted to construct such an explanation, and that has elicited far more rotten tomatoes from the crowd than gold coins… 😐

  15. Rene: I think I ought to put my updated Q20 observations into a new blog post – for example, that it might be that we should only be counting paragraph stars with a tail (i.e. apparently hiding ‘y’ for ‘ytem’ / ‘ybidem’) when trying to reach the magic 359 / 360 / 365 figure.

  16. Hi Nick,

    what I wrote was based entirely on some text properties and statistics.
    One could argue back and forth on a number of your points, of course. While they seem sensible, we don’t really know. That f116v is ‘the end’ is the most probable of all assumptions, but that would make f103r still the start of ‘a quire’.
    If some ‘paragraphs’ should be ignored, we should be able to see something different in the text, I believe….

  17. Rene: I now think f103r is indeed the start of the Q20 quire, but that f105r is the start of the book that occupies nearly all of that quire (which could well be MS. 6741). I’ll write this up over the next few days, see where all the numbers end up.

    MS 6741 is listed in Gallica – – (hope the link works!)

  18. f105r is highly interesting because of the break after the third paragraph (different ink, different line inclination, somewhat different (?) writing size).
    Then there’s also the thing that the third paragraph seems to end with an incomplete word, and some words have been added above its first line…
    Looks almost as if the top was added after the bottom was written (not a very serious proposition though).
    If any page has ‘fake’ stars, this would be f111r (first two thirds). On f103r I do see short paragraphs as usual, and one star for each of them.

  19. A couple of weeks ago I made a post at VN about some rough stats of the Recipe section: the (estimated) number of “cycles” in the Recipe section is close to the (estimated) number of formatted pages in the rest of the VMS (the “formatted” pages being called those which have text organized in paragraphs).

    Other considerations currently do not allow one to make much of it (David made an interesting suggestion of a florilegium, though), but the “365” or “360” hypothesis does not look plausible to me either.

  20. Rene:

    >> “The main problem not addressed there is that two areas with basically the same illustrations have just about the most different text properties.”

    Actually there is the question about what is the definition and the extent of “difference”. If the same words consistently occur both in Currier A and Currier B, then are A and B “different”, or they are not?

    So in addition to formalizing the differences between A and B, it would be most helpful to formalize their intersection as well.

  21. Anton: I have long said that I think there is much that has yet to be gained from examining how the ‘language’ evolved from A to B (or was it from B to A? Who knows?)

  22. Dear Nick,

    what you define as bad stuff is still part of the VMS. In fact, my ‘grid’ covers words occurring at least four times. (Note: For words occurring fewer than four times transcription errors get more important.) Additionally, I track words occurring seven or eight times within the VMS. I didn’t see any reason to distinguish between bad and good occurrences for them. Moreover, if I were to limit the analysis to a particular section of the VMS, it could be argued that the analysis only covers a part of the manuscript. I don’t think that it speaks against a theory if it covers the whole VMS.

    Best regards,

  23. Torsten: just because any given theory uses as its input data both ‘good’ (well-formed) stuff and ‘bad’ (problematically-formed) stuff doesn’t necessarily make that theory super-good. Rather, it makes it super-tolerant, which is quite a different thing to ‘insightful’ or ‘useful’.

  24. bdid1dr on January 22, 2016 at 5:06 pm said:

    Gentlemen: Have you considered, in the past, my contributions in re the ‘bulleted’ (starred) (asterisked) paragraphs being (in sequence) reference to the ‘recipes’ folios and pharmaceutical jars? And, perhaps based on a ‘daily calendar’. It might also make sense that scribes and artists were co-operating (working in tandem). Whether the ‘calendar’ was European or Azteca would be good lecture material. Take another look at the ‘pharma’ jars and consider the proportions of red and blue (hot or cold) in each jar.
    In the not too distant past, I offered this same discussion to Nick; right up to the last folios (102 and 103 ? ). Perhaps some confusion could be resolved is that the handwriting could be described as “Scribe One” and “Scribe Two” . Artists’ works could also be portrayed as Artist 1 and Artist 2……..

    Still beady-eyed, but still smiling.

  25. Nick, indeed, it is not necessary. But this doesn’t mean that a theory covering the whole manuscript can’t be ‘insightful’ or ‘useful’. Nevertheless, a particular word isn’t sometimes ‘good’ and sometimes ‘bad’. It’s still the same sequence of glyphs no matter in which part of the manuscript this sequence is used.

  26. Like lead sheets today, 15th century song books would only be an aid to the troubadour .

  27. Torsten: even when it is the first word of a page? Of a paragraph? Of a line? Even when it is the last word of a line? Even when it is placed between two single-leg gallows on the top line of a paragraph?

    Perhaps you’re right… but perhaps you’re not. Might it be that your algorithm ended up having to be so tolerant of transitions simply because so much of its raw input was noisy?

  28. Nick: Indeed, if it comes to word frequencies I count all instances of a particular word. For instance ‘fshedy’ occurs twice. Both instances occur within the first lines of two paragraphs. Once it is used as the first word and once as the second word. Therefore I count two instances for ‘fshedy’.
    Maybe the noise is just not strong enough to hide the way the Voynich MS was created. The whole Voynich MS is there. Someone wrote it using some method.

  29. bdid1dr on January 23, 2016 at 12:01 am said:

    Nick and Friends:

    I am sad to see that you are still firmly stuck with decoding the EVA . There is no code in Boenicke manuscript 408. The EVA is, in itself, misleading as well diverting from the actual readable/translated script which, for the most part, is right on-track — and easily translated.
    Some twenty-five folios will soon be on their way to a local book printer. I don’t think I have to sign in with the ISBN folks, much less the Boenicke Library (Yale?) .

    Depending on the price the print shop is asking, I may go all the way through to folio 116v – but only to Busbecq’s sign-off at Ankara Turkey, and his referral to Ancyranum Augustus.
    I’m still looking for ‘nihil obstat’; which would be very important, as far as the Spanish Inquisitors apparently not finding any heretical issues with Fray Sahagun’s ‘rough draft’ manuscript (which was sent to the Spanish Inquisition just a short time before Suleiman’s army overwhelmed a large area of Europe) .

  30. Julian Bunn on January 23, 2016 at 9:59 am said:

    [Message from Julian Bunn:]

    Hi Nick. Your points are well taken, and this was an enjoyable post to read, as were the replies from others so far. When I last faffed around with the text, it was the “Recipes” section (Quire 20) I concentrated on, as that is dense text with relatively little of the weirdness of the other sections’ text, and all in the same Currier Hand.

    In any case, the statistical analysis of any of the text becomes tediously unrewarding after a while. Been there, done that! The most refreshing idea is that the “text” is a transcription of music, in some way, as Matthew Opitz suggests above, and which has been explored by others before.

    It would be interesting to hear your opinion on the music idea.

  31. bdid1dr on January 23, 2016 at 5:11 pm said:

    ps: It is very hard for those persons who are housebound (and depend on folio numbers rather than quires) to contribute to the discussion for any item in B-408. Has Boenicke finally begun showing ‘groups of folios’ as being numbered quires? If not, I’ll stick to my translations of individual folios (which each has a number).
    So, B-408 was separated from a large bundle of some 200 manuscripts brought back to Europe by Busbecq — so he could prove the provenance of recovery. Somewhere there should be other scrolls/manuscripts with signatures of various Kings/Queens/Royalty/Libraries documenting receipts of some 199 other manuscripts: Rudolph, Frederick, Francis, Phillip, Ferdinand, Isabella……..

    Have any of you taken a look at the document which sent Columbus to sail the seas and claim those lands ( China) for Portugal and Spain? Mostly, the interest in claiming China was for the sericine product known as silk.

  32. Gert Brantner on January 23, 2016 at 6:41 pm said:

    I can’t help it, but all the rearranging, trimming and addition of stuff like fake paragraphs (maybe even drawings) to me strongly smells of “modern editing” to give the mess of vellums a codical appearance as it was understood back around ~1908. To me it seems someone put things “in order” in his own mind.
    I’m still waiting for the day the binding will be investigated – for dating and maybe even localization of the ingredients (thread &&|| glue).
    It could even present a possibility to reconcile “15th C. origin” vs. “modern forgery” theories… *duck

  33. Oh BD,

    I’m in a good mood today, so here’s some help.

    First, you can find info about quires over at Rene’s site, there’s even some nice little diagrams that show how the various folios are arranged within each quire.
    I am not allowed to post links here but I’m sure Google can point you in the right direction if, in all the years you’ve been replying to Rene, you still haven’t even bothered to check out his website. Hint: the address is three w’s followed by Wilfrid’s last name followed by a dot followed by nu.

    Also: the Yale University rare book & manuscript library’s name is spelled Beinecke not “Boenicke”, as you’ve been consistently typing it here.
    Boenicke is the name of a Swiss loudspeaker company: if you’ve been waiting for Boenicke to show VMS quire number info on their website, I’m sure it must have been a long, frustrating wait…

  34. Gert Brantner: my understanding (I’m sure Rene will correct me if this is wrong) is that the Voynich Manuscript’s binding has now been investigated, but the findings and analysis have not yet been completed / published.

    I’m not sure there are any fake paragraphs (as opposed to fake paragraph “stars”), though. The paragraph stars seem to have been added to make some free-standing ‘y’ (‘ytem’) shapes invisible, ninja-style: the fake paragraph stars seem to have been added to fill out the rest of quire 20. Which then leaves… 🙂

    As for any talk of modern forgery, you’d better make sure you stay ducked low. 😉

  35. bdid1dr on January 23, 2016 at 11:28 pm said:

    Nick, I’ve just translated “Panis Angelicus” (as sung by Andreas Botticelli – the extraordinary Opera singer)
    Since the words were being sung in Latin, I was able to apply the so-called “Voynich” Manuscript’s alphabet/phraseology into:

    P-n-s a-n-ch l-l e-ceus

    Phit Pa-n-s om-neus

    P-a-tl P-a-tl (or Pa-tr Pa-tr)

    Phit Pa-n-s om-i-neum

    The large curlicued “P” s can stand alone as a combination of a words like Phil-a-del-ph-ia or Phoe-nix
    Pharmacy is represented, here in the US by the Rexall Drug Store. You can probably find the logo online.
    I shall now try out my own reference/advice.
    Beady (who is down to reading with only one eye, now. Excuse the typos please?)

  36. Just about the only supra-word-level orthographic structure we can directly match is the change in frequency stats for the last letter of a line. In natural languages, we often see a hyphen placed there, while in Voynichese we often see EVA ‘m’ or ‘am’: so I would be unsurprised if these are essentially the same thing.

    The pilcrow is also an example of supra-word-level orthographic structure in a natural language. For example, in some German manuscripts there is a frequently occurring line-initial character, e.g.:

    I’m not sure whether that is the pilcrow but it is an example of how the frequency for line-initial characters can vary from that of the main text in a natural language. What’s your opinion?

    On the other hand, in the Voynich the change in frequency for line-initial characters is not restricted to one glyph. There is one way to see this clearly. In the main Voynich text, about 18% of words begin with EVA ‘c’, so they’re fairly common. However, few lines begin with ‘c’.

    In some folios this is really apparent, for example:

    And several others. The distribution of c-initial words does not seem to correspond to naturally flowing/wrapping text. Why?
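    A rough sketch of how one might check this line-initial effect for oneself. It assumes a plain-text transcription with one manuscript line per text line and words separated by ‘.’; the function name and the toy input are invented for illustration and are not real transcription data.

```python
from collections import Counter

def initial_glyph_shares(lines, sep="."):
    """Compare first-character shares for all words vs. line-initial words.

    Assumes one manuscript line per string and `sep` as the word separator
    (an assumption about the transcription file format).
    """
    all_counts = Counter()
    line_counts = Counter()
    for line in lines:
        words = [w for w in line.strip().split(sep) if w]
        if not words:
            continue
        line_counts[words[0][0]] += 1   # first character of the first word
        for w in words:
            all_counts[w[0]] += 1       # first character of every word
    total_all = sum(all_counts.values())
    total_line = sum(line_counts.values())
    all_share = {g: n / total_all for g, n in all_counts.items()}
    line_share = {g: n / total_line for g, n in line_counts.items()}
    return all_share, line_share

# Toy lines made up for this example, NOT taken from the manuscript.
sample = [
    "daiin.chol.chor.shol",
    "okaiin.chedy.qokedy.chol",
    "sain.cheor.daiin.chy",
]
all_share, line_share = initial_glyph_shares(sample)
print("c-initial share, all words:  ", all_share.get("c", 0.0))
print("c-initial share, line starts:", line_share.get("c", 0.0))
```

    If c-initial words were distributed by ordinary text wrapping, the two shares should be roughly similar over a large enough sample; a big gap between them is the effect described above.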

  37. The page with the folio layout per quire in the MS is this one:

    Indeed, as Nick says, the binding of the MS has been looked at very closely by several MS conservators at the same time. Interestingly 😉 when it comes to details they couldn’t agree. However, the stitching is certainly very old, and they could read a large part of the history of the MS just from this. I don’t know which part of this will be published in the Yale book of essays, to appear in 2016/2017.
    The repairs made by Kraus (50 years ago now, and 50 years after Voynich bought it) are largely hiding the old stitching in the images of the binding, but one can see (for example) how this old stitching made its impression over time on the dorso of the newer cover. Traces of the old cover (tanning of the leather) are visible on the first and last folios.
    What’s also important is, that any missing bifolios would have been removed before the present (old) stitching. The book would have lost its integrity if this were done afterwards. I was also told that it would be far from easy to just cut out pages (shades of Mr.Bean…), even though it doesn’t seem that hard to an amateur like myself.

  38. bdid1dr on January 24, 2016 at 5:28 pm said:

    Job and Rene: Thank you for the lovely layout of each folio of B-408. I will now be able to enlarge whichever folio and quire is being discussed (and maybe put away my magnifying glass).
    Rene: Several times in the last three years I’ve tried to post an item on your blog page. So far, what works for me, is Nick’s presentations and discussion with you on Nick’s own pages.
    Job: As researchers go, you are a hard one to beat ! This remark is a COMPLIMENT to your thorough and graceful presentations which always come first ‘under the wire’ so to speak!
    Is today Nick’s ‘away-day’ with fellow cryptologists?

  39. bdid1dr on January 24, 2016 at 5:50 pm said:

    Goose: Thanks for correcting my spelling of that Library at Yale. Probably I’ve been mis-spelling the name somewhat unconsciously because I don’t like Ms. Zyatz’s way of making obscure presentations which are usually presented to only a small selection of persons/experts. She rarely follows up with a discussion of her findings or those of the persons present at her ‘talks’.
    So, why bother?
    PS: The Folger Exhibit was another non-forthcoming waste of web-viewers’ time because it was all about either Roger Bacon, Shakespeare, or the “Voynich” (Mr. and Mrs. Voynich ‘expert’ Codicologists). Nothing at all about the contents of what we now call B-408.

  40. bdid1dr on January 24, 2016 at 6:00 pm said:

    BTW: I claim responsibility for referencing the so-called “Voynich” manuscript as Beinecke manuscript 408. My motive? Because every time I see reference to “VMS”, I think ‘multitude of vermin’. Terrible, I know!
    Thanx, Guys!

    beady-eyed wonder
    sometimes ‘wonder-er’

  41. Gert Brantner on January 24, 2016 at 9:10 pm said:

    Nick, Rene,

    Terrific news, can’t wait to learn where the specialists disagree 😉

    Nick, I’m only a bystander, but I’m always looking for a nearby trench, should the crossfire start.

    Does humour belong in Voynich research?

  42. Gert Brantner on January 24, 2016 at 9:32 pm said:

    @Matthew Opitz

    The musical notation idea is interesting to me, because it seems to be one of those bits that should be easier to (dis)prove. You mention that other peers have surmised the idea: can you name some sources? My ggl searches seem to get overlaid by the Voynich opera. I have some ideas towards this and would like to find out if it’s worthwhile to follow them.

  43. Matthew Opitz on January 25, 2016 at 4:45 am said:

    Sorry Gert, I can’t recall off the top of my head where else I read that speculation about the Voynich encoding music.

  44. Matthew Opitz: there have been plenty of musical speculations about the Voynich Manuscript over the years (if not indeed decades). Were you perhaps half-remembering one of the following:

    * (Julian Bunn having some funn 🙂 ).
    * (only seems to work in Internet Explorer)


  45. Gert: humour, like common sense and affordable good wine, seems to be in short supply everywhere these days, not just in Voynich research. 😐

  46. bdid1dr on January 26, 2016 at 7:37 pm said:

    Not gonna go to any of those green-ink items, much less intnetexpl…..

    Howsomever —Check out Fray Sahagun’s “Psalmodia” (Latin/Nahuatl).

    ’twas not a hot item as far as his Azteca parishioners were concerned. Apparently they were grouped separately on each side of the aisle. I don’t know how baptisms were arranged. One x-ample was the use of the ‘X’ for any word or name which had the syllable ‘christ’. Not too long ago, I referred to the letter which Ferdinand and Isabella gave to X topher Columb(us) giving him ‘master of all the seas…..’

  47. Uh-oh, here’s another comment by the (maybe) lunatic fringe.

    I have read some about the characteristics of page initial, paragraph initial, and line initial words that start with a limited range of glyphs.

    I have thought of a possible explanation.

    Unfortunately, it only seems to work with my proposed solution. (That figures, huh?)

    In my proposed decodings of the pages using my ideas, the order of the various ingredients (or whatever they are) probably wouldn’t much matter in each potion/prescription. The order in which they are listed/added might be immaterial, for the most part. They could be arranged to suit the whims, needs or rules of the author.

    Along these lines, the possible separations between potions/prescriptions in a paragraph seem to be shown by repeated uses of an herb. This dividing point is almost surely instituted with a conscious effort of the author to order the list for each potion/prescription accordingly. This is reflected in some instances by the appearance of repeated consecutive words within a paragraph – in other instances, the repetition of an herb name’s use is less obvious. (According to this line of logic, three repeated consecutive occurrences of a word or herb would probably mean that the middle occurrence was a one-herb potion/prescription and the other two occurrences were the end ingredient in the potion/prescription preceding it and the first ingredient in the next one. Hey, it’s an explanation…and at least as good as any other I’ve seen for words repeated three times in the VMS.)

    I can find few recipes in other sources for potions/prescriptions which give two formulations for the same herb – it is usually only one formulation of each different herb. That’s why I believe that my system for dividing up paragraphs may be correct. The doubled and tripled words in the VMS are what put me on the track – they needed explaining.

    Although the word order would have to be arranged specifically to allow for these recipe divisions to be easily found, this word order within each recipe might be further arranged to give words starting with certain glyphs priority of place within the page/paragraph or line.

    I don’t know the state of the art of decoding mysterious documents in 1421, but possibly the author may have felt that arranging the words in the ways noted helped to keep her/his secret.

    For those of you who don’t believe my Voynich Lite ideas show anything important and are still looking for a way to read the Voynich words like language, my ideas may seem a bit strange. What? No sentence structure? Nope – no sentences as such. For those who think my ideas may have some merit, this freedom of arrangement should be semi-understandable, at least.

    My other contention is that the first recipe on a page or maybe in a paragraph is likely to be one for the richer patients and containing the more expensive ingredients. These more expensive ingredients might be the imported ones.

    The EVA = p glyph has been identified by me as the z sounding glyph. Most of the herbs starting with z on my list are imported ones – z wasn’t used as a letter starting many native herb common names in England in the 15th century. Finding herb names starting with a z in the first line(s) of a page or paragraph (and mostly only there) would thus be expected.

    Having them start a page/paragraph might be:

    1. A possible way to give a page heading (not necessarily an image identifying one) would be by identifying an easily understood herb (and its main usage in medicine), since most foreign/imported herbs seem to have had only one main medicinal usage, while many native herbs had several to many.

    2. A confounding way for the author to keep outsiders from understanding why the words are written in the order they are written – one of many ways the author seems to have tried to keep secrets.

    3. Both of the above, since they could both be used without affecting potion/prescription or ingredient clarity.

    Line order could likewise be arranged at the whim, in answer to the needs or according to rules (as yet undiscovered by me) of the author, who seems to have shown a preference for some glyphs over others for starting lines, as well as pages and paragraphs. Again, for the most part, order would seemingly not affect clarity – taking into account there might be more than one recipe on a line or a recipe may occupy more than one line, and also depending on the repeating herb name codes used for recipe separation.

    This reasoning might also be applied to page ending, paragraph ending or line ending words or glyphs.

    The order of the Voynich words may have been almost entirely in accordance with the whims, needs and rules of the author and the constraints imposed by the ingredient contents of each potion/prescription, not the rules of any language.

    Thank you.

    Don of Tallahassee

  48. My ideas seem to show paragraph construction in the VMS first lists the ingredients in the lead potion/prescription in the paragraph. The second and later potions/prescriptions will follow with the first word in the next potion/prescription having the same Group I/Table I herb code as shown in one of the words in the previous potion/preparation.

    I think paragraph and page end breaks are also potion/prescription breaks. This may be a wrong assumption.

    This system does away with needing to have any punctuation marks separating the different potions/preparations in the same paragraph. No periods or commas needed at all.

    As explained in an earlier email, the double and triple consecutive occurrences of words in the VMS are also explained as two of the easy opportunities to spot where potions start and end.

    The first repetition of a word in a double appearance of a VMS word would be the last ingredient in the earlier potion/prescription. The second appearance would be the first ingredient in the next potion/prescription.

    Triple occurrences of a word would be explained as the first occurrence being the last ingredient of a potion/prescription – the second being a one-word, single ingredient potion (simple) and the third occurrence being the start of another (third) potion/prescription.

    For examples for those readers still unsure of how this works, I think I have tried to do this in the decoding attempts at my site, especially the herbal pages.

    The double/triple word thing works even if the ingredient measurement/part used is different (words not necessarily all identical) – as long as the Group I/Table I code at the left end of the word remains the same in two or three consecutive VMS words.

    Thank you.

    Don of Tallahassee
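    For readers trying to follow the double/triple word rule described above, here is a minimal sketch of the splitting step only. The function name is invented, and the `key` parameter is a hypothetical stand-in for the “Group I/Table I code at the left end of the word”, which is not defined in this thread; by default whole words are compared. It illustrates the proposed rule, not a test of whether the rule is right.

```python
def split_at_repeats(words, key=lambda w: w):
    """Split a word sequence wherever two consecutive words share a key.

    `key` is a hypothetical placeholder for extracting the leading
    "Group I/Table I" code; the default compares whole words.
    """
    groups = []
    current = []
    for w in words:
        if current and key(current[-1]) == key(w):
            # A repeat closes the current group; the repeated word
            # then opens the next one.
            groups.append(current)
            current = []
        current.append(w)
    if current:
        groups.append(current)
    return groups

# Toy tokens, not Voynichese: one doubled word and one tripled word.
print(split_at_repeats(["a", "b", "b", "c", "c", "c", "d"]))
```

    Under this rule a doubled word ends one group and starts the next, and the middle word of a triple becomes a one-word group on its own, which matches the description of a single-ingredient potion.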

  49. bdid1dr on January 30, 2016 at 6:59 pm said:

    @Tallahassee Don: You’re getting there! Here’s a recipe for you:

    orange pekoe orange pekoe orange pekoe
    cup aqua cup aqua cup aqua hot
    azucar azucar azucar spoon

    Hence the repetitious recipe for three hot cups of orange pekoe tea: ‘tl’ and reasonable amount of sugar for each cup of tea. You won’t find this particular recipe in the Very Mysterious Manuscript; but I am trying to prove that we really need to get past the EVA which does not aid in Decoding OR deciphering any block of discussion or illustrations which are found in B-408 manuscript.
    You have come very close to translating the text which appears in B-408. Compare the text of B-408 with Fray Sahagun’s Spanish lectures which are published in his “Psalmodia” and his huge manuscript which, over centuries, got passed around to several countries and ended up in the Library in Florence (as the Florentine Codex) (and later in France? Or vice versa? )

    In more recent years (twentieth century) one of the Popes John returned several hundred manuscripts to Mexico. A good source of info in re the returned mss is Miguel Leon Portilla. Check him out!

  50. bdid1dr on January 30, 2016 at 9:43 pm said:

    ps: On this same discussion page, I referred, incorrectly to that very famous opera ‘star’ : His last name is Bocelli.

    Another ps: In re manuscript B-408 — the scribes, artists, coloring material, and paper-makers are all portrayed and translated in the “Florentine Codex”. There is an online edition of the Florentine Codex — which has been translated for every pictorial item — right to the end where we ‘observers/readers’ can then begin to translate the so-called “Voynich” manuscript .

  51. bdid1dr on February 8, 2016 at 6:25 pm said:

    The most bothersome thing, to me anyway, was that Anderson and Dibble translated every single word of every single manuscript into English (instead of the Latin). Anyone, today, would more easily read the contents of B-408 if today’s researchers had a Latin-Nahuatl dictionary at hand. (Which I do with every item (botanical or otherwise).

    Nick has shown no perceptible interest in my offerings. So follow the leaders down the same ‘dead end’ road of the EVA. Yes, it was very apparent to me that B-408 passed through several ownerships; each of which were unable to ‘decode’ its contents.
    However, it did not divert me (with my one reading eye) from comparing and translating the contents of B-408 with the entire contents of Fray Sahagun’s Florentine Codex and Fray Sahagun’s “Psalmodia”…. PS-ll-m tll-a .

  52. bdid1dr on February 9, 2016 at 4:58 pm said:

    At least one botanical item (displayed) is the ‘monks-hood’ plant; which is not only poisonous, but also inimical to other floral specimens in gardens. I’m sure you can locate in B-408 which plant I am describing and discussing.

    Apparently nobody has shown any interest in my describing the blossom of the squash plant (cucurbit) which appears in B-408 : To my eyes, it appears as a ‘propellor’ because all you see is the blossom which seems to be partially bisected. The accompanying script is about ‘oshquash’ and seeds.
    I now can’t remember if I ever posted this discussion on Nick’s pages (in addition to my discussing this item on Ellie Velinska’s blog).
    My apologies, Nick!

  53. bdid1dr on February 11, 2016 at 7:00 pm said:

    Psyllium seeds: several months (years?) ago, I referred you to the folio which had a very convoluted introductory word. The dialogue which accompanied that elaborate folio was about using the plant’s seeds/hulls for thickening gravy or making a gel. Most often the seed hulls were soaked in water until they were added to gravies, soups, puddings. The main use for the hulls were as a digestive aid and laxative.

  54. bdid1dr on February 14, 2016 at 6:27 pm said:

    And then there is the botanical specimen which ‘everybody’ insists is a sunflower:
    Scabiosa — which was used as a lotion/soap to treat scabies for humans and/or their dogs/pets.
    I’ve mentioned this particular item on NIck’s other puzzling pages in re B-408 — and cite my source: The New Illustrated Encyclopedia of Gardening (Book 11–Root-sen)

  55. bdid1dr on February 14, 2016 at 6:29 pm said:

    Have a heart! Celebrating St. Valentine’s Day !

  56. bdid1dr on February 20, 2016 at 9:56 pm said:

    @ Rene: What are the ‘clouds’ you have discussed a week or so ago? If you are referring to that multi-fold folio which has what looks like rain clouds in every corner, what IS being portrayed are large edible mushrooms. One of those edible mushrooms is a poisonous look-alike. Its poison, besides creating hallucinations — causes a lingering, painful death. The mushroom is called the “alcohol inky” — because it is any alcoholic drink which activates the mushroom’s poisonous nature.
    I have posted this several times in the past three years. I wonder how many wild-mushroom fanciers have died — not knowing the cause of their death.
    I hunt wild mushrooms only for the dye they produce for dye-ing my hand-raised sheep’s wool, and my angora rabbit hair. I then spin the wool and hair into yarn. I then knit the yarn into a dolman-sleeved zippered sweater. Altogether two years’ work.
    I translated this very large folio, word-for-word at least twice, over the past two years. I was not, then, (or now) able to access Rene’s beautiful website — probably because he is the target of more hackers than anywhere else on the WWW.

  57. Hi Nick,
    I am trying to find a transcription file on the web for the VMs and I am having a hard time. I found references saying that Glen Claston shared with the community the transcription he created, but I cannot find it anywhere. I found the file for the font needed to properly display his transcription but not the transcription itself.

    In any case, this is a fascinating subject.

  58. Gigi,

    EVA transcriptions
    Download the default selection until you know more about what you want.

    Glen Claston files, including

  59. Mark Knowles on July 11, 2017 at 10:00 am said:

    Nick: I sense there is the belief amongst some that the most effective approach to breaking the Voynich cipher is by statistical means without the need for recourse to the drawings and their interpretation or text identification. Whilst I can accept that it is theoretically possible for this approach to work, I suspect that without some hook(s), such as at least a small number of text identifications or other historically based information, this purely statistical approach will be very hard. Note, I am not attributing this opinion to you, but I sense that this might be a current of thought amongst some others such as possibly Rene.

    As someone with a deep level of expertise in historical ciphers what is your opinion?

  60. Mark: my opinion is that we have a huge body of knowledge (observational and analytical) about both the text and the drawings, and that the amount of extra insight or knowledge needed to tip the balance in any direction may be far less than the more pessimistic seem to think.

    At the same time, while a good crib would be a little bit helpful, a large (block-level) crib would be invaluable beyond measure, which is why I’ve tended to focus on the latter as a way of making significant progress. 🙂

  61. For a crib to work, one has to have a rough general idea how the cipher works. Statistics show us that we don’t really have any idea.
    People have tried with cribs for many decades without success. That they failed is a result of using invalid assumptions (how they supposed that the cipher works).
    Cribs may be the key to opening the door.
    Statistics should tell us first where the door is, or even if there is any door at all.

  62. Mark Knowles on July 12, 2017 at 8:13 am said:

    Rene and Nick: I would be interested in knowing what statistical research has been done into “Single Word Labels” as I believe sentence text is likely to be at least significantly more complicated. I think with Single Word Labels there is much less scope for complexity and so I believe it is a much better place to start decoding the cipher. I think if we can crack the cipher for Single Word Labels then we will be able to crack the cipher for the more difficult Sentence text as there must be a lot of similarities.

    As an example of my concerns about Sentence text there could be a lot of null or meaningless text, like repeated words, in amongst the sentence text; this could have a significant effect on the statistics and so skew them. I would imagine the Sentence text cipher is the same as the Single Word cipher, but with one or possibly 2 extra details for manipulating sentences.

    As I have said before, Short Word Isolated Labels, preferably with a corresponding text identification, are of particular interest to me as I think they would be the least complex to work with and so the best place to start.

    Please, tell me what you think.

  63. Mark Knowles on July 12, 2017 at 8:57 am said:

    Rene: I think it extremely likely that the cipher will incorporate aspects of contemporary cipher techniques from the early 15th century. I agree with Nick when he says that the cipher did not emerge out of nothing.

    I am always wary when individuals say things like “People have tried with cribs for many decades without success.” If you want historical parallels I would say: “There are very many examples from history where things have been tried and tried again without success and people have concluded that they are impossible, but then someone pushes the approach that bit further and it succeeds.” An example being the airplane.

    I think a crib can work, but it needs to be a good and big enough crib and certainly one needs to be very careful about the assumptions one makes. Also of course any other information statistical or otherwise one can bring to bear could be of great value.

  64. I remain convinced that single word labels will provide the key. But even if you know what a label refers to, i.e. if you correctly understand the accompanying drawing, there are still so many problems even on the crib side of the equation. Which specific name for the plant was chosen? Which language is it in? Which regional variety? Which spelling variation?

    You need to have some idea about these things before you can move over to the other side: how to get from the Voynichese word to the plaintext word you would expect?

    But unfortunately the problems already start with correctly identifying the meaning of the drawing..

  65. Mark Knowles on July 13, 2017 at 5:01 am said:

    Koen I am glad you share my opinion, or I yours, vis a vis Single Word Labels being the key. Statistical analysis that pertains only to Single Word Labels would be valuable.

    As far as the crib goes you make excellent points.

    I have looked a lot at the 9 Rosette page, as anyone who looks at this blog knows. I have and am developing a list of Short Single Word Labels, short words I emphasise should be less complex to work with and probably easier for text identification, of course the accuracy of these depends on whether one regards my analysis as credible; which is a big if. However some of my identifications should not be that controversial.

    As far as the quality of people’s text identifications for the rest of the manuscript I cannot comment, though I like to think there are some good identifications there. Given my Northern Italian analysis I would think one of the languages used was Latin, assuming the author may have taken the advice about using 2 languages which Cicco Simonetta has described.

    Of course these are a lot of, possibly doubtful, assumptions I make here and as you suggest getting to reliable text identifications has many obstacles.

  66. Mark Knowles on July 13, 2017 at 6:10 am said:

    By Short Word I mean words of say 3 or 4 characters; not more than 5 characters. If all important statistical results could be reproduced just for the Single Word Labels, ignoring sentence text, then that would really help I think.
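    As a rough illustration of restricting a statistic to short labels only, here is a minimal sketch. It assumes the labels have already been extracted into a list of transcribed strings (how to do that depends on the transcription file and is not shown); the function names and toy labels are invented for the example. Note that `len` counts transcription characters, which may not match the number of glyphs if the transcription writes some glyphs with more than one character.

```python
from collections import Counter

def short_labels(labels, max_len=5):
    """Keep only non-empty labels of at most `max_len` transcription characters."""
    return [w for w in labels if 0 < len(w) <= max_len]

def char_frequencies(words):
    """Relative frequency of each transcription character across `words`."""
    counts = Counter(ch for w in words for ch in w)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}

# Toy labels invented for illustration, NOT real label data.
labels = ["otol", "okary", "otaldy", "oky", "okeoly"]
subset = short_labels(labels)
print(subset)
print(char_frequencies(subset))
```

    The same filter could sit in front of any other statistic (word-initial shares, character-pair counts, and so on) so that results for the short-label subset can be compared directly with those for the full text.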

  67. Mark: the problem with short cribs is that Voynichese doesn’t offer would-be code-breakers the supportive follow-on infrastructure to amplify their initial short crib into a full-on break. The word structure, the line structure, the sentence structure, they’re all basically missing in action.

    And this is why I’ve been trying to find parallel blocks (in similar books of secrets, but say from the 14th or very early 15th century) to match what we see on the page. The matches there wouldn’t be with a single word but with the grammar, because that is arguably where we’re really stuck. 😐

  68. Mark Knowles on July 14, 2017 at 7:39 pm said:

    Nick: I think you are right in that if you can find parallel blocks in other sources that would be invaluable in decipherment, so I am fully supportive of that endeavour. In the meantime, in the absence of such a parallel block discovery, I think, personally, Short Single Word analysis is one of the better places to start. My attitude is simply that one should start with the simplest case of the cipher in action. IF one can crack the cipher in the simplest case then I have a very high degree of confidence that one can move from that case to cracking the cipher in the most complicated cases. Even if one does not use a crib I think applying the statistical analyses which have already fruitfully been applied, to this Short Single Word subset of the manuscript text might illuminate the structure of the cipher more than statistical analyses of the complete text of the manuscript.

    Fundamentally I think that obviously one approach, searching for parallel blocks, and another approach of working with Short Single Word Labels are not mutually exclusive. I think they can both be explored at the same time.

    If you can find a matching block that could be a real game changer and I would not be shocked if it led to a solution, so I wish you godspeed in your quest. However until that is found, if such a reliable parallel block can be found, then I think the approach I have suggested has some merit.

    As an aside, as I may have mentioned, I think there is a small parallel block on the rosette page, namely the Europe, Africa and Asia circle in the very top right corner.

  69. Mark Knowles on July 17, 2017 at 11:51 am said:

    Nick: I apologise in advance for my invented terms, but that is the best I could come up with off the top of my head; I would be happy to hear alternative suggestions.

    Swvoynich = Single Word Voynich = Voynich text excluding everything except Single Words (and associated images)

    Swvoynichese = the language of the Swvoynich

    I wonder, is Swvoynichese flat? And if not in what ways is it not flat?

    You say:

    “* Substantial difference in word structure within “labels” (short pieces of free-floating text, typically inside or beside drawn features)”. If it is not inconvenient an elaboration would be of interest.

    Whilst it will be the same language the subset of the Swvoynich of words of length of less than say 5 letters is, as I have said, also of interest.
