As I wrote before, I think we have four foundational challenges to tackle before we can get ourselves into a position where we can understand Voynichese properly, regardless of what Voynichese actually is:

* Task #1: Transcribing Voynichese into a reliable raw transcription e.g. EVA qokeedy
* Task #2: Parsing the raw transcription to determine the fundamental units (its tokens) e.g. [qo][k][ee][dy]
* Task #3: Clustering the pages / folios into groups that behave differently e.g. Currier A vs Currier B
* Task #4: Normalizing the clusters i.e. understanding how to map text in one cluster onto text in another cluster

This post relates to Task #2, parsing Voynichese.

Parsing Voynichese

Many recent Voynichese researchers seem to have forgotten (or, rather, perhaps never even knew) that the point of the EVA transcription alphabet wasn’t to define the actual / only / perfect alphabet for Voynichese. Rather, it was designed to break the deadlock that had occurred: circa 1995, just about every Voynich researcher had a different idea about how Voynichese should be parsed.

Twenty years on, and we still haven’t got any consensus (let alone proof) about even a single one of the many parsing issues:
* Is EVA qo two characters or one?
* Is EVA ee two characters or one?
* Is EVA ii two characters or one?
* Is EVA iin three characters or two or one?
* Is EVA aiin four characters or three or two or one?
…and so forth.

And so the big point of EVA was to try to provide a parse-neutral stroke transcription that everyone could work on and agree on even if they happened to disagree about just everything else. (Which, as it happens, they tend to do.)
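To make the parsing question concrete, here is a minimal sketch of how a stroke-level EVA word might be split into candidate tokens. The token inventory below is purely hypothetical (it is one guess among many, which is exactly the problem), and the greedy longest-match strategy is an assumption for illustration rather than anything EVA itself prescribes.

```python
# Hypothetical token inventory: one candidate parse among many, NOT a settled result.
CANDIDATE_TOKENS = ["aiin", "ckh", "cth", "cph", "cfh", "ch", "sh", "qo", "ee", "ii", "dy"]

def parse_word(word, tokens=CANDIDATE_TOKENS):
    """Greedy longest-match split of an EVA word.

    Glyphs that match no candidate token are emitted as single-glyph tokens.
    """
    ordered = sorted(tokens, key=len, reverse=True)  # try longer tokens first
    out = []
    i = 0
    while i < len(word):
        for tok in ordered:
            if word.startswith(tok, i):
                out.append(tok)
                i += len(tok)
                break
        else:
            # no multi-glyph token starts here, so fall back to one glyph
            out.append(word[i])
            i += 1
    return out

print(parse_word("qokeedy"))  # ['qo', 'k', 'ee', 'dy']
```

Swapping in a different inventory (say, dropping 'aiin' so that 'daiin' falls apart into d / a / ii / n) changes every downstream statistic, which is why the choice of parse matters so much.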

The Wrong Kind Of Success

What happened next was that as far as meeting the challenge of getting people to talk a common ‘research language’ together, EVA succeeded wildly. It even became the de facto standard when writing up papers on the subject: few technical Voynich Manuscript articles have been published since that don’t mention (for example) “daiin daiin” or “qotedy qotedy”.

However, the long-hoped-for debate about trying to settle the numerous parsing-related questions simply never happened, leaving Voynichese even more talked about than before but just as unresolved as ever. And so I think it is fair to say that EVA achieved quite the wrong kind of success.

By which I mean: the right kind of success would be where we could say anything definitive (however small) about the way that Voynichese works. And just about the smallest proof would be something tangible about what groups of letters constitute a functional token.

For example, it would be easy to assert that EVA ‘qo’ acts as a functional token, and that all the instances of (for example) ‘qa’ are very likely copying mistakes or transcription mistakes. (Admittedly, a good few o/a instances are ambiguous to the point that you just can’t reasonably decide based on the scans we have). To my eyes, this qo-is-a-token proposition seems extremely likely. But nobody has ever proved it: in fact, it almost seems that nobody has got round to trying to prove anything that ‘simple’ (or, rather, ‘simple-sounding’).

Proof And Puddings

What almost nobody seems to want to say is that it is extremely difficult to construct a really sound statistical argument for even something as basic as this. The old saying goes that “the proof of the pudding is in the eating” (though the word ‘proof’ here is actually a linguistic fossil, meaning ‘test’): but in statistics, the normal case is that most attempts at proof quickly make a right pudding out of it.

As a reasonably-sized community of often-vocal researchers, it is surely a sad admission that we haven’t yet put together a proper statistical testing framework for questions about parsing. Perhaps what we all need to do with Voynichese is to construct a template for statistical tests for testing basic – and when I say ‘basic’ I really do mean unbelievably basic – propositions. What would this look like?

For example: for the qo-is-a-token proposition, the null hypothesis could be that q and o are weakly dependent (and hence the differences are deliberate and not due to copying errors), while the alternative hypothesis could be that q and o are strongly dependent (and hence the differences are instead due to copying errors): but what is the p-value in this case? Incidentally:

* For A pages, the counts are: (qo 1063) (qk 14) (qe 7) (q 5) (qch 1) (qp 1) (qckh 1), i.e. 29/1092 = 2.66% non-qo cases.
* For B pages, the counts are: (qo 4049) (qe 55) (qckh 8) (qcth 8) (q 8) (qa 6) (qch 3) (qk 3) (qt 2) (qcph 2) (ql 1) (qp 1) (qf 1), i.e. 98/4147 = 2.36% non-qo cases.

But in order to calculate the p-value here, we would need to be able to estimate the Voynich Manuscript’s copying error rate…
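As a rough sketch of what such a calculation might look like, suppose (and this is a big assumption) that every non-qo instance is a copying error, that errors strike each q independently, and that the chance of the glyph after q being miscopied is some fixed rate. The binomial tail then tells us how surprising the observed non-qo count would be at that rate. The function names here are mine, not part of any established framework.

```python
from math import exp, lgamma, log

def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def upper_tail(k, n, p):
    """P(X >= k): chance of at least k 'errors' in n trials at rate p."""
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))

# Counts from the post: A pages 29 non-qo out of 1092, B pages 98 out of 4147.
# The candidate error rates are assumptions to be explored, not estimates.
for rate in (0.01, 0.02, 0.03):
    print(rate, upper_tail(29, 1092, rate), upper_tail(98, 4147, rate))
```

If the tail probability is tiny at a given rate, then that rate alone does not comfortably account for the non-qo cases; if it is unremarkable, the copying-error reading cannot be dismissed on these counts alone.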

Voynichese Copying Error Rate

In the past, I’ve estimated Voynichese error rates (whether in the original copying or in the transcription to EVA) at between 1% and 2% (i.e. a mistake every 50-100 glyphs). This was based on a number of different metrics, such as the qo-to-q[^o] ratio, the ain-to-oin ratio, the aiin-to-oiin ratio, the air-to-oir ratio, e.g.:

A pages:
* (aiin 1238) (oiin 110) i.e. 8.2% (I suspect that Takeshi Takahashi may have systematically over-reported these, but that’s a matter for another blog post).
* (ain 241) (oin 5) i.e. 2.0% error rate if o is incorrect there
* (air 114) (oir 3) i.e. 2.6% error rate

B pages:
* (aiin 2304) (oiin 69) i.e. 2.9% error rate
* (ain 1403) (oin 18) i.e. 1.2% error rate
* (air 376) (oir 6) i.e. 1.6% error rate
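One way to see how far apart these figures really are is to put an interval around each ratio. The sketch below uses the Wilson score interval, which is my choice for illustration (it behaves sensibly for small counts such as 3 or 5); it treats every o-form as a potential error, which is of course the very assumption under debate.

```python
from math import sqrt

# (a-form count, o-form count) as listed above
COUNTS = {
    "A aiin/oiin": (1238, 110),
    "A ain/oin": (241, 5),
    "A air/oir": (114, 3),
    "B aiin/oiin": (2304, 69),
    "B ain/oin": (1403, 18),
    "B air/oir": (376, 6),
}

def wilson_interval(errors, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = errors / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

for label, (common, rare) in COUNTS.items():
    total = common + rare
    low, high = wilson_interval(rare, total)
    print(f"{label}: {rare / total:.1%} (interval {low:.1%} to {high:.1%})")
```

Where two intervals do not overlap at all, a single shared error rate looks like a poor explanation for both ratios.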

It’s a fact of life that ciphertexts get miscopied (even printed ciphers suffer from this, as Tony Gaffney has reported in the past), so it seems unlikely that the Voynich Manuscript’s text would have a copying error rate as low as 0.1% (i.e. a mistake every 1000 glyphs). At the same time, an error rate as high as 5% (i.e. every 20 glyphs) would arguably seem too high. But if the answer is somewhere in the middle, where is it? And is it different for Hand 1 and Hand 2 etc?

More generally, is there any better way for us to estimate Voynichese’s error rate? Why isn’t this something that researchers are actively debating? How can we make progress with this?

(Structure + Errors) or (Natural Variation)?

This is arguably the core of a big debate that nobody is (yet) having. Is it the case that (a) Voynichese is actually strongly structured but most of the deviations we see are copying and/or transcription errors, or that (b) Voynichese is weakly structured, with the bulk of the deviations arising from other, more natural and “language-like” processes? I think this cuts far deeper to the real issue than the typical is-it-a-language-or-a-cipher superficial bun-fight that normally passes for debate.

Incidentally, a big problem with entropy studies (and indeed with statistical studies in general) is that they tend to over-report the exceptions to the rule: for something like qo, it is easy to look at the instances of qa and conclude that these are ‘obviously’ strongly-meaningful alternatives to the linguistically-conventional qo. But from the strongly-structured point of view, they look well-nigh indistinguishable from copying errors. How can we test these two ideas?

Perhaps we might consider a statistical study that uses this kind of p-value analysis to assess the likeliest level of copying error? Or alternatively, we might consider whether linguistic hypotheses necessarily imply a lower practical bound for the error rate (and whether we can calculate this lower bound). Something to think about, anyway.
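As a very rough sketch of the first idea, we could scan a range of candidate error rates and ask which one makes the observed counts most probable. The version below pools the three B-page a/o pairs listed earlier and assumes (heavily) that they share a single rate and that every o-form is an error; the grid and the function names are illustrative choices only.

```python
from math import log

# B-page (a-form count, o-form count) pairs from earlier in the post
B_PAGE_PAIRS = [(2304, 69), (1403, 18), (376, 6)]

def log_likelihood(rate, pairs):
    """Binomial log-likelihood of one shared rate (constant terms dropped)."""
    return sum(rare * log(rate) + common * log(1 - rate) for common, rare in pairs)

def likeliest_rate(pairs, grid=None):
    """Return the grid value with the highest log-likelihood."""
    if grid is None:
        grid = [i / 1000 for i in range(1, 101)]  # 0.1% to 10% in 0.1% steps
    return max(grid, key=lambda r: log_likelihood(r, pairs))

print(likeliest_rate(B_PAGE_PAIRS))
```

For a single pooled rate this just lands next to the overall proportion, so the real interest would lie in comparing how much likelihood is lost when pairs, hands, or clusters are forced to share a rate versus being allowed their own.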

All in all, EVA has been a huge support for us all, but I do suspect that more recently it may have closed some people’s eyes to the difficulties both with the process of transcription and with the nature of a document that (there is very strong evidence indeed) was itself copied. Alfred Korzybski famously wrote, “A map is not the territory it represents”: similarly, we must not let possession of a transcription give us false confidence that we fully understand the processes by which the original shapes ended up on the page.

50 thoughts on “Voynichese Task #2: parsing Voynichese into tokens…”

  1. Thank you for the clear four part explanation. Beginning with transliterations is often used in attempting to translate ancient texts. Parsing patterns hidden within transliterations differs with respect to the actual text. Each text must be studied separately.

    For example, Egyptian hieratic math texts fall into different classes. The 26 line EMLR is the simplest. The EMLR scaled rational number 1/p and 1/n sometimes multiple ways, such as 1/8 was scaled three times, twice with one LCM and once by two LCMs, as a beginning student was introduced to unit fraction math around 1900 BCE.

    The 51 member 1650 BCE RMP 2/n table concisely scaled rational numbers by one LCM in a manner that was exactly reported in the introduction to the Kahun Papyrus, an 1800 BCE text.

    Hieratic weights and measures texts that discussed grain volume topics included two LCMs when scaling to quotients and remainders. Quotients were scaled to 1/64 units and remainders were scaled to 1/320 units. Scholars beginning in 1906 transliterated to cubit units of a 1900 BCE text, the Akhmim Wooden Tablet, housed in the Cairo museum. By 1923 Peet scaled this five problem text to a 1/320 unit and suggested he understood quotients and remainders, when only remainders were fairly parsed. By 2002 Vymazalova showed that two part quotient and remainder answers, obtained by multiplying an unknown initial value by 1/3, 1/7, 1/10, 1/11 and 1/13 were multiplied by 3, 7, 10, 11 and 13 and returned the same value 64/64, a unity value, incorrectly concluding that Peet’s 1923 1/320 paradigm was correct. Only in 2006 was it published that 64/64 was the initial and final term operated on in a manner that scholars had not correctly seen in a 100 year old decoding project.

    Improperly combining, and throwing the baby out with the bath water, all hieratic unit fraction texts, as many linguistic scholars practice to this day. Linguistic scholars incorrectly conclude that Egyptian division was based in single false position, a math concept modified by medieval scribes to visually solve roots of second degree equations.

    The actual division operation used by hieratic scribes from 2000 BCE to 1650 BCE was inverse to the multiplication operation, a number theory property hidden in the scribal shorthand notes, the same rule that modern math uses in modern base 10 arithmetic. As an aside, unit fraction arithmetic formally ended when rational numbers were encoded in 1585 AD by Stevin as approved by the Paris Academy, and simplified in ways that our school children memorize today, without being told Egyptians 4000 years ago had developed the number theory based rules for our four arithmetic operations, in a closely related base 10 number system.

    Conclusion. Extreme care must be taken when combining transliterated databases that encode language and mathematical information.

    Best Regards,

    Milo Gardner
    Reading the past by allowing the ancient texts to speak for themselves.

  2. Dear Nick,

    the text of the Voynich manuscript didn’t fit your expectations for an enciphered text. No problem, you can explain it with systematic copying errors. Unfortunately the copying error rate varies from 1.2 up to 8.2 %? Again this is no problem, you can explain the 8.2 % with systematic transcription errors. But why is the number of transcription errors for Currier A much higher than for Currier B?

  3. Torsten: I’ll come back to the 8% figure in a separate post, it’s not as if I’m trying to run away from it. But there is obviously also an issue to be tackled about what are scribal errors (missed letters, reversed pairs, miscopied letters, etc) and what are transcription errors.

    My general point about statistical tests is that you can calculate whether any differences oppose your null hypothesis, but you need to do a certain amount of preparatory work first.

    As for the difference between A and B, the obvious explanations would be that they were written by different people (Currier’s Hand 1 and Hand 2), and that the different writing styles presented different transcription challenges. But what is perhaps more interesting is that we should be able to cross-reference different transcriptions to try to isolate at least some of the differences.

  4. Young Kim on June 3, 2017 at 11:28 pm said:

    People who visit this venue including myself have been seeing lots of debate going on between others over everything about Voynich Manuscript and also hearing many claims that the mystery of Voynich manuscript has been finally solved. But it is really unfortunate that we don’t actually find one who witnessed any evidence that supports those claims. So, here I have a proposition to make. Why don’t we set up a Voynich Manuscript Translation Challenge with a prize? Any individual or a group of people who teamed up together can submit their translation results with a deadline. If I am not wrong, Nick has once tried to crowdfund his own project before and I think he could set up a crowdfund to raise the prize money for the open challenge. I recommend Nick Pelling, Rene Zandbergen, and Job (sorry that I didn’t catch your name) to be the organizing members to start with. Both Nick and Rene have their reputations in this community and it seems to me that Job could organize the web-site that might be needed. One or a few selected paragraphs out of the whole manuscript selected by the organizers can be used by participants. They can use any font or transcription that they pleased to use, but they should make them available to others freely. If I may, I would like to add one thing that the approach and method the participants are taking should address Voynich text in its original form, not in encoded form. For example, the word ‘8aiin’ should be used as in the original Voynichese letters in their presentation so that anyone who is not familiar with the manuscript can understand their explanation. Some may argue that who could possibly have the authority to make the decision on the winner, but I think the translated outcome will speak for itself.

    What do you think?

  5. Nick: Another possibility is that the scribe was purposefully writing ‘oi’ instead of ‘ai’.

    See for instance page f8r. On this page it is possible to find at least four ‘s’-glyphs that were changed by an additional quill stroke from ‘e’ into ‘s’. See for instance the ‘s’ in ‘chsey’ in line f8r.P3.16. In my eyes the scribe was writing ‘cheey’ and has later changed the first ‘e’ into ‘s’. The word ‘cheey’ occurs 174 times and the word ‘chsey’ occurs only twice. This correction suggests that the scribe was purposefully writing ‘chsey’ instead of ‘cheey’. Maybe the scribe was also writing purposefully ‘oiin’ instead of ‘aiin’.

    Anyway, we can’t change the transcription to improve our statistics. We have to accept that the scribe wrote two times ‘chsey’ even if the word ‘cheey’ exists 174 times. In the same way we have to accept that the scribe sometimes wrote ‘oi’ even if ‘ai’ is twenty times more common than ‘oi’.

    BTW: The existence of ‘chsey’ beside of ‘cheey’ suggests that it is not possible to parse ‘cheey’ as [ch][ee][y]. In the same way the existence of ‘qokesdy’ beside of ‘qokeedy’ suggests that it is not possible to parse ‘qokeedy’ as [qo][k][ee][dy].

  6. Scribal errors and transcription errors both almost certainly exist.

    The transcription errors we can fix now, and yes, Takeshi did introduce a number of consistent ‘features’.

    As regards the scribal errors, we are stuck with them. We can at best guess about some possible cases, but we will only know if (ever) the text can be read.

  7. Rene: if Voynichese is strongly structured (and I don’t believe we can easily eliminate this possibility) then in many cases we can suggest likely corrections. For example, qtoy could be qoty, qdain could be qodain, oiin would be aiin, etc.

  8. Nick: We can for sure correct the transcription errors. For instance we can correct ‘schol sair’ into ‘schol saim’ in line f8r.T3.21. We can also mark glyphs hard to identify like ‘y’ in ‘chy taiin’ in line f8r.P1.5. But we can’t know what a scribal error is and what not.
    If we would replace all 335 instances of ‘oi’ with ‘ai’ we would not only use our own interpretation of the text we would also wipe out some information. Let’s see what happens if we do it as an experiment. At first there are words like ‘qoiiin’. Would you replace ‘oiin’ with ‘aiin’ and therefore ‘qo’ into ‘qa’ in these cases? Secondly, beside the most frequent word ‘daiin’ also the word ‘saiin’ exists. The word ‘daiin’ exists 863 times and the word ‘saiin’ occurs 144 times. How will you handle the suggestion to correct all instances of ‘saiin’ into ‘daiin’?

    BTW: ‘qtoy’ and ‘qdain’ don’t exist in the Voynich manuscript.

  9. Torsten: the issue of chsey is very interesting indeed! It may be that what we are glimpsing here instead is not so much scribal error as one of the evolutionary steps in the formation of the writing system.

  10. Young Kim: I think that EVA has given a sense of false confidence to a whole generation of Voynich researchers. We’re not even remotely close to the point where this kind of competition would be anything more than trollbait, sorry. 🙁

  11. Torsten: I wasn’t making a specific point but a general point. Many of the things which ‘just look wrong’ to Voynich experts such as yourself do so because they seem to violate one or more of the strong structuring “rules” which appear to dominate Voynichese. Words such as ‘qoiin’ violate so many of the structuring rules simultaneously that suggesting corrections becomes extremely hard (for what it’s worth, this seems more likely to me to have been a copying omission error for ‘qodaiin’), but in the (I believe) majority of cases, the scribal copying errors aren’t quite as abstruse as that.

    My point remains that if the nature of Voynichese is that it is strongly structured (but miscopied), then we already know more than enough in a very large number of cases not only to identify likely scribal errors but also to offer likely corrections. None of this would be easily apparent for someone arriving at Voynichese from cold.

  12. Nick, your:
    “For example, qtoy could be qoty, qdain could be qodain, oiin would be aiin, etc.”
    is precisely what I meant with:
    “We can at best guess about some possible cases”.

    Young Kim: prize money is a bad idea and I will not have anything to do with it.

  13. Nick: My point was about the impact of your corrections for the text in general. When you start correcting words you should define when it is allowed to correct a word. Otherwise you will end with a text full of similar or equal words. Even without any corrections sequences like ‘qokeedy qokeedy qokedy qokedy qokeedy’ in line f75r.P.38 are at least strange.

    BTW: It is possible to find an exception for nearly every rule you are able to define for the manuscript. In my eyes this is a rule for the Voynich manuscript.

  14. Mark Knowles on June 4, 2017 at 12:05 pm said:

    Young Kim: A prize is something I have thought about. Please email me to discuss this further

  15. Torsten: I don’t believe that we can know for certain whether any corrections we can suggest are actually correct. However, what we can say in a very large number of cases is that “[X] appears malformed according to our best current understanding of how Voynichese appears to work, and the word as it was originally formed probably looked like [Y]”.

    It may well be that a rectified transcription of this form – though necessarily incomplete and subject to guesswork – may prove to be a substantially better starting point for future analysis than an unrectified transcription.

  16. Young Kim: for the avoidance of any doubt, I think any such prize would be fool’s gold. The only current way to prove a Voynichese decryption would seem to be via the block paradigm, i.e. identify a passage whose plaintext appears elsewhere and demonstrate how the 1-to-1 mapping works: and you don’t need a prize to do that.

  17. Nick: The Voynich manuscript only contains similar words. How can you say that one similar word is malformed and another not?

  18. Mark Knowles on June 4, 2017 at 2:25 pm said:

    Nick: So am I to understand that you are saying that if one can find no such passage elsewhere it would be impossible to prove a Voynichese decryption? (You use the word “current” and I am not sure what you mean by that here.)

  19. Mark: because there seems to me to be a high likelihood that some kind of abbreviation is going on, any decryption will very likely involve some kind of creative reconstruction of that-which-has-been-abbreviated. And the only way to prove that is correct would be to identify a parallel text with the same contents.

  20. Torsten: though it contains similar words, the statistics of their occurrences are far from flat.

  21. Mark Knowles on June 4, 2017 at 2:38 pm said:

    Nick: Surely there are other ways to prove, for all practical purposes, a decryption. The individual needs to very precisely and unambiguously describe their method of decryption (pretty much as an algorithm) and provide the text of a full decryption of the manuscript using this method. Then cannot someone check the decryption by selecting a random passage, applying the method described and verifying that the text matches? Obviously the resultant text must be meaningful, intelligible and readable. One would also most likely expect the text to have some consistency with the images and overall consistency throughout the manuscript.

    Sure one could make theoretical objections to this as one could with the block paradigm. Such as:

    It is conceivable that the author encrypted gibberish.
    It is conceivable that there is more than one consistent plausible solution to the manuscript. (I think this vanishingly small.)
    The manuscript is a hoax.

  22. Mark Knowles on June 4, 2017 at 2:43 pm said:

    Nick: In reply to your last comment as I only saw that after I posted my previous comment. I am also of the opinion that abbreviation has occurred maybe even radical abbreviation. However when you mention creative reconstruction then we have to be aware of reasonable levels of imagination/creativity. I think abbreviation has to and would conform to some pattern of abbreviation and therefore could fit within that framework.

  23. Mark: the issue is one of proof, specifically how to prove that a given decryption is correct. For meta-theories (such as hoax or gibberish), different kinds of proof would be needed.

  24. Mark Knowles on June 4, 2017 at 3:00 pm said:

    Nick: Given my understanding of your block paradigm; correct me if I am wrong as I have not studied it in detail:

    I would say it certainly makes sense to compare the Voynich with other texts to see if one can identify commonalities. And given that, your Block Paradigm approach seems eminently sensible if you can find such a parallel text.

    However it seems to me that potentially proving a connection in the way you suggest could be fraught with difficulties. One could believe there is a clear parallel between a part of the Voynich and another text, however one could easily be wrong. For example Rene mentioned to me a drawing in John Bunyan’s Pilgrim’s Progress which had a similar appearance to the Top Right rosette on my “favourite” page. Now he mentioned it illustratively, but someone could make a literal parallel. There could be only a partial parallel between two texts.

  25. Mark Knowles on June 4, 2017 at 3:16 pm said:

    Nick: I think there is no way one could prove the hoax theory for certain as there could be just one line in the whole manuscript which is meaningful and the rest could be nonsense. Proving that there is no such line would be impossible I think. So I feel sorry for the hoax theorists as they have an uphill battle of demonstrating by statistical means or others that it is extremely likely to be a hoax.

  26. Mark: the only way someone could prove the hoax theory is if they had historical evidence that it was a hoax, i.e. a 15th century letter describing making it or selling it. Which is possible but… somewhat unlikely.

  27. Mark: what I’m suggesting is that having a block match would mean that the nature of the mapping would become extremely clear within even a line of text, and almost beyond any doubt inside two or three lines. Whereas a carefully-manipulated decryption could be sustained for perhaps even pages without it becoming any clearer whether or not it was correct.

  28. Mark Knowles on June 4, 2017 at 3:31 pm said:

    Nick: For clarity I take your use of the word proof here to mean of extremely high likelihood i.e. virtually certain. (e.g. 99.99% probability or whatever) Ultimately one will have to exercise human judgement as whether a claimed “decrypted” text is nonsensical or meaningful.

    To quote Rene in an email he sent me:

    “While some time was spent to define some criteria, in the end, it will be simple. The correct solution will be recognised immediately.” (I hope that I have not misrepresented him with this quote, but I don’t think I have.)

    I don’t have quite the level of confidence he expresses in this quote, but I am inclined to the view that it should not be too difficult to spot the correct solution. I felt it important to have criteria and a testing procedure to be fair and rigorous.

  29. Mark Knowles on June 4, 2017 at 3:40 pm said:

    Nick: I agree that having a beautiful neat block match would be wonderful. However it seems to me that we have to contend with the possibility that there is no viable block match out there. I don’t object to people looking for one of course not. If you or anyone else has an idea where to look for such an example I think that is a worthy cause. I would think it unwise to pin all one’s hopes for proof on that.

    I must confess that I have very little familiarity with medieval herbal or astrological manuscripts and maybe there are many very closely parallel manuscripts; if so they should be invaluable in deciphering the manuscript.

    I agree that Astrological charts look like a good place to start looking for parallels.

  30. Mark: while it is entirely possible that an extraordinarily clever researcher / analyst could propose the correct decryption (and for all the right reasons), I think they would (because of what I suspect will be an interpretative component of any decryption) still have a further mountain to climb to find a way to prove the correctness of their decryption.

    In the end, just about the only way of doing this that I can currently see would be to find a parallel text and use that to demonstrate how the two halves mesh together. The big (and, I think, novel) point about the block paradigm is that it proposes we should instead use historical and textual analysis tricks to identify the block first, and only then work back from there.

  31. Nick: If I understand you right you argue that you want to use the frequency to decide if something should be replaced or not. There are 196 words for which you suggest to replace ‘oiin’ with ‘aiin’. There are only nine words containing a sequence ‘chse’. If you want to replace ‘oiin’ with ‘aiin’ I don’t see any good reason for not replacing ‘chse’ with ‘chee’.

    BTW: Of course the statistics of their occurrences are not flat. Have you noticed that ‘aiin’ is more frequent than ‘ain’ and that also ‘oiin’ is more frequent than ‘oin’? This is not a coincidence. Therefore the frequencies must build a geometric series.

  32. Torsten: no, you didn’t understand me right. I want to use the nature of Voynich’s strong structuring in order to help predict corrections to mistakes that appear to have been in the text right from the start, rather than to just use instance counts or geometric series.

    As far as chse and ches goes, I’m sure we could both easily find some cases where we might well be looking at an sh where the horizontal bar has been accidentally omitted: there are plenty of other non-obvious things going on on f8r as well that make me wonder whether this might be some kind of transitional page. But all of this is far beyond the scope of what I can reasonably debate in the small margins of a comment field. :-/

  33. Mark Knowles on June 4, 2017 at 4:26 pm said:

    Nick: The question of an interpretative component of any decryption is an important one. We can both speculate to what extent there will be one or not. When it comes to single word isolated labels I would think it much easier to get around questions of interpretation.

    I agree that with enough interpretation a random jumble of words could be made to have meaning. So again in the end some human judgement will have to be involved.

    I have no problem with the block approach especially if you can provide me with text identifications on that basis it would be great. In the meantime until a suitable block is found we must work on the basis that one might not exist.

    In fact I believe what others have suggested is a small (3 word), and I think justifiable, “Block paradigm”. (The Europe, Africa, Asia bizarre circle divided into 3. A long time ago I thought this circle represented Venice, but now I accept this very peculiar medieval representation of the continents.)

  34. Nick: If I understand you right you decide by your understanding of the nature of Voynich structuring what a copying error is. Is this correct?

    BTW: The word ‘chshy’ only occurs twice. In the same way it would be possible to argue that ‘chshy’ is a miscopied ‘chsey’. Therefore such an explanation explains nothing.

    BTW: The idea behind the geometric series argument was that it is not possible to make a black or white decision based on numbers of a geometric series.

  35. Torsten: the core idea is to build up a (completely optional) set of adjustments to the basic transcription that attempts to correct sequences that seem not to fit the core Voynichese word template, and which also seem to have a straightforward alternative reading.

    It would not be hard to make up a decent-sized list (perhaps as many as a thousand? I don’t actually know) of these: and it would also be easy to build up a list of words that don’t “look right” but for which there is not an obvious alternative reading, such as the chshy/chesy example you give (though I doubt this latter list would be quite as large).

    It’s perhaps important to say that for me this isn’t simply about word instance counts in the way that your Voynich study has followed: rather, I think we should be able to build up Markov diagrams built around parsed tokens, particularly for different clusters. But that is a substantial topic for another day.

  36. Nick: The Voynich manuscript contains beside the word type ‘chedy’ also the word types ‘chey’, ‘cheedy’, ‘ched’ and ‘chsdy’. All these word types are typical for Currier B and rare or missing in Currier A. Someone with the core Voynichese word template for Currier A in mind would probably dismiss even an instance of ‘chedy’ (the third most frequent word type) as a misspelled version of ‘cheody’.

    BTW: My word grid for the Voynich manuscript is only the most simple way to describe connections between similar word types. You can call a common glyph combination a token and build Markov diagrams for these tokens. But this doesn’t change the fact that you still describe common glyph combinations and similar word types containing these glyph combinations.

  37. Young Kim on June 4, 2017 at 8:23 pm said:

    Nick Pelling, Rene Zandbergen, and Job: I am really sorry if I offended you in any possible way with my previous post and I sincerely apologize for that.

    There are pros and cons to my proposal, and to be honest I didn’t think much about the cons of an open challenge with a prize. I don’t know if people would get a different impression if it were called a Challenge Award rather than just a Challenge with a prize. The award doesn’t necessarily have to be in monetary form, though. It shouldn’t matter, I guess, since I know there are open challenges to the public offering a prize in various academic fields. Personally I don’t think the award money attached to the Nobel Prize tarnishes the spirit of the honor.

    My intention was to encourage people to come forward with their works on the Voynich Manuscript in a tangible form. Since we have been hearing many self-claims, I think it is time to see them in front of our eyes. I thought the translation of a paragraph long Voynich text could be enough to show the proof of acceptance. Or a sentence long Voynich text would be acceptable if how Voynich sentences comprise a paragraph can be explained. Maybe it is a bad idea in the first place, or maybe we are not there yet, but I was hoping such an open challenge with or without a prize would motivate people in a constructive way.

  38. Torsten: I will be discussing clusters in Part #3.

    The point about Markov chains is that – if you do them properly – they encapsulate a great deal of information in a very compact way, far more than just frequency counts and word grids.

  39. sequences like ‘qokeedy qokeedy qokedy qokedy qokeedy’ in line f75r.P.38

    Obviously qokeedy means “buffalo” :-).

  40. Nick: The problem with Markov chains is that they are based solely on their present state. Therefore, for the Voynich manuscript, they are only suitable at the glyph level and not at the word level.

  41. Torsten: that’s not actually true. A full Markov chain would include every node in a network: you should only collapse together (or merge via hidden nodes) those nodes that share the same behaviour. The answer is much more complex than you think.

  42. Nick: You misunderstood my argument. The problem is not the Markov chain; the problem is the missing word order in the Voynich manuscript.

    BTW: If you think that the Voynich manuscript is too complex for the auto-copying method, you should explain your point of view.

  43. Torsten: as long as you understand that Markov states aren’t necessarily the same as tokens, we’re doing OK here. 🙂

  44. Young Kim: no offense whatsoever 🙂

  45. Mark Knowles on June 5, 2017 at 9:21 am said:

    Young Kim: I have been exploring the idea of prize money.

  46. The Voynich MS was solved twice last week.
    One solution has not yet been published.
    The other is on a web site. I’ll ask the author if it’s OK to provide the link. It’s in Slovak….

    Shall we just split the prize money between them?

    Seriously though. It seems that both Nick and myself would be considered to be ‘judges’ in this. Now both of us have clearly stated that this is a bad idea, and I can say that we both know the ‘world of Voynich “research”’ quite well.
    This judgment is being completely ignored.

    That kind of proves the impossibility of the idea.

  47. J.K. Petersen on June 6, 2017 at 8:20 pm said:

    Young Kim wrote:

    ” I thought the translation of a paragraph long Voynich text could be enough to show the proof of acceptance. Or a sentence long Voynich text would be acceptable if how Voynich sentences comprise a paragraph can be explained. ”

    From my experience, a sentence is not enough. I’ve been able to extract whole phrases and even a few sentences in a small number of languages with cohesive systems that are documentable, which do not rely on anagrams, and which DO generalize to certain other parts of the manuscript. But I know they are not solutions. Besides not generalizing sufficiently, I know that the results are simply the consequence of matching languages to passages in the VMS that exhibit similar structure. There is enough text in the VMS that one can find inter-relatable patterns for almost any short phrase.

    Many of the claimed solutions look plausible if the methodology is flexible enough to allow subjective interpretation during some part of the process. Anagram solutions that do not follow a specific pattern for unraveling the characters, for example, are highly suspect, along with those that have some help from Google Translate to wrestle them into grammatical form that doesn’t actually exist in the translation.

    Take Strong’s solution as an example. It looks plausible and won him publication in academic journals, but if you look at his notes, you’ll find that half-way through, he abandoned defensible methodology, for what he claimed was a Trithemius cipher, and applied subjective selection to cherry-pick words that sounded good together and related to some of the drawings. Without careful analysis of his notes, one cannot see the flaws in his method.

    So, every step of the methodology has to be scrutinized along with the translation (and not everyone claiming a solution seems capable of clearly describing the method).

    It would probably be a monumental task, and possibly a colossal waste of time, for unpaid judges to go through thousands of ill-conceived or downright crackpot solutions that would be submitted if prize money were involved. Not to mention the difficulty of assessing “solutions” in dozens of different languages.

  48. Davidsch on June 16, 2017 at 2:16 pm said:

    This message is for the person(s) that will make a new transcript, if any.

    Is it possible to include 3-dimensional information of the letter or word or line itself? That would increase the shelf-life of EVA 2.0

    For example, the word Fachys is at location (x, y, folio) = (20, 60, f1r).
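
One way such a positioned record might look, as a rough sketch only. The class name `PlacedWord`, its field names, and the idea of storing x and y as plain integers are assumptions for illustration; the comment above does not say what units or origin the coordinates would use.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlacedWord:
    """A transcribed word plus where it sits on the page (hypothetical layout)."""
    folio: str  # e.g. "f1r"
    x: int      # horizontal position; units and origin are not specified
    y: int      # vertical position; units and origin are not specified
    text: str   # the transcribed word

def words_on_folio(words, folio):
    """Return the words on one folio, ordered top-to-bottom then left-to-right."""
    return sorted((w for w in words if w.folio == folio),
                  key=lambda w: (w.y, w.x))

fachys = PlacedWord(folio="f1r", x=20, y=60, text="fachys")
print(words_on_folio([fachys], "f1r"))
```

Keeping position separate from the transcribed text means the same record could carry any transcription alphabet without changing the coordinates.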

  49. Davidsch on June 16, 2017 at 2:21 pm said:

    Something else I forgot to mention. I am incapable of reading everything there is, so I do not know if this has been mentioned ever before:

    The EVA P and EVA F seem to exist in two flavours:
    one with a straight left finish, and one with a left curl that seems to be a ligature of eva [c] + long eva [q] + eva [L]

    Altogether, that makes 4 characters to transcribe for the current F and P.
