27 Jan 2018

Have Kondrak and Hauer finally revealed the Voynich Manuscript’s secrets? (Errrm… no, not really, sorry. But…)

Thanks to Newsweek, Fox News, The Daily Mail and The Independent [*sigh*], some techy Canadian Voynich research is currently enjoying its day in the media sun. (Hint to authors: sorry, but based on recent evidence, it would seem that you have ~48 hours to get your next funding request submitted and approved before everyone currently cheering starts booing.)

CompSci professor Greg Kondrak and graduate student Bradley Hauer presented their research at the 2017 ACL conference, and their paper “Decoding Anagrammed Texts Written in an Unknown Language and Script” appeared in Transactions of the Association for Computational Linguistics Volume 4, Issue 1, pages 75–86 [though the PDF is freely downloadable, at least for now].

From the press coverage so far, you might think that they had CARMELed the Voynich (i.e. thrown a tame supercomputer and some kind of clever-arse AI libraries at the problem): for, as the media incessantly repeat at the moment, All Human Problems Will Inevitably Yield To The Scythed Mega-Bulldozer That Is AI. But… is any of that true? Or useful? What’s actually going on here?

Behind the Kondrak and Hauer headlines

The initial question is obvious: what did Kondrak and Hauer actually do to try to crack the Voynich’s mysterious secrets that (they thought) nobody else had tried before? A quick snoop reveals that Bradley Hauer is a pretty smart crypto cookie: the simple substitution cipher solver presented in his 2014 paper “Solving Substitution Ciphers with Combined Language Models” outperforms many competing academic solutions. It does this by using both letter statistics and word lists at the same time. The paper uses this (a) to solve Aristocrat cryptograms (i.e. ones where you know where the word boundaries are) even under mildly noisy conditions, and (b) to solve Patristocrat cryptograms (i.e. ciphertexts without spaces, though the recursive approach used to turn Patristocrats into candidate Aristocrats seems somewhat heavy-handed). It then finally moves on to trying (unsuccessfully) to reproduce the kind of deniable encryption loosely proposed in Stanislaw Lem’s (1973) “Memoirs found in a bathtub”.
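To make "letter statistics and word lists at the same time" a little more concrete, here is a minimal sketch of scoring a candidate decipherment with both signals. It is emphatically not Hauer's model: the function names (`train_bigrams`, `char_score`, `combined_score`), the add-one smoothing, the toy lexicon and the simple `weight` blend are all assumptions made purely for illustration.

```python
import math
from collections import Counter

def train_bigrams(corpus_words):
    """Count character bigrams, using ^ and $ as word-boundary markers."""
    counts, totals = Counter(), Counter()
    for w in corpus_words:
        padded = "^" + w + "$"
        for a, b in zip(padded, padded[1:]):
            counts[a, b] += 1
            totals[a] += 1
    return counts, totals

def char_score(word, counts, totals, alphabet_size=28):
    """Average add-one-smoothed log-probability per character bigram."""
    padded = "^" + word + "$"
    pairs = list(zip(padded, padded[1:]))
    logp = sum(
        math.log((counts[a, b] + 1) / (totals[a] + alphabet_size))
        for a, b in pairs
    )
    return logp / len(pairs)

def combined_score(candidate_words, counts, totals, lexicon, weight=0.5):
    """Blend the letter-level score with the share of words found in a word list."""
    n = len(candidate_words)
    char_part = sum(char_score(w, counts, totals) for w in candidate_words) / n
    word_part = sum(w in lexicon for w in candidate_words) / n
    return weight * char_part + (1 - weight) * word_part

# Toy data only: a real solver would train on a large corpus.
lexicon = {"the", "cat", "sat", "on", "mat"}
counts, totals = train_bigrams(lexicon)
print(combined_score(["the", "cat"], counts, totals, lexicon))
print(combined_score(["xqz", "zzq"], counts, totals, lexicon))
```

A solver built along these lines would search over substitution keys and keep whichever key gives the highest combined score; the search itself is left out here.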

So what happened before the Voynich paper was even written was that Hauer had built up a lot of software machinery for solving nicely-word-boundaried simple substitution ciphers at speed, including ones where some kind of mild text mangling had taken place. And so it should not be a surprise that he carried this technology and approach forward, insofar as the 2016 paper tries to solve Voynichese as if it were a nicely-word-boundaried simple substitution cipher that had had its text mangled via anagramming plus optional abjad-style vowel removal. Given that as the paper’s founding presumption, all it is trying to do is evaluate which plaintext language was used if that entire presumption just happened to be correct (oh, and if the transcription used was accurate).
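For readers who prefer to see the presumption spelled out, here is a rough sketch of the kind of encipherment it implies: optional vowel removal, simple substitution, then anagramming inside each word, with word boundaries left intact. The names `make_key` and `mangle`, the English vowel set and the random shuffle are my own illustrative assumptions, not anything taken from the paper.

```python
import random
import string

VOWELS = set("aeiou")

def make_key(rng):
    """Build a random one-to-one letter substitution table (hypothetical key)."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))

def mangle(line, key, rng, drop_vowels=False):
    """Apply the presumed steps word by word, keeping word boundaries."""
    out = []
    for word in line.split():
        if drop_vowels:
            # Abjad-style: keep consonants only (keep the word if it is all vowels).
            word = "".join(c for c in word if c not in VOWELS) or word
        chars = [key[c] for c in word]  # simple substitution
        rng.shuffle(chars)              # anagramming within the word
        out.append("".join(chars))
    return " ".join(out)

rng = random.Random(0)
key = make_key(rng)
print(mangle("one thing led to another", key, rng, drop_vowels=True))
```

Note how much information the last two steps throw away: a decoder sees only an unordered bag of (substituted) consonants per word, which is why the choice of candidate plaintext language ends up doing so much of the work.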

Incidentally, the Voynich corpus used was 43 pages (“17,597 words and 95,465 characters”) of Currier-B text in the Currier transcription that one or both of Knight & Reddy had supplied, but the authors did not seem to have questioned the reliability or parsing choices behind that particular transcription. (More on this below.)

Voynich anagramming

Unlike Stephen Bax’s well-known 2014 Voynich paper (which began by gleefully flipping the bird at nearly all previous Voynich research), Kondrak and Hauer’s Voynich paper begins by covering what they consider related Voynich work (section 2.1) in a level-headed, if somewhat brief, way. The most relevant source they have for the notion that we might be looking at anagrammed text is Gordon Rugg’s 2004 paper: this floated the idea that there might be a similarity between alphabetically ordered anagrams (‘alphagrams’) and what we see in the Voynich Manuscript’s text.

Yet much has already been written about Voynich anagramming beyond this, not least William Romaine Newbold’s monstrously tangled ‘decryption’ (*shudder*). More recently, Edith Sherwood claimed both that it was a young Leonardo da Vinci who wrote the Voynich Manuscript, and that the Voynich text was written in anagrammed Italian (though so far she has mainly tried to reconstruct Voynich plant names using her proposed scheme). As I pointed out in 2009, this seems extraordinarily unlikely to work in the way she proposes.

Arguably the most interesting previous Voynich research into anagrams (again, not mentioned by Hauer) has been that of London-based researcher and translator Philip Neal. In a (now long-lost) page he posted many years ago on the late Glen Claston’s voynichcentral.com website, Philip proposed:

Here is a transformation of plaintext into ciphertext which explains certain features of the Voynich “language”.

1. Divide a plaintext into lines
2. Sort the words of each line into alphabetical order
3. Sort the letters of each word into alphabetical order

1. one thing led to another thing last night
2. another last led night one thing thing to
3. aehnort alst del ghint eno ghint ghint ot

The result has some of the statistical properties of the Voynich text.

A. The frequency distribution of words and letters is the same as in the natural language plaintext, but the distribution of two-letter groups and two-word groups is significantly altered.
B. Words at the beginning of a ciphertext line tend to start with letters at the beginning of the alphabet. Compare the high frequency of Voynich “d” at the beginning of a line.
C. If a letter near the end of the alphabet has a tendency to be word-initial in the plaintext (e.g. German “w”), it will have a strong tendency to be the last word in a line. Compare the high frequency of Voynich “m” at the end of a line.
D. The ciphertext versions of frequent words will tend to cluster together in a line. That is, where a word such as “thing” occurs twice in the plaintext line (as in the above example) the two word sequence “ghint ghint” will occur, but “ghint” may also occur elsewhere in the line as an anagram of “night”.
E. A one-letter word of ciphertext can only be an anagram of a single word of plaintext (“a” can only be an anagram of “a”) and a two-letter word of ciphertext can only be an anagram of two possible words of plaintext (“et” can only be an anagram of “et” and “te”). This means that you cannot have a ciphertext line of the pattern “… i … i … ” or of the pattern “… et … et … et …”. This principle largely holds good in the Voynich text: there are only six exceptions in the corpus of Currier’s language B.
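Philip's three-step transformation is simple enough to sketch in a few lines. The function name `neal_transform` is mine, and the sketch assumes lower-case words separated by spaces. Note that a strict alphabetical sort places “thing” before “to”.

```python
def neal_transform(plaintext):
    """Sort the words of each line, then sort the letters of each word."""
    result = []
    for line in plaintext.splitlines():              # step 1: work line by line
        words = sorted(line.split())                 # step 2: sort words in the line
        alphagrams = ["".join(sorted(w)) for w in words]  # step 3: sort letters
        result.append(" ".join(alphagrams))
    return "\n".join(result)

print(neal_transform("one thing led to another thing last night"))
# prints: aehnort alst del ghint eno ghint ghint ot
```

Running this over a natural-language text and then counting line-initial and line-final letters is a quick way to see properties B and C for yourself.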

To his credit, Philip then immediately pointed out some problems with this suggestion:

1. Voynichese words do not conform to a strict alphabetical ordering of letters (there are quite a lot of words of the pattern dshedy).
2. Voynichese words have a strong tendency to contain only one instance of a given letter, unlike any obvious candidate language for the plaintext.
3. The enciphering described is not unambiguously reversible (however I think it would work as a private aide-memoire, or as a means of establishing priority like Galileo’s well known anagram announcing his discovery of the phases of Venus)
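The reversibility problem in point 3 is easy to demonstrate: any two words with the same letters collapse to the same alphagram. Here is a small sketch (the helper name `alphagram_collisions` and the toy word list are illustrative assumptions) that groups a word list by sorted letters and reports the ambiguous groups.

```python
from collections import defaultdict

def alphagram_collisions(wordlist):
    """Group words by their sorted letters and keep only the ambiguous groups."""
    groups = defaultdict(set)
    for word in wordlist:
        groups["".join(sorted(word))].add(word)
    return {key: sorted(ws) for key, ws in groups.items() if len(ws) > 1}

words = ["thing", "night", "led", "to", "last", "salt", "slat", "one", "eon"]
for alphagram, candidates in alphagram_collisions(words).items():
    print(alphagram, "->", candidates)
```

With this toy list, “ghint” could be either “night” or “thing”, and “alst” has three candidate readings, so a reader would need context (or a good memory) to undo the encipherment.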

(Philip has since instead proposed a possible grid-like constraint on the position of Voynichese letters within Voynichese ‘words’, though problems with that alternative explanation remain.)

Incidentally, Philip has also pointed to a number of places within the Voynich Manuscript where entire lines appear to have been written in a non-one-after-the-other way (i.e. unexpected line transpositions). Similarly, nobody has yet come up with a powerfully convincing explanation for the presence of “Neal keys” (sections of text typically delimited by pairs of single-leg gallows) in the top lines of pages (typically embedded ⅔ of the way across). He is a sharp observer, and these anomalies are all inconsistent with the widely-held presumption that the text we are looking at here is completely unmangled.

Ultimately, though, it remains a sizeable step (or three, or indeed more) to go from anywhere here to Hauer’s presumption that what we are looking at is straightforwardly anagrammed text in a conventional European language, whether abjad or not.

The actual Voynich research gap

If asked for the single largest methodological problem with Voynich research, I would point to the way that Voynich researchers tend to make a series of unfounded assumptions:
(a) the transcription they are using is perfectly reliable;
(b) the way that they parse that transcription (i.e. into tokens) is correct – there are many hidden linkages here which are each probably sufficient to derail any decryption attempt;
(c) the candidate plaintext languages they consider are genuinely representative of the Voynich Manuscript’s plaintext;
(d) no other textual transformations are present;
(e) the putative hypothetical transformation that they just happen to have plucked from the air and which they are testing is precisely that which is present in the Voynich Manuscript; and
(f) the output of their reverse transformation will be straightforward text that can be read and marvelled over by historians.

In the case of Kondrak and Hauer, I hope it should be clear that they have fallen foul of every one of these issues in turn: and their paper is all the worse for it. It is one thing to note in passing that Esperanto’s “extreme morphological regularity […] yields an unusual bigram character language model which fits the repetitive nature of the VMS words” (p.83), but it would be quite another to point out that this might easily have arisen from the way that Voynichese needs to be parsed in order for it to make sense: and it is this apparent lack of perception of the practical difficulties that all Voynich decryptors face that devalues the genuinely good work that went into their paper.

What particularly frustrates me is that in spite of these many issues, there are plenty of ways Voynich researchers can make genuine progress towards understanding what is going on: but instead they persist in trying to airball their own personal Voynich match-winner from the other end of the basketball court. They seem seduced by the glamour of being The One Who Solved The Voynich, instead of getting on with the graft of making a difference to what we know. 🙁

Yet computational linguistics has such a rich toolbox (of which CARMEL is merely one small screwdriver) that it surely has ample capacity to at least try to bridge all the actual research gaps that people are falling into, e.g.:

* What is the right way to parse EVA into tokens? (e.g. is EVA ‘or’ two tokens or one? is EVA ‘cth’ three tokens, two tokens, or one? etc)
* How does Currier A map to Currier B? And what about all the subtypes of each of these?
* What are the differences between them and “Currier C”? (Rene Zandbergen’s term for labelese)
* Can we determine whether line-initial letters are likely reliable or unreliable?
* Are words abbreviated (e.g. is EVA y some kind of truncation symbol)? If so, are A and B abbreviated in exactly the same way?
* etc
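The first question in that list (how to parse EVA into tokens) is the kind of thing that can be made explicit in code, so that a parsing choice becomes a stated parameter rather than a hidden assumption. Here is a minimal sketch of a greedy longest-match tokenizer; the function name `tokenize_eva` and the particular multi-letter groups passed in are illustrative assumptions, not a claim about the right inventory.

```python
def tokenize_eva(word, multi_glyphs=()):
    """Greedy longest-match split of an EVA string into tokens.

    multi_glyphs lists the EVA letter groups to treat as single tokens;
    every other EVA letter becomes a token of its own.
    """
    units = sorted(multi_glyphs, key=len, reverse=True)  # try longest groups first
    tokens, i = [], 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                tokens.append(unit)
                i += len(unit)
                break
        else:
            tokens.append(word[i])  # no group matched: single-letter token
            i += 1
    return tokens

print(tokenize_eva("cthor"))                              # one token per EVA letter
print(tokenize_eva("cthor", multi_glyphs=("cth", "or")))  # 'cth' and 'or' as units
```

The same EVA string comes out as five tokens under one parsing and two under the other, which is exactly why letter and bigram statistics computed on an unexamined transcription can mislead.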

If people had the intellectual good sense to stop trying to fly over all these separate hurdles all at the same time in a Steve Austin-style 100m leap of misplaced faith, we might start to make real progress. However, even when researchers do have the necessary brains to make progress (as Hauer clearly has), it seems they have insufficient strength of mind to not be tempted by the glamour of the big ticket “Researchers Crack Voynich Manuscript” headline. 🙁

98 thoughts on “Have Kondrak and Hauer finally revealed the Voynich Manuscript’s secrets? (Errrm… no, not really, sorry. But…)”

Mark Knowles on January 27, 2018 at 3:01 pm said:

What I don’t understand is that, from what is implied in the articles, they seem to have only translated the first sentence, and that translation sounds a little implausible. If you have only translated the first sentence it seems a little premature to declare victory.
farmerjohn on January 27, 2018 at 6:28 pm said:

2Mark
And what is sufficient to declare victory? I know it’s a tricky question and has probably been discussed elsewhere, but still.
Mark Knowles on January 27, 2018 at 7:17 pm said:

farmerjohn: A good question, but certainly a lot more than one sentence translated; also the translation and its method should have been inspected by others and widely recognised as correct. I could get into more specifics.
Mark Knowles on January 27, 2018 at 7:18 pm said:

farmerjohn: Harder to handle is the situation where someone says it is a hoax.
Philip Neal on January 27, 2018 at 7:38 pm said:

Well remembered, Nick! If I may say so myself, I put the whole idea rather more concisely, and I also posted a sample sorted anagram encipherment from Gulliver’s Travels which somebody cracked in hours.

I am pottering along with the strokes-based transcription, by the way.
J.K. Petersen on January 27, 2018 at 7:54 pm said:

Mark, if you read the published paper, you will see that they didn’t declare victory. The media spun it that way. They don’t even claim to know Hebrew, the language that the AI software pointed toward when processing whatever transcript was fed into the system.

But even if it was the press and not the researchers who blew the story into a “solution”, I’m still in agreement with Nick on pretty much every point he posts above…

Assuming what the VMS is (or might be) and attacking it from that angle is not productive and would work only by a stroke of luck. Studying WHAT IT IS rather than shoehorning it into a set of assumptions is more likely to yield useful information.
Rene Zandbergen on January 27, 2018 at 8:04 pm said:

The worst part about the paper is the media interpretation. I find the paper quite OK. They did a lot of interesting work. Of course, anagramming is a mine field but at least they are reasonably circumspect about the conclusion.

The media have not understood this.
(I have this fantasy where, when the media declare once again that the solution has been found, they would start by admitting that the previous hype must have been wrong. Oh well….. )

The study is done on the basis of some assumptions, and I agree with Nick that these assumptions aren’t likely to be correct. I mean the anagramming of course. Still, it is a valid approach to work on the basis of some assumptions as long as one is clear about it.

The main weak points in the paper are, for me:
– the use of Google Translate. For me unacceptable, but it is also an AI tool, so perhaps this is fine in that context (???)
– the reliance on one particular transcription alphabet.

The line that has been translated is not real Hebrew, but of course Google will make something of it. (Too bad it still required “corrections”).
And what then comes out really does not make a lot of sense as the opening line for a book. Or any other line.
Josef Zlatoděj Prof. on January 28, 2018 at 9:34 am said:

Ants and Nick. 🙂
Kondrak writes in the article that anyone who wants to understand the Voynich handwriting must know Hebrew perfectly. And of course they must also know history.
I’d try to fix Kondrak. In the first place, every ant should know the Czech language. Why? Because it is written on each side of the manuscript: it is written in Czech. 🙂 This is very important. 🙂
Of course, he must also know history.
He also needs to know the key. 🙂

I can write responsibly here. No computer is able to decipher the handwriting 408. 🙂 Why ??? For example, the computer will never be able to detect a deceptive character. 🙂
Josef Zlatoděj Prof. on January 28, 2018 at 11:47 am said:

Ants. Nick and Zandbergen.
What does the handwriting have in common with Hebrew?

🙂 Substitute cipher ! 🙂

And of course, a few words that Eliška uses.
Those words are :
1. am = means the people.
2. Ab = means father.

Otherwise, the manuscript is written in the Czech language. From beginning to end.
Rene Zandbergen on January 28, 2018 at 12:38 pm said:

Briefly to Philip:

I am interested in what you are doing with the stroke-based transcription.
Please feel free to contact me. I think there would be an advantage in using the same codes for identifying each text item (‘locus’) as I have proposed on my transcription page, and the same file format. Not that there are a lot of tools around using this format, but these may still come.

Apart from that, there is a new link for the electronic version of Shailor’s catalogue. It was down for a year, as you also indicated on your web site.
More information, see here:
http://www.voynich.nu/refs.html#brbl
(both second and third bullet)
farmerjohn on January 28, 2018 at 12:42 pm said:

2Mark
It’s not very fair to refer to “others”. They should have some criteria too:)
D.N.O'Donovan on January 28, 2018 at 3:52 pm said:

Nick, Philip,

Catalogue of medieval and renaissance manuscripts in the Beinecke Rare Book and Manuscript Library, Yale University
by Beinecke Rare Book and Manuscript Library; Shailor, Barbara A., 1948-

Publication date 1984

at the internet archive

https://archive.org/details/catalogueofmedie03bein
Sorry folks on January 29, 2018 at 4:45 pm said:

You can decode it (BUT NOT READ IT) 🙂
Jeff Haley on January 29, 2018 at 6:16 pm said:

Not another ‘solution’ please. I have been out of this for so long I had forgotten about most of the work done on the VMS. The late Glen Claston?
Rene Zandbergen on January 29, 2018 at 6:54 pm said:

Nothing new under the Sun:

https://www.thesun.co.uk/news/5446568/mysterious-book-filled-with-ufo-prophecy-is-finally-decoded-by-ai-after-humans-failed-for-600-years/

Now waiting for the Sunday Sport….
Jackie Speel on January 29, 2018 at 7:43 pm said:

The VM #looks# as if it has been written in a smooth running hand rather than painfully transcribed (as we might Cyrillic or Blackletter if we are unfamiliar with them, or ‘anagramming in our heads’) – therefore any ‘translations’ should show the same properties.
xplor on January 30, 2018 at 1:37 am said:

Just because they stepped in a pile of minutiae
is no reason to declare the Voynich solved.
A real solution will tell us who and why.
Was it a secret society or a barbarian christian
religion ?
Mark Knowles on January 30, 2018 at 7:41 pm said:

Journalists will be very disappointed when the Voynich reveals its secrets as they won’t be able to publish stories every 2 months saying that it has been solved.
Peter M on January 31, 2018 at 11:42 am said:

I have looked at 2 more reports in German. So I have based on the 3 reports my personal opinion.

They are as much in the dark as before. 🙂 They only used the program which solved the Kassler cryptogram, but that is a completely different book. Everyone is allowed to believe what he wants, and I do not want to run down anyone’s ideas. It is just hard for me to believe. Hebrew is written from right to left, and now it is supposedly written with symbols that do not occur in Hebrew, and the whole thing is still encrypted by reversing or omitting letters!?! For me, that overshoots the mark by a long way.
D.N.O'Donovan on January 31, 2018 at 1:52 pm said:

Peter M.
“Hebrew, one writes it from right to left, and now with symbols where in Hebrew does not occur, and .. reversed letters and omitted.”

Hebrew letters were employed in ways for which we have equivalent use of the Roman alphabet: to render the sounds of some other language. When the first printed texts from Arabic were printed in Europe, Hebrew letters, not Arabic, were used. Jews who lived in Persia might use Hebrew letters to render Persian, or might speak a dialect which reversed some sounds.

All this occurs in uses for the Roman alphabet too. When we render Chinese in Roman letters, we change the direction of the script from vertical to left-to-right, and we reverse the direction of Arabic when we render that in Roman letters.

To render some languages we have to create new symbols, too.

None of these things, in themselves, made the findings less probable.
Davidsch on January 31, 2018 at 3:28 pm said:

One cannot ask for professional expertise to look at the Voynich manuscript if the researchers who create papers such as this one, young amateurs who deliberately seek attention, are followed by newspapers which send populist horny novice journalists to write about it.

On the other hand, most interested people, such as the reactions here, are also from amateurs, most of the time with a very limited short memory.

Thus, the main problem remains: the knowledge of the concerned people on almost every aspect of the Voynich manuscript is way too low. Whether we look at the characters in the ms, the images, the pre-renaissance, religious and cultural movements; in most cases people have no clue whatsoever they are talking about.

At your normal day job you expect people to know what they are talking about, so why is it, that concerning serious Voynich research, most people are accepting talking to an ignoramus?

Perhaps because there really is nobody else to talk to, or because it is almost impossible to know who is who and what their background is.
nickpelling on January 31, 2018 at 4:04 pm said:

Davidsch: all the while people (even those who are otherwise sensible) continue to spout things about the Voynich that are more to do with how they would like it to be rather than how it actually is, very little seems likely to change. 🙁
D.N.O'Donovan on January 31, 2018 at 4:50 pm said:

Davidsch,

To get a ‘normal day job’ you read an advertisement describing the job and its limits.

How should be described ‘the Voynich job’?

In the outside world, a man trained as a mender of trains doesn’t apply for a job as CEO of a bank. His intelligence might be up to it, but his c.v. won’t fit the ad.

What we have in Voynich studies is a janitor appointing or firing typists, and typists’ gossip deciding if the company’s stock-broker should be sent to coventry, and the boss’ secretary maneuvering to have the next-door neighbour’s underbutler fired, so she can eat cake.

We’re in Alice-in-Wonderland country when things reach the stage where online Voynicheros complain because someone who has nothing to do with Voynichland publishes a paper about a fifteenth century manuscript held at Yale, and is reported in the papers.

Why do you care?
Mark Knowles on January 31, 2018 at 5:18 pm said:

The simple point to me is that in some sense there are no experts in the Voynich manuscript in a strict sense. Clearly some people have more knowledge of certain or all aspects of the manuscript. However when it comes to having theories about the manuscript most or all of them whether specific or general remain unproven in the strict sense and so one should still even be wary of any highly knowledgeable person’s theory.

It seems this must be why many academics steer well clear of the Voynich, as a recognised failed theory could damage their reputation in their profession, and as we well know there is a long history of failed Voynich theories. So in short the risk of a negative impact on their career is too great.

So we are left with “amateurs” who have no professional reputation to lose by a failed Voynich theory.

Lots of the general public are obviously enthralled by the Voynich and for some people, specifically, the more fanciful explanations: aliens, mayans, witches, elixir of life and so on. The media, like any business, is there to make money, so certain media outlets are very happy to jump on the bandwagon.
nickpelling on January 31, 2018 at 5:49 pm said:

Mark: oooh, you’re veering dangerously close to the kind of anti-science / anti-statistics nihilism most often associated with the late Stephen Bax. For him, there was no genuine Voynich knowledge (apart from his) and therefore there could not be any such thing as a Voynich expert (apart from him, of course). This is one of the main reasons I disagreed so strongly (and openly) with him: but because he believed I could not possibly be a Voynich expert, he treated me with disdain and disrespect.

The truth is that there is a continuum of expertise, and a thousand different subareas of expertise to choose from. But none of this stops people making very public fools of themselves. 🙁
Mark Knowles on January 31, 2018 at 6:56 pm said:

Nick: I certainly do not want to go down the “anti-science / anti-statistics” route.

Broadly speaking I think you make a good point. I think the difficulty is knowing who is right. My theory is somewhere on the spectrum from utter nonsense to completely true and nobody knows for certain where it lies; the same remains true of JKP’s theories, your theories, Stephen Bax’s ideas, Diane’s opinions etc. So whilst one of these theories may be very accurate it hasn’t been established as fact. So someone may be much more of an expert than others, but we don’t know who the expert(s) is/are.

More importantly Visconti Letters:

I have been in communication with the archivist and other people associated with the Storia Patria Genova archive, but it has been hard going though I think I am making progress. Basically the archive is boxed up ready to be moved to a different location, as yet undecided. Now apparently they are unsure when they will be able to move it and so reopen it; it could be 6 months. My understanding is that there are over 1000 boxes in the archive. However it looks likely after some polite and friendly discussions they will find the specific box that I am interested in and email me an image of the letter; there are also some other letters with small enciphered parts, so maybe I will see those.

Chasing up the King Alfonso of Aragon, Sardinia, Sicily and later Naples archive in Barcelona indicates that they have deciphered/translated letters, but not the originals. However there is an angle to explore in this regard.

I have found a reference to a letter intercepted by the Doge of Genoa, but I have no idea if it survives.

I have been in touch with Professor Francesco Senatore who has studied the Sforza diplomatic archive and he has offered to help.

I have been doing some interesting reading on the chancery of Filippo Maria Visconti and learning more about the politics and diplomacy of the time i.e. I am starting to get to grips with the wars in Lombardy.

So I have been making an effort, when I have the time.

Your suggestion of using the “in cifra” keyword/string has proven to be a very good one. I was using “cifrato”, “cipher”. “cifra” on its own brings up lots of references to numbers and figures, as you know, not just ciphers. Some variant of “criptografia” may work, but I am not sure which.

So any more suggestions of keywords would be really valuable. Thanks!
xplor on January 31, 2018 at 10:39 pm said:

The first part of the Voynich manuscript is about plants. Would the book have been used by Cathar parfaits ?
J.K. Petersen on February 1, 2018 at 12:14 am said:

Mark, most of what I post is observations, not theories. I have very few “theories” about the VMS, I still feel that I know very little about it, and whatever observations I have are works-in-progress subject to revision if better information comes along.

When you and I discussed diplomatic ciphers and Latin origin of glyphs, I was not pitting “my theory” against “your theory” as you seem to have perceived. I was making a comment about good research versus bad research. Assuming B and C might be derived from each other, without investigating A (which may be the foundation of both B and C) is not the best way to do research.
Perry D. Edwards on February 1, 2018 at 10:27 am said:

@Nick: Didn’t you also interpret something into the Voynich?
nickpelling on February 1, 2018 at 11:07 am said:

Perry: in “The Curse of the Voynich”, I explored the specific historical possibility that the small books of secrets described by mid-Quattrocento architect Filarete might have been the Voynich Manuscript’s plaintext. But I never claimed to have read a word of it, which is something plenty of shooting-from-the-hip-while-blindfold ‘theorists’ seem quick to claim ‘success’ with.

I happen to think there’s quite a big category difference, but perhaps some might disagree. :-/
Mark Knowles on February 1, 2018 at 11:24 am said:

JKP: without wishing to get bogged down in a discussion I thought was over->

I think what you are saying with regard to “observations” versus “theories” is largely semantics. Every sensible person’s theories “are works-in-progress subject to revision if better information comes along”.

Fundamentally you were pitting “my theory” against “your theory” and to deny that is somewhat disingenuous. In your opinion it seems that “good research” is giving a lot of weight to your theory and “bad research” is not. There are many possible A’s, such as the Croatian Glagolitic alphabet and other alphabets, that could be the source of B and C; however, you treat your A as deserving special status and focus amongst the other possible A’s.
Bradley on February 1, 2018 at 12:03 pm said:

Note :
Creator must have been very patient -consistency.
Repetition of the use of 7 in the leaves and number of ladies drawn.
Each female is distinctive in facial drawing.
Artist must have had a good knowledge of fractal science in the sketching.
Nowhere is there anything crossed out.

In those days very few had reading ability, so why the secrecy?
In those days did the queen have seven helpers?
Perhaps a secret book of monarch knowledge.
Perry D. Edwards on February 1, 2018 at 12:24 pm said:

Nick: Didn’t that mean that you interpret it as a cipher?
nickpelling on February 1, 2018 at 12:37 pm said:

Perry: I’m fairly neutral as to whether it ultimately turns out to be a cipher, an obfuscated shorthand, or whatever. However, we can also say to a high degree of confidence (and have been able to say since the 1950s) exactly what it is not, which is a simple language and/or a monoalphabetic substitution cipher.
Perry D. Edwards on February 1, 2018 at 1:12 pm said:

Nick: Didn’t you publish your own cipher theory in the “The Curse of the Voynich”?
Mark Knowles on February 1, 2018 at 1:48 pm said:

Translation theories tend to be relatively easy to prove or disprove, so the work of Hauer and K is very much amenable to analysis. The problem comes with non-translation theories: whether specific or general, they are often very difficult to prove or disprove; this is, of course, very frustrating. Once you think that you can translate/decipher the Voynich you should be pretty sure of it as you can expect your theory to be torn apart. I have never really attempted to decipher the Voynich as I have always been inclined to the view that more information is needed to be able to make a serious attempt, whether this be a sufficiently large and reliable crib to work from or some other clues found from other research. My own perspective has been that other research if valid would evidentially lead to the kind of clues which would assist in decipherment.
D.N.O'Donovan on February 1, 2018 at 2:11 pm said:

JKP
We each bring our previous training and area of specialisation to this study, though admittedly many seem to forget that they may contribute most who stay within them.

It’s always an interesting point that a person’s observation is a product of his or her prior knowledge; and (as I’m sure you know) it’s very easy not to see what has not been seen before.
nickpelling on February 1, 2018 at 2:36 pm said:

Perry: “The Curse of the Voynich” highlighted a lot of mechanisms possibly present in Voynichese that might contribute to the specific patterns of behaviour we see, but I was not able to turn all those mechanisms into a single “cipher theory”. All I was comfortable claiming at the end was pretty much what Tiltman proposed some 40 years earlier: that it seems to be an accumulation of small cipher tricks, artfully arranged.
Perry D. Edwards on February 1, 2018 at 2:58 pm said:

Nick: What is the most characteristic feature of the Voynich text in your eyes?
nickpelling on February 1, 2018 at 3:46 pm said:

Perry: I’d say the most characteristic feature of the Voynich text is that it seems to visually combine being an assembly of only a few different families of shapes (ain/aiin/aiiin groups, the four gallows, qo-, -dy, e/ee/eee, or/ar/ol/al, etc) with a kind of constructional elegance that holds them all together and makes them all seem to fit together as a coherent script.
Mark Knowles on February 1, 2018 at 3:48 pm said:

Diane: I must say there is lots of knowledge I don’t have which I have endeavoured to acquire. For example, it would help me at the moment if I had a good command of both the Latin and Italian languages, a good knowledge of early 15th century Italian history (particularly with reference to the Duchy of Milan), a background in archival research and in how best to find documents that I am interested in within existing archives, plus many more areas I would really benefit from knowing more about. However, I have to try to learn what I need to know as I go, and in certain very, very specific areas become the world expert. So this all feels like a very steep learning curve. I don’t think one should restrict oneself to only what one knows, as I think one inevitably needs to build up new knowledge as one progresses.

I have a strong background in Mathematics and I have an assortment of other knowledge, though it is arguable how relevant my experience is. It may seem surprising given my Mathematical background that I have spent very little time trying to decipher/translate the text; as I said earlier, that it is because I am inclined to the view that we need more clues before tackling that problem.

To be honest I have no idea what the qualifications of others are as this is never made clear.

As far as I understand, Rene, for example, has no qualifications for this kind of research, but I see no problem with that as it is so much about learning new things.
J.K. Petersen on February 1, 2018 at 5:11 pm said:

Mark Knowles wrote: “There are many possible A’s such as the Croatian Glagolitic alphabet and other alphabets being the source of B and C, however you weight your A as being something to give special status to and focussed on amongst the other possible A’s.”

Mark, I spent years learning other alphabets, including the Glagolitic alphabet—enough to be able to read basic vocabulary, and there was also a native speaker who looked into it with a fair bit of interest who decided, in the end, that he really couldn’t see that the VMS text was related to Glagolitic.

The ones I spent the most time with, because I thought they had the best potential, were Syriac, Coptic Greek, Ge’ez/Amharic, Glagolitic, Arabic, Sanskrit, Armenian, Sanskrit/Gujarati, a couple of other African languages, and some of the Malaysian languages (this is by no means the whole list). I already could read basic Korean, some Cyrillic, a bit of Japanese, and a very small amount of Chinese and Hebrew before I knew about the VMS.

So no, I am not giving “special status” to Latin/some Greek/possibly-some-numbers as the glyph-source for the VMS text. I have observed a high proportion of overlap with medieval Latin and its associated abbreviations that does not occur with any of the other alphabets studied.
Mark Knowles on February 1, 2018 at 7:24 pm said:

JKP: That is all great, but ultimately it still is your theory, however good or bad it is. The assumption you make, that it is an indisputable objective fact that someone should take your theory more seriously than someone else’s because you say you have researched it thoroughly, does not hold.

It is also worth noting that whilst there are many characters in the Glagolitic alphabet, you are drawing on a much much wider source of possible characters, and given such a wide source to work from you will inevitably find more matches than you would from the Glagolitic alphabet. Why not say the author was a student of all languages and scripts and drew his inspiration from all of them? That would give you an even wider net of possible characters to work from.

Don’t get me wrong: your “theory” has some significant merit, I am sure, but to classify it just as “observations” that everyone should take account of is a step too far for me.
J.K. Petersen on February 1, 2018 at 9:18 pm said:

Mark, it’s not only the shapes that are consistent with Latin. The POSITIONS of the glyphs are consistent with Latin.

Even if you can find similar shapes in other alphabets, those shapes are NOT positionally consistent with the VMS which argues against the creator having drawn “on a much much wider source of possible characters” as you suggest.
Rene Zandbergen on February 1, 2018 at 9:28 pm said:

National Geographic:

https://news.nationalgeographic.com/2018/02/voynich-manuscript-cipher-code-hebrew-europe-spd/
nickpelling on February 1, 2018 at 10:19 pm said:

Rene: has the term “restorator” been used in the last fifty years?
Aziz Bounouara on February 2, 2018 at 4:19 am said:

I think that, unless it is a coincidence, they are on the right path, since their translated sentence is almost there. It is true that the woman who wrote the manuscript is of Jewish origin, and I have proof that their sentence matches a bit.
“She made recommendations to the priest, man of the house, to me and to the people”; put another way: “she gave orders to the woman of the house (place: palace Haramlek), the woman that governs the sacred Harim palace ……to me and the people”.
See Azbo videos on YouTube.
Thomas on February 2, 2018 at 7:40 am said:

If this approach is intended seriously, the next steps should be to apply the algorithm using a 15th century Hebrew text corpus (I don’t know whether there already is anything like that) and to consult a 15th century Hebrew expert instead of Google translate.
Peter M on February 2, 2018 at 9:14 am said:

@ Rene
I still have a link where you might be interested, if you do not have it yet.
https://transacl.org/ojs/index.php/tacl/article/view/821/174
GeorgeC on February 2, 2018 at 10:14 am said:

How come, when I put their Hebrew text into Google Translate, I get something completely different?

“And he made a man to smite him, and to his men”

It sounds like something from the Bible to me.
D.N.O'Donovan on February 2, 2018 at 11:24 am said:

Mark

You say,

” I have a strong background in Mathematics and I have an assortment of other knowledge, though it is arguable how relevant my experience is”.

I envy you. My maths is just the chemistries’ and basic programming level. I’d love to see someone more competent produce information on the maths of maths – I mean, what sort of statistics you get from texts like that in the Zibaldone da Canal on commercial problems (14thC) or Michael of Rhodes’ book on navigation, which has 200 pages of maths’ text.

Both books also include transcribed passages in prose and some poetry.

But all the people generating comparative stats seem to use prose only – like the Dec.of Inde. or something.

Another niggling question is about Newbold’s claiming that the text was some sort of five-base cipher or something… sorry, not sure of the details. That’s why it’s a question.

In short – the study really needs maths people and maths attitudes. IMO
Perry D. Edwards on February 2, 2018 at 12:03 pm said:

Nick: Why is it not possible to use the few different shapes for parsing the Voynich text?
nickpelling on February 2, 2018 at 12:08 pm said:

Perry: 25 years back, this was what was incessantly debated online. Every time someone came up with a character-based transcription (such as Currier) that supposedly transcribed Voynichese, someone else would point out all the places (normally hundreds of places, not just one or two) where that transcription failed to represent what we see. And so we ended up with a stroke-based transcription (EVA), that everybody could use while still disagreeing with everyone else’s final (parsed) transcription.

If you still want to reinvent this very old wheel, feel free to do so: but at least be aware of the history. 🙂
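
To make the parsing disagreement concrete, here is a minimal sketch, assuming a greedy longest-match rule, of how the same EVA string splits into different glyph sequences depending on which stroke groups you decide are single characters. The two glyph inventories (`scheme_a`, `scheme_b`) and the function name `parse_eva` are hypothetical, chosen only for illustration; they are not Currier's alphabet or anyone else's published scheme.

```python
def parse_eva(word, multi_glyphs):
    """Greedy longest-match split of an EVA string into glyph tokens."""
    # Try longer candidate glyphs first so that "aiin" wins over "ain".
    candidates = sorted(multi_glyphs, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(word):
        for glyph in candidates:
            if word.startswith(glyph, i):
                tokens.append(glyph)
                i += len(glyph)
                break
        else:
            # No multi-stroke glyph matches here: keep the single EVA letter.
            tokens.append(word[i])
            i += 1
    return tokens


# Two hypothetical opinions about which stroke groups form one character.
scheme_a = ["qo", "aiin", "ch", "sh"]
scheme_b = ["ain", "aiin", "ch", "sh", "ee"]

for word in ["qokaiin", "cheedy"]:
    print(word, parse_eva(word, scheme_a), parse_eva(word, scheme_b))
```

Both parses start from the same EVA text and join back to it unchanged, which is the point of a stroke-based transcription: the disagreement lives in the parsing layer rather than in the transcription itself.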
john sanders on February 2, 2018 at 1:52 pm said:

It comes to mind that the old Irish Cockney stroker GBS started messing about with a new easy-write, non-Latin writing system of his own during Queen Vicky’s reign; his estate people settled on Shavian in the end, which has some interesting little twists and turns to it. Bernie was also a lifelong acquaintance of Wilfrid’s wife Lily ‘the Pink’, which of course is hardly coincidental, bearing in mind their similar ages and almost identical upbringings, e.g. art, music, literature, politics, vegetarianism and mainstream nonconformity.
Rene Zandbergen on February 2, 2018 at 3:02 pm said:

GeorgeC, I get:

“And the priest made a man for him to his house, and to his men”

After I fed it with this (the font is not so nice):

המצות ועשה לה הכהן איש אליו לביתו ו עלי אנשיו

This is, of course, the set of words that the Hebrew speaker considered not sensible, and I can see his point even from the output of GT.
Perry D. Edwards on February 2, 2018 at 3:19 pm said:

Nick: Didn’t that mean that the real problem is to describe the Voynich text as it actually is?
xplor on February 2, 2018 at 4:57 pm said:

The Voynich is a vegan book , maybe the Essenes.
Mark Knowles on February 2, 2018 at 5:52 pm said:

xplor: I have heard that it was written by a rogue fruitarian, experimenting in the dark arts of the vegetable.
Peter M on February 2, 2018 at 6:33 pm said:

But that does not explain why some characters only appear in the back and never in the middle, and the front does not look any better.
This cryptogram was cracked with the PC system. It’s a simple 1 to 1 system. It looks Arabic, but is written in German.
Basics it does not matter what the characters look like, you just give each character a number. Then you put the numbers in the order of the text. It is a bit longer, but there is enough paper and pencil. (Postcard encryption, hobby around 1850-1950)

I did not find any reference to Hebrew in the VM. Neither in religion, culture, or drawing still, even in the symbols nothing.
Therefore, Hebrew is difficult for me to question.
But I’m happy to teach you a better one.

There is a reason in the story where it would justify. Exactly in the time of origin of the VM began persecution of the Jews. Ironically, by this King Albrecht II where the crown carries where in the VM is watching.
Encryption Hebrew to hide the origin? That would be a reason for me.

Kryptogramm Kessler
https://www.welt.de/wissenschaft/article121529078/Die-Kasseler-Geheimschrift-ist-entschluesselt.html
J.K. Petersen on February 2, 2018 at 11:02 pm said:

Google Translate is designed to try to make sense of less-than-perfect text.

One normally wouldn’t find “priest” written that way in that part of the sentence and the first word could be interpreted as matzo (unleavened bread) or command/commandment, depending on how one adds in the vowels. איש is more straightforward (man, husband).

I only know a tiny bit of Hebrew (mostly Biblical Hebrew), but reading the text more literally (without Google Translate trying to inject meaning), one could just as easily read it as, Matzo made (caused to be) darker/duller, to her man/husband, to his house, on/to me his men/people. So, a fair amount of subjective interpretation is necessary for it to make sense.

I’d be interested in seeing a more literal translation by someone skilled in medieval Hebrew.
Karl on February 3, 2018 at 1:06 am said:

I’m going to keep beating this particular drum in the vain hope that the message gets heard by people with funding wanting to do serious research on the VMS text:

The tall pole is *not* clever new crypto methods that can handle short ciphertexts. *If* it’s some variant of a substitution cipher, I’m pretty confident that hill-climbing methods using 2nd-order character stats are sufficient. The tall pole is the lack of open-source machine readable corpora of (at least) candidate 15th century languages (including transcriptions that preserve scribal abbreviations). That may not be as *sexy*, but it’s far more *useful/valuable*.
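
For readers who have not seen the technique, the following is a minimal sketch of hill-climbing over substitution keys scored by 2nd-order (letter-pair) statistics. It is a toy under stated assumptions: a 26-letter alphabet, a tiny English reference string standing in for the machine-readable corpus that is actually missing, spaces stripped before scoring, and hypothetical function names. It is not the method used by any particular researcher.

```python
import math
import random
from collections import Counter


def bigram_model(reference, alphabet):
    """Add-one smoothed log-probabilities of adjacent letter pairs."""
    text = [c for c in reference.lower() if c in alphabet]  # drops spaces
    counts = Counter(zip(text, text[1:]))
    total = sum(counts.values()) + len(alphabet) ** 2
    return {(a, b): math.log((counts[(a, b)] + 1) / total)
            for a in alphabet for b in alphabet}


def score(text, model):
    """Sum of log-probabilities over every adjacent pair in the text."""
    return sum(model[pair] for pair in zip(text, text[1:]))


def decrypt(cipher, key, alphabet):
    """Map the i-th alphabet letter in the ciphertext to the i-th key letter."""
    table = dict(zip(alphabet, key))
    return "".join(table[c] for c in cipher)


def hill_climb(cipher, model, alphabet, steps=2000, seed=0):
    """Swap two key letters at a time, keeping only swaps that raise the score."""
    rng = random.Random(seed)
    key = list(alphabet)
    rng.shuffle(key)
    best = score(decrypt(cipher, key, alphabet), model)
    for _ in range(steps):
        i, j = rng.sample(range(len(key)), 2)
        key[i], key[j] = key[j], key[i]
        trial = score(decrypt(cipher, key, alphabet), model)
        if trial > best:
            best = trial
        else:
            key[i], key[j] = key[j], key[i]  # undo the swap
    return "".join(key), best


alphabet = "abcdefghijklmnopqrstuvwxyz"
# Toy reference text: a real run would need a large period-appropriate corpus.
reference = "the priest made a man for him to his house and to his men " * 20
model = bigram_model(reference, alphabet)
cipher = "uifqsjftunbefbnbogpsijn"  # ciphertext with spaces already removed
key, best = hill_climb(cipher, model, alphabet)
print(key, round(best, 2), decrypt(cipher, key, alphabet))
```

A single climb like this can stall in a local optimum, so practical solvers usually restart from many random keys; and with a reference text this small the output is not expected to be readable. The sketch only shows where the corpus enters the calculation, which is the dependency being argued about above.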

To the extent that there are crypto and/or linguistic tall poles, they (very probably, IMHO, based on 25+ years of working with the VMS text) relate to (1) extracting word morphology (i.e., potential “verbose” glyph combinations), and (2) automatic solution of ciphers with nulls and homophones. And (1) can probably be addressed with brute force trials and lots of CPU cycles…

While use of the D’Imperio transcription (NB: but maybe *not* the parts of voynich.now upconverted to Currier from FSG) is not an utterly indefensible choice — it agrees with Voyn101 about 95% of the time for non-spaces, and the 1st and 2nd order glyph stats (minus spaces) using Currier’s alphabet are very similar — making the effort to use Voyn101 and the Landini-Zandbergen transcription to create a better combined transcription (which would probably have a whole mess of “space” characters ranging from “full space in all three” to “half-space in two, and full space in the third” to etc.) would improve the reliability of results. Again, low sexy-to-valuable ratio, but high utility as a contribution to the community…

Also, given the likely unreliability of spaces in the transcription, even *if* the spaces are word separators (and Rene/Nick and I will have to agree to genially disagree on that point), it’s probably a good idea to use an analysis technique that ignores spaces.

Just my $0.02’s worth…

Karl
M on February 3, 2018 at 4:50 am said:

On the motivations for this paper, it’s worth noting that NLP publications are mostly at conferences with a regular submission season, and that ACL is generally considered the top conference. I don’t know anything about this group, but given the prior research I think it’s likely that it didn’t start out as an attempt to do serious Voynich research or to get media attention, but rather as a way to get a paper out of in-progress research in time for submission season, and to do so in a catchy enough way to be accepted to ACL. On the other hand, it is disappointing that there isn’t more consideration of transcription and tokenization issues, because that should be pretty basic for NLP tasks.
Mark Knowles on February 3, 2018 at 10:47 am said:

JKP: You say the creator drew on characters from Greek, Astrology and Maths; how are they positionally consistent with the VMS in your terms? If you can draw from those then why not the Hebrew alphabet or the Glagolitic alphabet or others?
D.N.O'Donovan on February 3, 2018 at 12:42 pm said:

Karl,
I hope your texts include some that are less than perfectly grammatical and correctly spelled.

We have no guarantee that the text isn’t a transcription of a translation of the efforts of a non-native speaker to communicate in Language(s) X and perhaps (Y).

If Peter M. will forgive my using a paragraph from his comment (above) as example… how many people would accept this if it was presented as a translation from Voynichese?

“There is a reason in the story where it would justify. Exactly in the time of origin of the VM began persecution of the Jews. Ironically, by this King Albrecht II where the crown carries where in the VM is watching. Encryption Hebrew to hide the origin? ”

We can’t expect idiomatic English to result from a Google translate even if by some chance the correct language had been hit upon (enciphered or otherwise).
nickpelling on February 3, 2018 at 1:11 pm said:

Diane: even when taken together, less-than-perfect grammar and less-than-perfect spelling have vastly insufficient voodoo to bring back to life the kind of simple-minded linguistic readings that seem to so enchant the media (and yet bedevil us). Moreover, even a fairly superficial understanding of the highly structured way that adjacent letters contact each other should be enough to prove to almost anyone that Voynichese cannot be an arbitrarily anagrammed plaintext in the way K & H speculate.

Incidentally, there are two kinds of anagram that replace one kind of internal structure with another: one is the alphagram (as suggested by Gordon Rugg), which I believe words such as “otolal” disprove; while the other is the syllable transposition cipher (which I discussed in The Curse of the Voynich), which it is possible to argue against on the basis of Voynichese’s short average word-length (but whose presence is otherwise hard to prove or disprove).
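
One possible reading of the “otolal” objection, offered here as an assumption rather than as the exact argument intended above: if every word had its letters sorted into some fixed alphabetical order, repeated letters would always end up next to each other, whatever that order was. A minimal sketch of that check, treating each EVA letter as one glyph (itself an assumption) and using a hypothetical function name:

```python
def could_be_alphagram(word):
    """True if some letter ordering exists under which `word` is sorted.

    In a sorted word, all copies of a letter are adjacent, so a letter
    that reappears after a different letter rules out every ordering.
    """
    seen = set()
    previous = None
    for letter in word:
        if letter != previous and letter in seen:
            return False
        seen.add(letter)
        previous = letter
    return True


for word in ["otolal", "daiin", "qokeedy"]:
    print(word, could_be_alphagram(word))
```

This tests one word at a time; checking a whole text would also require that a single ordering works for every word at once.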

Here’s a sample syllable transposition cipher, let’s see if anyone can decrypt it:

Nel zome del minca di tranos tavi
mi vaitrori per nau vasel rascuo
che la taridi avi rae tarisma.
Rene Zandbergen on February 3, 2018 at 1:23 pm said:

Dante, Kalliope?
nickpelling on February 3, 2018 at 1:51 pm said:

Rene: it is indeed Canto I of La Divina Commedia. But how long would you have taken had I not first flagged it as a syllabic transposition cipher? ;-p
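
For anyone who wants to check the sample against the original, it appears to reverse the order of the syllables within each word (“nostra vita” becomes “tranos tavi”). A minimal sketch, assuming the plaintext is supplied already hyphenated into syllables; the hyphenation below is hand-made for illustration, and the function names are hypothetical.

```python
def transpose_syllables(hyphenated_word):
    """Reverse the order of the hyphen-separated syllables in one word."""
    return "".join(reversed(hyphenated_word.split("-")))


def encipher(hyphenated_text):
    """Apply the syllable reversal to every space-separated word."""
    return " ".join(transpose_syllables(w) for w in hyphenated_text.split())


print(encipher("nos-tra vi-ta"))    # tranos tavi
print(encipher("sel-va o-scu-ra"))  # vasel rascuo
```

Because reversing twice restores the original order, the same function deciphers, provided you can guess the syllable boundaries in the ciphertext.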
Peter M on February 3, 2018 at 2:09 pm said:

Diane, yes I forgive you 🙂

and Nick, we’re just kidding because of the one our father.
at least that’s what your text looks like.
D.N.O'Donovan on February 3, 2018 at 2:13 pm said:

Nick,
Sorry – obviously I wasn’t clear. What I meant was that if a process – of attempted decryption OR attempted translation measures success by the perfection of the end-result in terms of grammar in .. English, French, Latin..whatever… then less than perfect results will be thrown out as they have all been for reading nonsensically. My point was that the process of translation can make a valid original sound nonsensical, but in addition to that the original ‘plain text’ may not have been perfectly grammatical in the first place.

Unless tests are run against imperfect sources, the right decryption/translation could be made, but dismissed because… not perfectly idiomatic.

For fun, once, to see how G/trans coped with Elizabethan English, I transcribed and Google-translated a bit of John Dee’s long letter to Elizabeth re founding a navy. Laughed till I cried at the result.
Rene Zandbergen on February 3, 2018 at 2:42 pm said:

Nick, I actually misunderstood what you meant, but as soon as I got two words, it was clear what was happening (and what you meant), so it went fast.
J.K. Petersen on February 3, 2018 at 3:18 pm said:

The spelling and grammar are not the problem. They are usually proportionally small. Even note-form text with very little grammar can be decrypted.

The problem is that finding a few language-like words or phrases in more than 200 pages of text is not difficult. People have proposed “solutions” in a dozen different languages. Which one is right? It’s possible to find words in nonsense text if there’s enough of it, especially if one introduces anagramming and subjective insertion of vowels.

What is difficult is finding a system of interpretation that works for enough text to *substantiate* whatever method is being used, and that hasn’t been done yet.
J.K. Petersen on February 3, 2018 at 4:04 pm said:

Mark wrote: “JKP: You say the creator drew on characters from Greek, Astrology and Maths; how are they positionally consistent with the VMS in your terms? If you can draw from those then why not the Hebrew alphabet or the Glagolitic alphabet or others?”

No, I said a *small percentage might* be from Greek, astrology and math, and they are consistent with the higher proportion of glyphs that are derived from Latin characters in the sense that the ones that *might* be from Greek are the ones that Latin scribes commonly used. As I said previously, many of the Greek and Latin abbreviations, ligatures, and symbols are indistinguishable from one another because Latin inherited many scribal conventions from Greek.

Also, the glyphs that resemble numbers are consistent with the shapes that were used in Latin in the late 14th and 15th centuries (in the late 15th and 16th centuries, the styles changed). We’re straying rather far if we suggest that less-similar glyphs in Glagolitic or other languages inspired the shapes in the VMS when both the letter-style glyphs and number-style glyphs in the VMS are written as they normally would be in Latin in the early 15th century.

The reason I can’t absolutely distinguish whether some are letters or numbers is because some shapes were used for both numbers and letters. For example, EVA-d was used as the letters “s”, “d” and also the number 8. EVA-ell is a completely standard number 4 but sometimes was also used as a letter. EVA-y is the number 9 in Latin (they deliberately drew it as a 9 so it would not be mistaken for the letter “g”) and is one of the most common abbreviations in Latin, used primarily at the ends of words, sometimes at the beginning, and only occasionally in the middle of words (or alone), exactly as it is done in the VMS.

In Latin, the EVA-r shape is also a common abbreviation and is sometimes written like the number 2 (it depends on the scribe). It is sometimes found within words and sometimes stands alone. It has several meanings, depending on context. EVA-ch is also very common in Latin, but it’s a ligature (two letters combined), not an abbreviation, and has several meanings.

It’s difficult to know whether there is a distinction between numbers and letters in the VMS, but there is a historical precedent for reading them differently… In early medieval Latin, abbreviations were less common and, in some hands, numbers were expressly used to distinguish abbreviations from the rest of the text. In later medieval Latin, hundreds more abbreviations were added and the letter-number distinction disappeared: abbreviations were distinguished more by context than by numeric derivation.

Expanding abbreviations is integral to reading medieval Latin, so there’s a possibility some VMS glyphs are also intended to be expanded. Whether the shapes derived from numerals or those derived from Latin abbreviation-glyphs are intended to be “processed” in a way different from each other is something that has to be considered.

Unfortunately, identifying the source of the glyphs tells us only that the scribes knew Latin scribal conventions. It doesn’t pinpoint the underlying language (assuming there is one, which there might not be). Latin was used to write dozens of languages, and scribal conventions used in England, France or Bohemia, were almost the same as those used thousands of miles away in Naples or Valencia. It’s entirely possible that it’s nonsense text carefully crafted to look like Latin, but hopefully not—there’s enough variation in the text (just barely) to suggest it might be something more.
Mark Knowles on February 3, 2018 at 6:59 pm said:

JKP: When you say “Latin inherited many scribal conventions from Greek”, the Glagolitic alphabet was significantly influenced by Greek and also inherited many scribal conventions.

Obviously I am yet to see your derivation of all the Voynich shapes, but we have already discussed that, so this remains a rather non-specific conversation.
J.K. Petersen on February 3, 2018 at 8:15 pm said:

Glagolitic is not written with the same scribal conventions that are common to Latin/Greek and the VMS. It does not use the same kinds of ligatures, or the same abbreviations, or the swooped tails.

Show me EVA-y at the ends and beginnings of words in Glagolitic. It doesn’t happen, but it is common to both Latin and Voynichese in roughly the same proportions position-wise.

Show me EVA-m in Glagolitic. It can’t be done. The shape doesn’t exist, but it is common in both Latin and Voynichese and is mostly at the ends of words, exactly where one would expect it.

The same can be said for other characters.

Most of the VMS glyphs are based on Latin, but a few of the gallows characters are also found in Greek and, as mentioned, some glyphs might be numbers.
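
The positional claim is the kind of thing that can be counted. A minimal sketch of such a count, assuming a word list in EVA with each EVA letter treated as one glyph; the sample words are a toy list for illustration and not an extract from any transcription, and `position_profile` is a hypothetical name.

```python
from collections import Counter


def position_profile(words, glyph):
    """Count where a glyph occurs: word-initial, medial, final, or alone."""
    profile = Counter()
    for word in words:
        for i, ch in enumerate(word):
            if ch != glyph:
                continue
            if len(word) == 1:
                profile["alone"] += 1
            elif i == 0:
                profile["initial"] += 1
            elif i == len(word) - 1:
                profile["final"] += 1
            else:
                profile["medial"] += 1
    return profile


sample = ["chedy", "qokeedy", "ykal", "daiin", "otedy", "y"]
print(position_profile(sample, "y"))
```

Running the same function over a tokenised Latin or Glagolitic text with the corresponding shape would give the two profiles being compared, though the outcome depends heavily on how the transcription handles spaces.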
Karl on February 4, 2018 at 12:56 am said:

Diane,

> I hope your texts include some that are less than perfectly
> grammatical and correctly spelled.

A corpus that avoids obvious sampling problems (i.e., don’t use a collection of letters by a single author) would probably capture the effects of those on letter statistics. (Although a flawed sample is still probably better than no sample at all…) *If* it’s a cipher, I doubt the key changes often enough to need more than letter pair frequency information to crack it. Spelling and grammar variability may make _translating_ it harder…

> We have no guarantee that the text isn’t a transcription of a
> translation of the efforts of a non-native speaker to communicate
> in Language(s) X and perhaps (Y).

You are absolutely right that we don’t. While I don’t want to be like the drunk who loses his keys in the alley and when asked why he’s looking for them under a lamppost replies, “Because the light is so much better here,” it makes sense (at least to me) to try to test (and potentially eliminate) the easier-to-handle cases first.

Karl
Rene Zandbergen on February 4, 2018 at 8:26 am said:

Hi Karl,

the only thing I am very sure about with respect to word spaces is, that the existing transcription files are all full of errors in this respect.
The spaces are clearly there in the MS, but what they delimit is not yet certain, in my opinion.
Aziz Bounouara on February 4, 2018 at 9:07 am said:

After analyzing the first sentence from the manuscript and comparing its structure and meaning, whether in Hebrew, Arabic or other languages, I found that the sentence given has no match or any relation to the meaning. It is true that their sentence has similarities with other sentences in the VM. If they really apply it to the right sentence, then they are on the right way …
J.K. Petersen on February 4, 2018 at 10:41 am said:

I’ve produced several transcripts (after the first, which I created in 2008, I realized there were subtleties in the VMS that I had missed the first time and also noted, with significant disappointment, that the concordance relationships one would expect to find did not reveal themselves). It took more than a year after finishing the first transcript to produce the concordance (it’s more than a thousand pages) and I color-coded it to be absolutely sure I hadn’t missed anything, but the result was disappointing (at least from the perspective of meaningful text). Even considering differences between Currier A and B, something doesn’t add up—word groupings or even single words that should reference each other in certain ways do not.

This led to a lot of soul-searching, staring at the manuscript, and painstaking creation of two more transcripts (a significant chore—it’s never fun to tread a long tedious path numerous times). The second time through, I also adjusted the glyphs so they could be searched either as single characters or as ligatures.

I am pretty convinced that there are spaces and half-spaces, they seem to occur in reasonably consistent patterns, but I haven’t attacked the most recent transcript from the referential perspective, to see if relationships among the tokens make more sense than they did in the first transcript (which didn’t acknowledge half-spaces).

Even if the glyphs have been transposed and possibly sorted to create the positional characteristics of Voynichese, even if the text were numbers rather than alphabetic characters (or a number of other scenarios), even if there are biglyphs (which I’m pretty convinced there are), this shouldn’t have fully obscured concordance patterns, so I’ll have another go at it when I have time and if it is as pattern-resistant as before, I’ll move on to Plan F.
nickpelling on February 4, 2018 at 12:14 pm said:

J.K. Petersen: the lack of “concordance relationships” is something every genuine Voynich researcher struggles with. To me, it suggests that there may be some kind of per-page (or even per-paragraph) local transformation going on… which would be extremely awkward. 🙁
D.N.O'Donovan on February 4, 2018 at 2:08 pm said:

Karl,
Thanks for taking the questions seriously and for such a helpful response.
D.N.O'Donovan on February 4, 2018 at 2:55 pm said:

Karl,
Just for interest’s sake. If you can suspend disbelief… a surprising number of the botanical folios show plants used in dyeing, and here and there is a mnemonic alluding to weaving, or net-making and so on.

I’d also been amused to notice how easily the Vms seemed to blend into any kind of abbreviated technical text: I tried about half a dozen different sorts as idle amusement and found especially interesting that if read as (don’t laugh) knitting instructions they maintained the necessary symmetry, and ‘kept the rules’: you can’t have a hole in the first place, and if you double any stitches on one side of a pattern -unit , an equal number must be doubled on the other AND an equal number removed. I wasn’t equating glyphs with stitches, but reading them as if they were a pattern of that sort. (Don Hoffman had done something a bit similar, and it worked, but the difference was that he believed it was a decoding).

Then – Julian Bunn turned pages of the text into colour-coded print-outs and I was stunned to see that again the patterns were partly ‘readable’ in terms of regularly patterned fabric.

By that stage I thought things had gone quite far enough, so I approached a specialist in textile analysis (VERY technical and scarce sort of person), asking if she’d take a look at four or five of Julian’s scans and just say whether any, all, or none, were weave-able. Just yes or no, on technical grounds.

It was all going well until I had to mention which ms we were talking about.

Well, from the way she dropped the sheets you’d think they’d turned to red-hot iron.

So it remains an idea unexplored – though Julian doesn’t believe the text is formed of words at all.

Thought the story might amuse you. 🙂 And Nick’s readers.
J.K. Petersen on February 4, 2018 at 11:16 pm said:

It’s not an unexplored idea. It’s simply an idea that hasn’t been explored to the point of definite conclusions. What is written about the VMS is only the tip of the iceberg. The rest of the iceberg is researchers quietly doing their work, unknown to the rest of us, until they have something to announce.

Knitting patterns, crochet patterns, weaving patterns: they’ve all been suggested, and these ideas are not new (in fact there’s a video on YouTube interpreting the text as crochet patterns in quite a rational way).

Quite a few years ago I took my first copy of the transcript and ran it through a search-and-replace routine that turned each glyph into a specific color to see if there were steganographic patterns that might be textile patterns, or word patterns, or arrows, or maps, or hidden drawings, or text-within-text, and wasn’t able to find anything conclusive enough to write it up (at least not yet). I also looked into the possibility that the text itself was describing patterns (in fact, this is still ongoing), which led me to one of the ideas I’ve been working on…

…that Johannes Gutenberg might be behind Voynichese.

Gutenberg was very secretive about the development of his printing press because he knew its economic potential would be recognized by eager entrepreneurs. He was collecting manuscripts with print potential while getting ready to launch his revolutionary business, so it’s possible the VMS came into his hands in one way or another, along with other books with print potential (the Gutenberg Bible being the best-known). In fact, after the moneylender and Peter Schoeffer took all of Gutenberg’s assets, Schoeffer did print a book on herbs, possibly from references located by Gutenberg.

So I asked myself, what if Gutenberg hid the dimensions for his typefaces within the VMS text during the planning stages? Perhaps it was an unfinished manuscript with pictures and no text. He virtually disappeared from public view for a few years while studying and designing type (and developing the process for casting the type) and was training a couple of assistants to be type-casters and printers (Schoeffer being one of those assistants).

If Gutenberg were designing and recording something that can be expressed in X, Y coordinates or X, Y, Z coordinates, or other measurable quantities (like blocks of type, or the molds for casting the type, or even the design of the press itself, or the formulas for the inks, and other aspects of the process), then many of the peculiarities of the VMS text are potentially explained, including

1) the high level of repetition,
2) the high incidence of glyph pairs,
3) the orderliness and positional rigidity of the glyphs,
4) the possible integration of letters and numbers,
5) the lack of concordance relationships that one would expect if it were narrative text or encoded prose,
6) the resistance to computational attacks based on natural language assumptions, and
7) the presence of more than one hand (possibly the assistants).

Gutenberg was ingenious, painstaking, and methodical—someone capable of devising a system for describing physical information. If Voynichese is encoded dimensions and formulae, he might further have added common Latin abbreviation symbols (maybe as null characters) to make it superficially look like Latin (and to hide the nature of the real content from prying eyes).

Even if it’s not Gutenberg who devised the text, someone with something to hide of a physical and formulaic nature might create a script that looks like Voynichese—approaching the text from this perspective might be more fruitful than assuming it is enciphered natural language.
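
To make the glyph-to-colour search-and-replace mentioned earlier in this comment a little more concrete, here is a minimal sketch of how it could be done. It is not the routine I actually ran: the function names, the evenly spaced hue palette, the plain-text PPM output and the two sample lines (made-up EVA-like strings, not real transcription data) are all assumptions for illustration.

```python
import colorsys

def glyph_palette(lines, separators=". "):
    """Give each distinct glyph its own RGB colour by spacing hues evenly."""
    glyphs = sorted({ch for line in lines for ch in line if ch not in separators})
    palette = {}
    for i, glyph in enumerate(glyphs):
        r, g, b = colorsys.hsv_to_rgb(i / max(len(glyphs), 1), 0.8, 0.9)
        palette[glyph] = (int(r * 255), int(g * 255), int(b * 255))
    return palette

def to_pixel_grid(lines, palette, blank=(255, 255, 255)):
    """Turn each text line into a row of pixels; separators and padding stay white."""
    width = max((len(line) for line in lines), default=0)
    return [[palette.get(ch, blank) for ch in line.ljust(width)] for line in lines]

def write_ppm(grid, path):
    """Write the grid as a plain-text PPM (P3) image, which most viewers can open."""
    height = len(grid)
    width = len(grid[0]) if grid else 0
    with open(path, "w", encoding="ascii") as f:
        f.write(f"P3\n{width} {height}\n255\n")
        for row in grid:
            f.write(" ".join(f"{r} {g} {b}" for r, g, b in row) + "\n")

# Hypothetical sample: invented EVA-like lines, with "." as the word separator.
sample = ["qokeedy.qokedy.dal", "daiin.shedy.qokain"]
palette = glyph_palette(sample)
grid = to_pixel_grid(sample, palette)
```

Calling write_ppm(grid, "page.ppm") would save the picture. With one pixel per glyph, any diagonal, banded or block-like structure in the transcription should show up as visible colour patterns, though what counts as "conclusive" is still a judgment call.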
john sanders on February 4, 2018 at 11:19 pm said:

D.N.O: Reminds me of Lily Voynich’s mum Mary Boole and her celebrated curve stitching as an aid to teaching math. Of course Mary’s dad George had set the ball rolling with his Boolean Algebra revolution and as we know, his logic systems are even now preferred to others in many modern computation fields. I seem to remember in Australia’s Wyndham Scheme of the early sixties, both Boole systems were taught quite extensively in schools, though what happened to them, I know not.
Karl on February 5, 2018 at 1:22 am said:

Some semi-random thoughts:

Re: spaces and half-spaces, I had *hoped* the half-spaces in Bio B in the Voyn101 transcription might provide some insight into the verbose glyph combos (if that is, in fact, what is going on) — maybe they were unconscious breaks after writing glyph combos — but that didn’t seem to provide much insight.

I would vigorously encourage folks looking at the text to revisit this page of Stolfi’s:

http://www.ic.unicamp.br/~stolfi/voynich/98-01-27-plant-names/

I don’t agree with all his proposed glyph substitutions/equivalences, but I’d bet money some of them are correct.

Re: parsing issues with the Voynich text: Moore’s Law is our friend. Enumerate possibilities and test them all. That’s what shell scripts are for. Even if you have to invent DecryptVoynichAtHome (based on SETIAtHome) to handle the combinatorics. How many filament materials did Edison(’s lab grunts) grind through to find one that was practical?

Nick, thinking about it, I think your proposal of looking for a set of single- and verbose-glyph combos that creates a Bio B text with similar stats to (i.e. “maps to”) Herbal A could be an interesting new approach to resolving the “word” morphology problem. (Again, conditioned on the verbose cipher theory being correct…)
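
A rough sketch of what "enumerate and test" could look like for that proposal, under loud assumptions: the greedy left-to-right parse, the use of token entropy as the single statistic to match, the choice to score the target text as single glyphs, and every function name here are mine for illustration, not anything Nick or the paper specifies. A real attempt would compare several statistics (word lengths, bigram frequencies and so on) against actual Bio B and Herbal A transcriptions.

```python
from collections import Counter
from itertools import combinations
import math

def tokenise(word, combos):
    """Greedy parse: take a two-glyph combo where one matches, else a single glyph."""
    tokens, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in combos:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens

def token_entropy(words, combos):
    """Shannon entropy (bits per token) of the token frequencies under a given parse."""
    counts = Counter(t for w in words for t in tokenise(w, combos))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def best_combo_sets(source_words, target_words, candidates, max_size=2):
    """Try every candidate combo set up to max_size; rank by entropy gap to the target."""
    target = token_entropy(target_words, frozenset())  # assumption: target parsed as single glyphs
    scored = []
    for size in range(max_size + 1):
        for combo in combinations(candidates, size):
            combos = frozenset(combo)
            gap = abs(token_entropy(source_words, combos) - target)
            scored.append((gap, sorted(combos)))
    return sorted(scored)
```

The number of combo sets grows combinatorially with the candidate list, which is exactly where the Moore's Law (or DecryptVoynichAtHome) argument comes in: each candidate set is independent, so the search parallelises trivially.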

JKP (on letter/numeral ambiguity in the glyph sources): Brumbaugh ran aground on those shoals — he interpreted Currier ‘4’ as the Arabic digit ‘4’ (consistent with his assumed post-Columbian dating), but Currier ‘E’ is the 15th-Century form of that Arabic digit IIRC.

Diane, I have no trouble buying your position (if I understand it correctly) that the herbal drawings involve an idiosyncratic personal convention for representing the plants (and, other than a small number of examples like the ivy, don’t fall into any of the main families of European herbal mss traditions [or Arabic/Eastern herbal mss? I don’t recall your early work well enough to be sure on that point]). Re: the apparent frequency of dye- and fabric-related plants, I’m not sure I hold out much hope for any cribs from plant IDs or uses providing much leverage, but absent clear evidence that a theory/approach is pointless I’d never discourage folks from chasing their ideas of choice.

I haven’t read the paper in detail yet — how did they handle the mismatch between the number of Hebrew (or Latin) letters and Voynich glyphs (at least in Currier)?

Karl
D.N.O'Donovan on February 5, 2018 at 8:42 am said:

Karl,
Not to hog the space, but what I found was the opposite of a personal and idiosyncratic attitude. No show-ponies.
D.N.O'Donovan on February 5, 2018 at 8:46 am said:

John.
Thanks for the reference. And isn’t g/gle wonderful? It turned up this:
Mary Boole and her celebrated curve stitching as an aid to teaching math.
https://www.ncbi.nlm.nih.gov/pubmed/15036927

I remember that sort of thing being made with thread, nails and 3-ply by other children who did handcrafts rather than art-history, but always supposed the dust-catchers were vaguely anthropological. Good to know better.
john sanders on February 5, 2018 at 9:09 am said:

Picked up the deliberate mistake, just to see if you were awake. Of course George Boole was Mary’s husband and daughter Ethel Lilian (Voynich) was born just before he died, so it seems that her own talents were mostly self-taught, as were both her parents’, so obviously the genes have it. It might pay any Voynichers unfamiliar with this amazing family to check into their respective multi-gifted academic backgrounds. It might prove insightful for individual quests for answers re language formats and organization of words with the help of a variety of unusual, though most effective, Victorian tools.
Karl on February 5, 2018 at 6:33 pm said:

Diane, my bad — I’ll have to go back and reread some of the stuff I’d saved from your website. Looking at your flowchart post, I see you say, “Using the form of leaf, and a plant’s habit, as the basis for classification reflects a style attested in Mediterranean works from the time of Theophrastus, whose works I take as a defining corpus here,” but I was under the impression that things like the circumscription mark for cultivated plants were a Voynich-author(s)-specific code.

BTW, there is yet another new solution. The PDF is protected so you can’t cut and paste to translate, but there is a Google cache page for the first ~20 pages that you can feed to Google translate:

http://webcache.googleusercontent.com/search?q=cache:uatXEngGPFcJ:www.voynichov-rukopis.sk/data/voynichov_rukopis.pdf+&cd=1&hl=en&ct=clnk&gl=us&client=firefox-b-1

Karl
nickpelling on February 5, 2018 at 7:26 pm said:

Karl: I tried to make sense of that late last year, but had to admit defeat. There’s a letter decryption table in an early page, if anyone wants to try (be my guest, knock yourself out, etc). :-/
Champolione on February 5, 2018 at 9:41 pm said:

Nick. I looked at the PDF document. It’s a colossal crap.
That’s why the author writes there that the PDF text can only be expanded with its consent. 🙂 Such stupidity, and only with his consent. It’s a weird ant. It can only spread out of trouble.
J.K. Petersen on February 6, 2018 at 3:01 am said:

I’ve only glanced through it, I haven’t tried to read it yet (and maybe never will). Much of it appears to be based on a lookup table.

I suppose I should analyze a few sentences to see if there’s logic behind the assignments of characters and lookup words, but it’s hard to get motivated when some of the more repetitious text in the VMS appears to have been interpreted as moderately complex and varied prose.
Rene Zandbergen on February 6, 2018 at 5:57 am said:

John Sanders:

Kennedy, Gerry: The Booles & The Hintons, two dynasties that helped shape the modern world, Atrium, 2016.

This is the same Gerry Kennedy who co-authored: “The Voynich manuscript; the mysterious code that has defied interpretation for centuries, 2004, 2006”.

This book has numerous interesting details about Wilfrid as well, for example how he managed to travel from Siberia to London in less time than any ship would take him.
john sanders on February 6, 2018 at 8:35 am said:

Rene: Yes, I do recall the magic carpet flitting about that people like Wilfrid and other fellow revolutionaries and their clandestine fellow travellers were able to achieve in those heady days of yore; seemingly being in two far distant places at the same time. Lily Boole’s alleged one-time lover, Sidney Reilly (Gadfly/Ace of Spies) seemed able to be in Tokio (sic) delivering Russian military plans to the Emperor, then verifiably dining with the Director General of Deuxieme Bureau in Paris within the same week in 1904. Sure makes around the world by balloon in eighty days seem tame by comparison; or perhaps the Wright bros. were already secretly promoting direct international air travel, a year after their Kitty Hawk inaugural test run with ‘The Flyer’.
D.N.O'Donovan on February 6, 2018 at 11:52 am said:

Karl,
Thank you. Can’t read a word of it, but so very pretty. 🙂 I gather this is the ‘Old Czech’ version? Must see if I have any old Czechs in the address book.
Champolione on February 6, 2018 at 9:04 pm said:

Termite. The PDF document is not written in the Czech language. It is not even written in Old Czech. The document was written by an ant who knows nothing.
Thank you. Champolione. 🙂

The PDF document is written in Slovak (a Slovak ant wrote the PDF document).
🙂
Drabkikker on February 8, 2018 at 4:28 pm said:

@ Champolione: I couldn’t help but read “an ant who knows nothing” in this guy’s voice. 😉
Jeff Haley on February 24, 2018 at 3:46 pm said:

Ok so Nick wrote a book. Get over it. The carbon dating lends credence to a connection with Milanese cipher. The time period fits. The imagery in the manuscript has definite connections to renaissance Italy. And hey that’s just from memory. I did study the thing for a few years.
Mark Knowles on February 24, 2018 at 4:34 pm said:

Nick: Don’t you think there will be a slight sadness once the Voynich is translated, that such a tantalising puzzle is no more? I think there will be, as there are not so many puzzles like it.

I can think of other unsolved historical puzzles, but there is something, in its own way unique, about the Voynich.

The following puzzles remain and intrigue me:

Linear A is as yet undeciphered
David Rohl’s ideas about the geographical location which may have been the place loosely described in the Bible as the Garden of Eden
And others…

The question “Who were the Sea People?” that I mentioned before was recently answered.

One thing that makes the Voynich an attractive problem is that it appears to be fundamentally soluble. The question as to what was the inspiration for the Garden of Eden story may never be answered as there may be not enough evidence surviving. The same with questions like “Who was the real King Arthur?” However the Voynich lies in a completely different category of problem.

Part of the reason I didn’t reject the idea of researching the Voynich is that it seemed a genuine historical puzzle, verified as such by the carbon dating, and there seemed to be enough evidence for it to be classified as solvable.