Voynich researchers without a significant maths grounding are often intimidated by the concept of entropy. But all it is is an aggregate measure of how [in]effectively you can predict the next token in a sequence, given a preceding context of a certain size. The more predictable tokens are (on average), the smaller the entropy: the more **un**predictable they are, the larger the entropy.

For example, if the first order (i.e. no context at all) entropy measurement of a certain text was 3.0 bits, then it would have almost exactly the same average information content per character as a random series drawn from eight different digits (e.g. 1-8). This is because entropy is a log2 value, and log2(8) = 3. (Of course, what is usually the case is that some letters are more frequent than others: but entropy is the bottom-line figure averaged out over the whole text you’re interested in.)
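To make this concrete, here’s a minimal Python sketch of a first order entropy calculation (the function name and structure are mine, purely for illustration):

```python
from collections import Counter
from math import log2

def h1(text):
    """First order (unigram) entropy, in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    # Sum of -p * log2(p) over every distinct character
    return -sum((n / total) * log2(n / total) for n in counts.values())

# A text drawn uniformly from eight symbols has h1 = log2(8) = 3.0 bits:
print(h1("12345678"))  # 3.0
```

Skewed frequencies pull the value below log2(N), which is exactly the “averaged out over the whole text” point above.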

And the same goes for second order entropy, with the only difference being that because we always know what the preceding letter or token was, we can make a more effective guess as to what the next letter or token will be. For example, if we know the previous English letter was ‘q’, then there is a very high chance that the next letter will be ‘u’, and a far lower chance that the next letter will be, say, ‘k’. (Unless it just happens to be a text about the current Mayor of London with all the spaces removed.)

And so it should proceed beyond that: the longer the preceding context, the more effectively you should be able to predict the next letter, and so the lower the entropy value.
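A second order calculation can be sketched the same way: bucket each character by the single character preceding it, take the first order entropy of each bucket, and weight each bucket by how often that context occurs. (Again, this is just an illustrative sketch, not anyone’s canonical implementation.)

```python
from collections import Counter, defaultdict
from math import log2

def h2(text):
    """Second order entropy: average uncertainty about the next
    character given the one preceding it (bits per character)."""
    contexts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        contexts[prev][nxt] += 1
    total = len(text) - 1
    h = 0.0
    for counts in contexts.values():
        n = sum(counts.values())
        # First order entropy of this one context...
        h_ctx = -sum((c / n) * log2(c / n) for c in counts.values())
        # ...weighted by how often the context occurs
        h += (n / total) * h_ctx
    return h

# A perfectly predictable sequence has zero second order entropy:
print(h2("abababab"))  # 0.0
```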

As always, there are practical difficulties to consider (e.g. what to do across page boundaries, how to handle free-standing labels, whether to filter out key-like sequences, etc) in order to normalize the sequence you’re working with, but that’s basically as far as you can go with the concept of entropy without having to define the maths behind it a little more formally.

## Voynich Entropy

However, even a moment’s thought should be sufficient to throw up the flaw in using entropy as a mathematical torch to try to cast light on the Voynich Manuscript’s “Voynichese” text… that because we don’t yet know what makes up a single token, we don’t know whether or not the entropy values we get are telling us anything interesting.

EVA transcriptions are closer to stroke based than to glyph based: so it makes little (or indeed no) sense to calculate entropy values for EVA. And as for people who claim to be able to read EVA off the page as, say, mirrored Hebrew… I don’t think so. :-/

But what is the correct mapping or grouping for EVA, i.e. the set of rules you should apply to EVA to turn it into the set of tokens that will give us genuine results? Nobody knows. And, oddly, nobody seems to be even asking any more. Which doesn’t bode well.

All the same, entropy does sometimes yield us interesting glimpses inside the Voynichese engine. For example, looking at the Currier A pages only in the Takahashi transcription and using ch/sh/cth/ckh/cfh/cph as tokens (which is a pretty basic glyphifying starting point), you get [“h1” = first order entropy, “h2” = second order entropy]:

63667 input tokens, 56222 output tokens, h1 = 4.95, h2 = 4.03

This has a first order information content of 56222 x 4.95 = 278299 bits, and a second order information content of (56222-1) x 4.03 = 226571 bits.

If you then also replace all the occurrences of ain/aiin/aiiin/oin/oiin/oiiin with their own tokens, you get:

63667 input tokens, 51562 output tokens, h1 = 5.21, h2 = 4.01

This has a first order information content of 51562 x 5.21 = 268638 bits, and a second order information content of (51562-1) x 4.01 = 206760 bits. What is interesting here is that even though the h1 value increases a fair bit (as you’d expect from extending the post-parsed alphabet with additional tokens), the h2 value **decreases** very slightly, which I find a bit surprising.

And if, continuing in this vein, you also convert air/aiir/aiiir/sain/saiin/saiiin/dain/daiin/daiiin to glyphs, you get:

63667 input tokens, 50387 output tokens, h1 = 5.49, h2 = 4.04

This has a first order information content of 50387 x 5.49 = 276625 bits, and a second order information content of (50387-1) x 4.04 = 203559 bits. Again what I find interesting is that once again the h1 value increases a fair bit, but the h2 value barely moves.
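For readers who want to try this themselves, a longest-match-first tokenizer along these lines is easy to sketch in Python. The group list below is just the illustrative starting set from above, not a claim about the “right” parsing:

```python
import re

# Hypothetical grouping rules: each group becomes a single token.
GROUPS = ["cth", "ckh", "cfh", "cph", "ch", "sh",
          "aiiin", "aiin", "ain", "oiiin", "oiin", "oin"]

# Sort longest-first so the regex alternation prefers the longest group;
# the trailing "." keeps every ungrouped character as its own token.
pattern = re.compile("|".join(sorted(GROUPS, key=len, reverse=True)) + "|.")

def tokenize(eva_text):
    """Split an EVA string into tokens, treating each group as one unit."""
    return pattern.findall(eva_text)

print(tokenize("qokchedaiin"))  # ['q', 'o', 'k', 'ch', 'e', 'd', 'aiin']
```

Feeding the resulting token stream into first and second order entropy routines reproduces the kind of before/after comparison shown in the figures above.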

And so it does seem to me that Voynich entropy may yet prove to be a useful tool in determining what is going on with all the different possible parsings. For example, I do wonder if there might be a practical way of exhaustively / hillclimbingly determining the particular parsing / grouping that maximises the post-parsed h1:h2 ratio for Voynichese. I don’t believe anyone has yet succeeded in doing this, so there may be plenty of room for good new work here – just a thought! 🙂
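As a purely illustrative sketch of what such a search might look like, here is a tiny hillclimber that toggles candidate groups in and out of the parsing, keeping any change that raises the post-parsed h1:h2 ratio. The candidate list, scoring function, and toggling scheme are all my own assumptions, not a tested methodology:

```python
import random
import re
from collections import Counter, defaultdict
from math import log2

def entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def h1(tokens):
    return entropy(Counter(tokens))

def h2(tokens):
    ctx = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        ctx[a][b] += 1
    total = len(tokens) - 1
    return sum(sum(c.values()) / total * entropy(c) for c in ctx.values())

def tokenize(text, groups):
    if not groups:
        return list(text)
    pat = re.compile("|".join(sorted(groups, key=len, reverse=True)) + "|.")
    return pat.findall(text)

def hillclimb(text, candidates, steps=200, seed=0):
    """Greedily search for the subset of candidate groups that
    maximises the post-parsed h1:h2 ratio."""
    rng = random.Random(seed)
    current = set()
    def score(groups):
        toks = tokenize(text, groups)
        return h1(toks) / max(h2(toks), 1e-9)
    best = score(current)
    for _ in range(steps):
        g = rng.choice(candidates)
        trial = current ^ {g}   # toggle one candidate group in or out
        s = score(trial)
        if s > best:            # keep only strict improvements
            best, current = s, trial
    return current, best
```

A real attempt would need restarts, a proper transcription as input, and some care about over-fitting (any sufficiently large group set will look “good”), but the shape of the search is roughly this.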

## Voynich Parsing

To me, the confounding beauty of Voynichese is that for as long as we cannot even parse it into tokens, the vast modern cryptological toolbox normally at our disposal does us no good.

Even so, it’s obvious (I think) that ch and sh are both tokens: this is largely because EVA was designed to be able to cope with strikethrough gallows characters (e.g. cth, ckh etc) without multiplying the number of glyphs excessively.

However, if you ask whether or not qo, ee, eee, ii, iii, dy, etc should be treated as tokens, you’ll get a wide range of responses. And as for ar, or, al, ol, am etc, you won’t get a typical linguistic researcher to throw away their precious vowel to gain a token, but it wouldn’t surprise me if they were wrong there.

## The Language Gap

The Voynich Manuscript throws into sharp relief a shortcoming of our statistical toolbox: specifically, its excessive reliance on our having previously modelled the text stream accurately and reliably.

But if the first giant hurdle we face is parsing it, what kind of conceptual or technical tools should we be using to do this? And on an even more basic level, what kind of language should we as researchers use to try to collaborate on toppling this first statue? As problems go, this is a precursor both to cryptology and to linguistic analysis.

As far as cipher people and linguist people go: in general, both groups usually assume (wrongly) that all the heavy lifting has been done by the time they get a transcription in their hands. But I think there is ample reason to conclude that we’re not yet in the cinema, but are still stuck in the foyer, all the while there is a world of difference between a stroke transcription and a parsed transcription that few seem comfortable to acknowledge.

Hey Nick,

First order entropy is calculated from unigram frequencies and second order entropy from bigram frequencies, right?

I found and glanced through this article: http://ixoloxi.com/voynich/mbpaper.htm

“A verbose cipher, one which substitutes several ciphertext characters for one plaintext character, can produce the entropy profile of Voynich text.”

Do you find that a worthwhile hypothesis to explore?

Jarlve: unfortunately, people calculate second order entropy in many different (and often bad) ways. In my opinion, the correct way to calculate it is to calculate the first order entropy of each of the individual contexts, and then take a weighted sum of those values.

As far as verbose cipher goes, I wrote extensively about it back in 2006 (“The Curse of the Voynich”, Compelling Press), but it’s something that hasn’t really been picked up by anyone else. Which is a shame, but there you go. 😐

Nick,

all in all, the second entropy exercise leads (by some) to the conclusion that the VM is written in natural language. Well, the correct conclusion would be that the VM is written in some organized language, natural or artificial (one that was created along the lines and rules of natural languages). After all, so far all trials of cracking based on known natural languages were futile (except for those “meta theories”, as you call them 🙂

I have even read some criticism that using second entropy (which is mainly the DATA test) for TEXT evaluation is not fully justified. That is claimed not only by mathematicians but especially by language experts.

Another assumption based on the second entropy test is that the VM cannot be written in a TRANSPOSITION cipher, since it would be too disorganized. Well, how much is too much? After all, there are different transpositions, some better organized than others, say transpositions of whole words. And where do we put the Cardan grille, for instance? How well organized would that one be? It just may be that we prematurely threw the baby out with the bathwater . . .

Yes that’s the idea which I’m also fond of – to try to empirically find a parsing which would “normalize” the Voynich entropies. I’m not sure if that can be achieved, but I think it’s worth trying.

Btw, what tool do you use to calculate 2nd order entropy?

As a sidenote, why do you use index 0 instead of 1 for the 1st order and index 1 instead of 2 for the 2nd order? Usually index 0 is used to denote the so-called “0-th order entropy”, which is just the theoretical maximum of the 1st order entropy (logN).

Jan: I think you can get meaningful results from entropy calculations, but you have to be careful about how you pre-process the data you feed in. For example, it would probably be advisable to stick to Currier A or B, and to filter out label sections.

The normal caveat about statistical tests (that most go wrong before any calculation is done) applies here. 🙂

Anton: I normally write it by hand. If you can do a simple entropy calculation on an array of instance counts, use the same routine on each context’s instance count array.

The easiest route to the finish line then is to sum up the total information content of each context, and then divide by (total length – 1).
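In Python, that recipe might look something like this (an illustrative sketch of the approach described, not anyone’s actual code):

```python
from collections import Counter, defaultdict
from math import log2

def entropy_from_counts(counts):
    """Simple entropy calculation on an array of instance counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

def h2_via_information(text):
    """Second order entropy via summed information content:
    total bits contributed by each context, divided by (length - 1)."""
    ctx = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        ctx[a][b] += 1
    # Each context contributes (its count) * (its own entropy) bits
    info = sum(sum(c.values()) * entropy_from_counts(c.values())
               for c in ctx.values())
    return info / (len(text) - 1)
```

This is numerically identical to weighting each context’s entropy by its relative frequency; it just sums the information content first and divides once at the end.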

I’ve always used h0 for context-less entropy, but – looking around the web – opinions clearly differ on this. I’m not sure if there’s a definitive answer to this, sorry. 🙁

Nick:

I’ve been using this online calculator for 1st order: http://planetcalc.com/2476/

But I can’t find anything user-friendly for higher orders, and to my surprise I even failed to find any Matlab package supporting this kind of calculation.

About the indexes, both h1 and h0 are context-less; h0 is actually just the special case of h1 when all characters have equal probabilities to occur (say, have equal frequency counts in the analyzed text). In that particular case entropy depends only on the number of characters in the alphabet (logN). If that’s not the case, i.e. some characters are more or less frequent than others, context-less entropy (h1) will be lower than logN.

Anton: I’ll update the text accordingly, thanks for that clarification. 🙂

In general, though I’m extremely comfortable with simple entropy calculation, I’m far from convinced that it gives a like-for-like comparison with higher order entropy values. For example, if a two-character context only appears once in a text, the information content of the character following it is zero. Except it isn’t. Except it is. :-/

But all the same, it’s a billion times less vague than Kolmogorov Complexity. It’s just a shame there isn’t something fractionally better than ‘pure’ higher order entropy values. 🙁

About the (estimation of) information content of a character following a bigram that occurs only once in the text – technically you are right, but this is not a matter of entropy. Entropy (3rd order in this case) as the characteristic of the information source is an averaging one – it deals not with individual characters, but with the source on the whole. It’s not about “what is the quantity of information conveyed by the particular character X (provided that we know that the preceding two are Y and Z)”, but about “what is the mean quantity of information conveyed by a character (provided that we know the preceding two, whatever they are)”.

What is not directly comparable, are entropies (no matter the order) between sources using alphabets of a different size. There is really little sense in comparing absolute entropy values of a text using a 20 character alphabet with those of a text using a 60 character alphabet. This is often forgotten in discussions about entropy (Voynich in particular).

Anton: I guess you’re slightly missing my point here, which is that the reason people compare different order entropies is that they want to compare not the relative predictabilities, but the relative constructabilities – how much information is there in this text without using any contextual cues, and how much information is there if we can make use of second order contexts etc.

Even though nth order entropies are useful values to have, they’re still very largely independent of each other (mathematically speaking). By which I mean: even though they’re all measured in bits, they don’t normalize to each other in the way you might hope.

Still not sure what your point is :-/, but anyway 🙂

I can say that, in regard to Voynich entropies, I see three problems of many a discussion of that kind.

First (as I noted), that it is not absolute entropy values that should be compared, but rather differential values (i.e. the extent to which the absolute entropy value is close to the possible maximum). (This is often neglected, but not always.)

Second, that if one makes a comparison, then like should be compared with like. I mean, if the VMS is a highly conspected text (which the density of the label vords in the folio space suggests), then there’s little sense in comparing it with the King James Bible or the Origin of Species. The same goes, of course, for the time range and locale.

Third (actually the foremost) – that’s what your post is about – the problem of parsing. If we don’t know the real Voynichese alphabet, then how can we be sure about Voynichese entropies, to begin with?

Anton: I’m talking about the problems of comparing different order entropies of the same text, e.g. forming an h1:h2 ratio (as per Bennett’s book etc). Because that’s about the only tool we have to compare different possible parsing schemes for Voynichese. 🙂

OK, now I see.

But the relation between h1 and h2 is indicative (as well as between h0 and h1, or between h2 and h3). The reason is that h(N) sets the upper limit for h(N+1), so looking at how close h(N+1) is to h(N) can be suggestive for comparison purposes. One can look at the ratio (as Bennett does) or at the difference (as does Stallings), but the idea is the same.
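Using the h1/h2 figures from the post above, both views can be computed side by side (a trivial sketch; the parsing labels are just shorthand for the three grouping stages described earlier):

```python
# (h1, h2) pairs for the three Currier A parsings from the post above
parsings = {
    "ch/sh groups":        (4.95, 4.03),
    "+ ain/aiin families": (5.21, 4.01),
    "+ air/dain families": (5.49, 4.04),
}

for name, (h1, h2) in parsings.items():
    # Bennett-style ratio vs Stallings-style difference
    print(f"{name}: h1/h2 = {h1 / h2:.3f}, h1 - h2 = {h1 - h2:.2f}")
```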

Anton: it’s indicative, but not in the way people hope. The idea people want to capture is what proportion of the information ‘shifts’ when you move the contextual frame from one level to the other. But nth order entropy calculations don’t quite give you that… it’s close, but then again, so’s Glenn. 🙂

Nick. 12 years of you at the beginning. It is bad.

Do you need some help ?

Anton Alipov, he writes well. What is important is the correct alphabet. Entropy will not help. 🙂

Nick. What are you afraid of? That you will not be the first one to find out whose handwriting it is? Well, you will not be the first. I am the first. But I can move your research forward. And that’s good. Or not? Without my help you will work on the manuscript until the end of your life. And I can guarantee that without success. 🙂

I said the 3 of them will pass away and they will not know about the VM. Is it because they are scared someone will find out before them? What a VM race!!!!