Voynich Paper Suggestion #3: LIBNPIG Statistics

It’s well-known that the distribution of Voynichese page-initial (and indeed paragraph-initial) glyphs is, unlike the rest of the text, strongly dominated by gallows characters. But what is less widely known is that something really fishy is going on with the distribution of all the other line-initial glyphs too.

As far as I know, nobody has yet given this behaviour the in-depth attention it properly deserves, which is why I think it would make a good subject for a paper. Though it perhaps needs a catchier name than “Line-Initial But Not Paragraph-Initial Glyph” (LIBNPIG) statistics (so please feel free to come up with a better name or acronym).

Though you might reasonably ask: isn’t this just another side of the whole constellation of LAAFU (“line as a functional unit”) behaviours?

Well, yes and no. “LAAFU” is a shorthand mainly used by some Voynich researchers to signal their despair at the unknowableness of why certain glyphs seem to ‘prefer’ different positions within a line. So yes, LIBNPIG behaviour is a kind of LAAFU behaviour: but no, that doesn’t mean it can’t be understood. (Or at least carefully quantified and tortured on a statistical rack.)

LIBNPIG Observations

How do we know that something funky is going on with LIBNPIGs?

LIBNPIG ‘tells’ are perhaps most visible in Q20 (Quire #20). For example, even though EVA daiin is common in Currier A pages (you may recall that it’s one of the ‘Big Three’ A-words – daiin / chol / chor), it’s far less common in Currier B pages: however, when it does occur in Q20, it is frequently in a LIBNPIG position. In fact, this is true of all word-initial EVA d- words in Q20, which you can see here (scroll to the bottom).

Similarly, if you look at EVA s- words (ignoring sh- words, which are a particularly annoying EVA artifact, *sigh*) in Q20, you should also see that these appear far more often line-initially than they should.

Is that all? No. The same is true of EVA y- words in Q20 too, and this pattern also holds in Herbal B pages. It seems to be true of some Herbal A pages as well, though EVA y- words in Herbal A appear to work quite differently to my eyes. (Though I’d advise looking for yourself and forming your own opinion.)
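To put a number on “more often line-initially than they should”, one rough approach is to compare the share of (say) s- words among line-initial words with their share among all other words. The sketch below assumes a hypothetical input format (one string per non-paragraph-initial line, words separated by “.”) and uses invented toy lines, not a real transcription; the function name and parameters are mine.

```python
def line_initial_shares(lines, prefix, exclude=()):
    """Return (share among line-initial words, share among all other words)
    of words starting with `prefix` but not with any prefix in `exclude`.

    Assumes `lines` holds only non-paragraph-initial lines, EVA words
    separated by '.' (a hypothetical input format)."""
    def matches(word):
        return word.startswith(prefix) and not any(word.startswith(e) for e in exclude)

    def share(words):
        # Fraction of matching words; 0.0 for an empty list to avoid dividing by zero.
        return sum(1 for w in words if matches(w)) / len(words) if words else 0.0

    split_lines = [[w for w in ln.split(".") if w] for ln in lines]
    split_lines = [ws for ws in split_lines if ws]
    initial = [ws[0] for ws in split_lines]
    rest = [w for ws in split_lines for w in ws[1:]]
    return share(initial), share(rest)

# Invented toy lines, for illustration only.
toy_lines = [
    "daiin.chedy.qokeedy",
    "sain.shedy.daiin",
    "saiin.okeey.chedy.sal",
    "shedy.qokain.okal",
]
# s- words, ignoring sh- words as suggested above.
s_initial, s_rest = line_initial_shares(toy_lines, "s", exclude=("sh",))
```

If the line-initial share is consistently much larger than the share elsewhere (for d-, s- and y- alike), that is the LIBNPIG ‘tell’ expressed as a number rather than as an impression.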

Curiously, even though paragraph-initial words so strongly favour gallows characters, LIBNPIG words seem to abhor gallows characters, a behaviour which is in itself quite suggestive and/or mysterious.

Conversely, if you go looking for LIBNPIG EVA ch- and sh- words, I believe you’re far more likely to instead find them clustering at the second word on a line. Note that Emma May Smith (with Marco Ponzi) looked at this back in 2017, though more from a word-based perspective (even though the first two words on a line in Q20 are often fairly odd-looking). The concern for me is more that these behaviours mean that Voynich word dictionaries (and indeed all word analyses) based on line-initial words are unreliable.

So, what is going on in Q20 (in particular) that is making LIBNPIG words prefer d- / s- / y- so much? I guess this really is the starting point of the paper I’m suggesting here.

Vertical keys?

The notion that the first column of glyphs might have some kind of special meaning is far from new. In fact, there is evidence suggesting this in the manuscript itself on page f66r, where you can clearly see a column of glyphs (though admittedly there is also a column of freestanding words to its left). This is a curious item to find in a manuscript.

But might all (or, at least, many) pages of Voynichese text contain vertical keys inserted as a single line-initial glyph at the start of lines? Philip Neal speculated about this possibility many years ago, causing me to (occasionally) refer to these as “vertical Neal keys”. A vertical key might conceivably be used for many things, such as inserting an (enciphered) page title, or even a folio number or page number: though it’s easy to argue that the relatively narrow range of glyphs we see appearing here probably rules this out.

In “The Curse of the Voynich” (2006), I speculated instead that a glyph inserted at the start of a line might form part of some kind of transposition cipher. The suggestion there was that a second glyph (say, a k-gallows) might act as a token to use the glyph (or some function of that glyph) inserted at the start of the same line. This would be a fairly simple crypto ‘hack’ that would make codebreakers’ jobs difficult.

There are many other possible accounts one can devise. For example, it’s possible that the first glyph on a non-paragraph-initial line might function as a kind of catchword, to link the end of one line with the start of the next. Alternatively, it might be telling the reader how to join the text at the end of the preceding line with the text at the start of the current line. Or it might have some kind of crypto token function (e.g. selecting a dictionary). Or it might be a numbering scheme. Or it might be a marker for some funky line transposition scheme. Or a null. Or… one of a hundred other things (if not more).

If all these speculations seem somewhat ungrounded, it’s almost certainly because the basic groundwork to build a sensible discussion of LIBNPIG behaviour upon hasn’t yet been done. Which is your job. 🙂

LIBNPIG Groundwork

What needs doing? For a start, you’d need to build up a solid statistical comparison of paragraph-initial glyphs and LIBNPIG glyphs, along with paragraph-second glyphs and LSBNPS (line-second-but-not-paragraph-second) glyphs, for paragraph text in each of Herbal A, Herbal B, Q13 and Q20 (I would suggest).
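As a starting point, the four tallies described above could be collected along these lines. This is only a minimal sketch under stated assumptions: the input format (a list of paragraphs, each a list of EVA lines with words separated by “.”) is hypothetical, the toy lines are invented, and treating only ch and sh as multi-letter glyphs is an illustrative choice, not a settled answer to how EVA should be parsed.

```python
from collections import Counter

# Assumption: EVA letter sequences treated as single glyphs (illustrative only).
MULTI_GLYPHS = ("ch", "sh")

def glyphs(line):
    """Split an EVA line into glyphs, dropping '.' and space word separators."""
    text = line.replace(".", "").replace(" ", "")
    out, i = [], 0
    while i < len(text):
        for m in MULTI_GLYPHS:
            if text.startswith(m, i):
                out.append(m)
                i += len(m)
                break
        else:
            # No multi-letter glyph starts here, so take a single letter.
            out.append(text[i])
            i += 1
    return out

def positional_counts(paragraphs):
    """Tally first and second glyphs of lines, split by paragraph-initial vs not."""
    counts = {
        "para_initial": Counter(),  # first glyph of each paragraph's first line
        "libnpig": Counter(),       # first glyph of every other line
        "para_second": Counter(),   # second glyph of each paragraph's first line
        "lsbnps": Counter(),        # second glyph of every other line
    }
    for para in paragraphs:
        for idx, line in enumerate(para):
            g = glyphs(line)
            if not g:
                continue
            first_key, second_key = (
                ("para_initial", "para_second") if idx == 0 else ("libnpig", "lsbnps")
            )
            counts[first_key][g[0]] += 1
            if len(g) > 1:
                counts[second_key][g[1]] += 1
    return counts

# Invented toy paragraphs, for illustration only.
toy_paragraphs = [
    ["pchedy.qokeedy.daiin", "daiin.shedy.qokain", "sain.chedy.okaiin"],
    ["tshedy.okal", "ychedy.qokeey"],
]
toy_counts = positional_counts(toy_paragraphs)
```

Running this separately over the paragraph text of Herbal A, Herbal B, Q13 and Q20 would give one set of four tallies per section, which can then be compared side by side.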

With those results in hand, there are some basic hypotheses you might want to try testing:

Is there any statistical correlation between a LIBNPIG glyph and the glyph immediately following it? Oddly, it seems that nobody has yet tried to test this – yet if there isn’t (as visually seems to be the case), then I think it’s safe to say that something is provably wrong with all naive text readings.
Is there a correlation between a LIBNPIG glyph and the previous line’s end-glyph?
Is there a correlation between a LIBNPIG glyph and the following word’s start glyph?
Do paragraph-initial second words behave the same way as LIBNPIG second words?
Might LIBNPIG glyphs simply be nulls? Might they be chosen just to look nice? Or do they have some genuinely meaningful content?
How does all this work for paragraph text in each of the major sections of the Voynich Manuscript? e.g. Herbal A, Herbal B, Q13, Q20
(I’m sure you can devise plenty of your own hypotheses here!)
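For the correlation questions in this list, one possible test (my choice here, nothing more) is to measure the mutual information between the two glyphs in each pair and then use a permutation test to see whether that value could plausibly arise by chance. The sketch below uses only the Python standard library; the function names are mine and the toy pairs are invented rather than taken from any transcription.

```python
import math
import random
from collections import Counter

def mutual_information(pairs):
    """Mutual information, in bits, between the two elements of each pair."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )

def permutation_p_value(pairs, trials=1000, seed=0):
    """Fraction of shuffles whose mutual information reaches the observed value.

    Shuffling the second elements breaks any real link between the two
    positions while keeping each position's glyph frequencies intact."""
    rng = random.Random(seed)
    observed = mutual_information(pairs)
    firsts = [a for a, _ in pairs]
    seconds = [b for _, b in pairs]
    hits = 0
    for _ in range(trials):
        rng.shuffle(seconds)
        # Small tolerance so floating-point noise does not hide ties.
        if mutual_information(list(zip(firsts, seconds))) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (trials + 1)

# Invented toy data: each pair is (line-initial glyph, glyph following it).
dependent = [("d", "a")] * 50 + [("s", "o")] * 50                    # fully linked
independent = [("d", "a"), ("d", "o"), ("s", "a"), ("s", "o")] * 25  # no link
```

A small p-value suggests the two positions are statistically linked; a large one is consistent with no link at all. The same two functions work unchanged if the pairs are instead (previous line’s end-glyph, LIBNPIG glyph) or (LIBNPIG glyph, next word’s start glyph).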

Ultimately, what we would like to know is what LIBNPIG behaviours tell us about how the start of Voynichese lines has to be parsed – for if there is no statistical correlation between a line-initial glyph and the glyph following it, this cannot be a language behaviour.

Even though we can all see numerous LAAFU behaviours, it seems that few Voynich researchers have yet accepted them solidly enough to affect the way they actually think about Voynichese. But perhaps it is time that this changed: and perhaps LIBNPIG will be the thing that causes them to change how they think.

17 thoughts on “Voynich Paper Suggestion #3: LIBNPIG Statistics”

Koen Gheuens on April 4, 2022 at 8:54 am said:

Are you planning to submit a paper yourself, Nick?

Something I would love to see you bring more attention to through the conference is the question “how to parse EVA”. I tried to work a bit towards this in my “Entropy hunting” series (which is where my inspiration ended), though this was far from sufficient.

The unfortunate situation is that you are one of the few researchers who actually seems to care about this issue. Others are perfectly content to use unmodified EVA, which can seriously mess up the results. (Unless the analysis is at word-level, in which case glyph parsing doesn’t matter).

Of course, if you already have a topic in mind you want to write about yourself, I’d be looking forward to your submission as well.
nickpelling on April 4, 2022 at 12:46 pm said:

Koen Gheuens: I’m not sure if I’ll submit a proposal – I’m just writing up these suggestions as a way of making visible what I consider to be the most important gaps in our Voynich knowledge (that we could fill).

Parsing is a huge topic (as you know), but the issue there is about finding the right questions to ask, i.e. what tests will help tell us how to parse even small parts of Voynichese.

Here, the idea of raising LIBNPIG as a topic is that there are many aspects of Voynichese behaviour (mainly the LAAFU stuff, but there are others) that are interfering with our ability to even get as far as parsing. Pre-parsing, if you will.

I suppose I just want people to be looking at the right kinds of questions, not wasting their time on tertiary stuff.
xplor on April 4, 2022 at 3:08 pm said:

What was the reason it was written in code ?

The original was in a language or culture they didn’t understand.

The time was between 1232 and 1820 when the Catholic Church used torture and other unkind means to try to identify religious heresy.
D.N.O'Donovan on April 4, 2022 at 3:13 pm said:

Nick, I’ve got so many questions about the chain of interpretations from the actual text to the transcription stage to EVA that I came to ask them here but now realise there are too many for a comment. Guess I’ll just have to put it in a post and hope for informed responses.

Good to hear that Voynich people still ask research questions. I was beginning to feel it must have gone out of fashion, superseded by an hubristic certainty.
nickpelling on April 4, 2022 at 4:55 pm said:

Diane: these paper suggestions are just a nice way to highlight some hugely fundamental things that researchers normally either don’t notice at all (if so, shame on them) or freeze in terror at the sight of (because they assume that nobody could ever resolve them). Of course, both responses are basically irrational and wrong-headed. 😉
Agasul on April 4, 2022 at 5:10 pm said:

Sentences that always begin with the same character are usually questions. Start the sentence with a question.
What, who, where, how, why, because of……as a prefix because,
because of, therefore, therefore, therefore, there, this…..
Sometimes it can just be simple.
Agasul on April 4, 2022 at 5:10 pm said:

What, who, where, how, for what reason, why, because of… and as an answer: because,
because of, therefore, that is why, for that reason, there, this…
Sometimes it can also just be simple.
JULIAN J BUNN on April 4, 2022 at 5:16 pm said:

Very interesting, Nick – I wasn’t aware of this odd feature of non-paragraph-initial glyph distributions. Isn’t one simple explanation that gallows work like capital letters, so that these glyphs are not capitals simply because they are not at the start of a sentence/paragraph?
Agasul on April 4, 2022 at 5:27 pm said:

Maybe it’s easier than you think.
Maybe the “4” is not the same as “q”. Maybe a round corner in the sign is X and a square one is Y.
You have to ask yourself how it is possible that 5 different hands have the same writing tolerance.
Is this really new?
No!
The Templars marked the characters in their code with a dot.
I will not bring this to the conference either, it is more of a theory and has nothing solid yet.

nickpelling on April 4, 2022 at 5:56 pm said:

Julian J Bunn: there are many possible explanations, and capitalisation is indeed one of them. But… I suspect it needs a bit more added to explain what we see.
Agasul on April 4, 2022 at 6:08 pm said:

@Nick
I think the only thing that could help us is a real Continuum Transfunctioner. 🙂
Ruby on April 4, 2022 at 7:42 pm said:

Nick, thank you for your summary.
I would like to add a small contribution, although it is possible that this point was already discussed.
If the glyph s is overrepresented at the beginning of the lines, it should be checked whether it serves as a hyphenation sign.
If there is no longer word in the text, related to the last word of the previous line, then it is not.
nickpelling on April 4, 2022 at 10:41 pm said:

Ruby: this is definitely a good suggestion, with the only issue being that d- s- and y- are all over-represented in LIBNPIG position. However, it might be that one of these often lines up immediately after a line ending -m or -am (i.e. there may be both an ending hyphen and a starting hyphen). That’s kind of what you would hope would emerge from a good statistical survey. 🙂
Guest on April 4, 2022 at 11:25 pm said:

This sounds exciting. I found some claims that it was written around the time of the 3rd Crusade and, even more interesting, that there are supposed to be clues to the Holy Grail in it. May I participate? I do have a bible from 1100 AD, it is bound in pigskin and printed on a Gutenberg press. Martin Luther wrote it. Thank you Sharon
D.N.O'Donovan on April 5, 2022 at 3:37 am said:

Nick,
re your comment of April 4, 2022 at 4:55 pm

Absolutely! 🙂
Christopher on April 6, 2022 at 10:23 am said:

Surely Martin Luther must have had a time machine to write a bible in 1100 AD 😉
Anon on May 14, 2022 at 12:32 pm said:

Hi Nick: Did you happen to see this in today’s news? Some of the pics look like the Voynich manuscript. Thanks. https://www.atlasobscura.com/articles/medieval-alchemist-secret-code