I recently mentioned in a comment that my working hypothesis was that word-initial EVA l- is a different token from EVA l elsewhere, and Emma May Smith asked me what evidence I had for that statement. So I thought I’d post a few stats to throw onto the fire.
The Evidence
Just to be clear, though: because I’d rather not muddy the figures with line-initial EVA l- words, everything below relates to word-initial (but not line-initial) instances. And to keep everything as clear as practical, the comparisons are solely between words beginning l-, ol-, and al-.
So, here are the raw instance counts according to the Takahashi transcription for word-initial (but not line-initial) l-, ol-, and al-. For example, there are 1267 word-initial (but not line-initial) l- words, of which 58 are just EVA l (on its own), along with 433 word-initial (but not line-initial) words beginning with lk-. (Note that the “(-)” row, i.e. where l, ol, or al forms the whole word, is an estimate: my app unfortunately couldn’t calculate it.)
next | .l | .ol | .al |
total | 1267 | 1416 | 477 |
(-) | 58 | 538 | 256 |
k | 433 | 326 | 42 |
t | 34 | 35 | 1 |
f | 10 | 12 | 3 |
p | 17 | 13 | 2 |
ch | 293 | 138 | 20 |
sh | 105 | 53 | 8 |
d | 171 | 85 | 55 |
a | 41 | 97 | 32 |
o | 48 | 52 | 26 |
y | 13 | 58 | 32 |
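For anyone wanting to reproduce counts of this kind, here is a minimal Python sketch of the counting rule described above. It is not the app used for the table: it assumes each transcription line is a plain string of EVA words separated by ".", with locus tags and comments already stripped, and the function names are purely illustrative.

```python
from collections import Counter

PREFIXES = ("l", "ol", "al")   # the three word-initial contexts compared here
DIGRAPHS = ("ch", "sh")        # EVA glyphs that are written with two letters

def next_glyph(rest):
    """Return the glyph following the prefix, or "(-)" if the word ends there."""
    if not rest:
        return "(-)"
    for digraph in DIGRAPHS:
        if rest.startswith(digraph):
            return digraph
    return rest[0]

def contact_counts(lines):
    """Count what follows l-, ol-, al- in words that are not line-initial."""
    counts = {prefix: Counter() for prefix in PREFIXES}
    for line in lines:
        # Assumption: "." separates words within a line
        words = [w for w in line.split(".") if w]
        for word in words[1:]:             # skip the line-initial word
            for prefix in PREFIXES:
                if word.startswith(prefix):
                    counts[prefix][next_glyph(word[len(prefix):])] += 1
                    break
    return counts

# Toy input only, not real transcription lines
sample_lines = ["lkaiin.olkeedy.l.alchey", "ol.lshedy.aly"]
counts = contact_counts(sample_lines)
```

In the toy input, the line-initial lkaiin and ol are skipped, so the only l- contacts counted are the standalone l and the sh of lshedy.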
To compare these three columns, we now need to turn their values into percentages. What the following table says, then, is that word-initial (but not line-initial) l- is followed by k 34.18% of the time, t 2.68% of the time, etc. (Note that I didn’t try to capture all of the values.)
next | .l | .ol | .al |
total | 100% | 100% | 100% |
(-) | 4.58% | 37.99% | 53.67% |
k | 34.18% | 23.02% | 8.81% |
t | 2.68% | 2.47% | 0.21% |
f | 0.79% | 0.85% | 0.63% |
p | 1.34% | 0.92% | 0.42% |
ch | 23.13% | 9.75% | 4.19% |
sh | 8.29% | 3.74% | 1.68% |
d | 13.50% | 6.00% | 11.53% |
a | 3.24% | 6.85% | 6.71% |
o | 3.79% | 3.67% | 5.45% |
y | 1.03% | 4.10% | 6.71% |
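The conversion is just each count divided by its column total. As a quick check, here is a short Python sketch that rebuilds the percentages from the raw counts. The row labels follow the percentage table, and the dictionary layout and function name are my own choices for illustration.

```python
# Raw counts (word-initial, not line-initial), keyed by context then by row
counts = {
    ".l":  {"total": 1267, "(-)": 58,  "k": 433, "t": 34, "f": 10, "p": 17,
            "ch": 293, "sh": 105, "d": 171, "a": 41, "o": 48, "y": 13},
    ".ol": {"total": 1416, "(-)": 538, "k": 326, "t": 35, "f": 12, "p": 13,
            "ch": 138, "sh": 53, "d": 85, "a": 97, "o": 52, "y": 58},
    ".al": {"total": 477,  "(-)": 256, "k": 42,  "t": 1,  "f": 3,  "p": 2,
            "ch": 20, "sh": 8, "d": 55, "a": 32, "o": 26, "y": 32},
}

def percentages(column):
    """Express each row as a percentage of the column total, to 2 decimals."""
    total = column["total"]
    return {glyph: round(100 * n / total, 2)
            for glyph, n in column.items() if glyph != "total"}

table = {context: percentages(column) for context, column in counts.items()}
print(table[".l"]["k"], table[".ol"]["k"], table[".al"]["k"])  # 34.18 23.02 8.81
```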
In short, this table is trying to compare the contact tables for three word-initial (but not line-initial) contexts: l-, ol-, and al-. So… what does it say?
Though the +f and +p rows are broadly the same for all three contexts, I think just about every other row presents significant differences. For example:
- Only one word in the VMs begins with EVA alt (on f72v2, Virgo)
- Comparisons between the ch and sh lines seem to imply that there is vastly more similarity between ch and sh (ch occurs roughly 2.5x to 3x as often as sh in every column) than between l-, ol-, and al-.
- l- is typically followed by k (34.18%) and ch (23.13%), but this is quite unlike ol- and al-.
However, the biggest difference in all these counts is where l, ol, and al form the whole word (the “(-)” row). So here’s the last table of the day, which is where the whole word counts are removed from the totals, i.e. word-initial but not line-initial and also not word-complete:
next | .l | .ol | .al |
k | 35.81% | 37.13% | 19.00% |
t | 2.81% | 3.99% | 0.45% |
f | 0.83% | 1.37% | 1.36% |
p | 1.41% | 1.48% | 0.90% |
ch | 24.23% | 15.72% | 9.05% |
sh | 8.68% | 6.04% | 3.62% |
d | 14.14% | 9.68% | 24.89% |
a | 3.39% | 11.05% | 14.48% |
o | 3.97% | 5.92% | 11.76% |
y | 1.08% | 6.61% | 14.48% |
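Removing the whole-word counts only changes the denominator: each count is divided by the column total minus the “(-)” count. Here is a small Python sketch of that step for the +y row, with a function name that is purely illustrative.

```python
def without_whole_word(total, whole_word, row_count):
    """Percentage of row_count among words that continue past the prefix."""
    return round(100 * row_count / (total - whole_word), 2)

# (column total, whole-word "(-)" count, +y count) for each context
columns = {".l": (1267, 58, 13), ".ol": (1416, 538, 58), ".al": (477, 256, 32)}

y_row = {context: without_whole_word(*values)
         for context, values in columns.items()}
print(y_row)  # {'.l': 1.08, '.ol': 6.61, '.al': 14.48}
```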
Even though taking out all the whole-word instances has damped down some of the larger ratios, there are still plenty of big ratios to be seen.
Perhaps the most surprising is the comparison between ly- (1.08%) and aly- (14.48%). (Interestingly, all but one of the places where the ly and aly instances occur in the text are at the end of a line or butted up against a mid-line illustration, which I think points strongly to ly and aly being abbreviated in some way, but that’s an argument for another day.)
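One conventional way to put a number on how different the three columns are is a chi-squared test of homogeneity. The sketch below is my own addition, not part of the analysis above, and it comes with caveats: it uses only the ten listed rows (the handful of uncaptured continuations are ignored), some expected counts in the .al column are small, and words in a running text are not independent samples. It uses plain Python so that nothing beyond the standard library is assumed.

```python
# Raw counts per row as (.l, .ol, .al), with the whole-word "(-)" row left out
rows = {
    "k":  (433, 326, 42),
    "t":  (34, 35, 1),
    "f":  (10, 12, 3),
    "p":  (17, 13, 2),
    "ch": (293, 138, 20),
    "sh": (105, 53, 8),
    "d":  (171, 85, 55),
    "a":  (41, 97, 32),
    "o":  (48, 52, 26),
    "y":  (13, 58, 32),
}

def chi_squared(table):
    """Return (statistic, degrees of freedom) for a rows x columns count table."""
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in table:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            # Expected count if all columns shared one distribution
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

statistic, dof = chi_squared(list(rows.values()))
CRITICAL_5_PERCENT = 28.87   # chi-squared critical value for 18 degrees of freedom
print(dof, statistic > CRITICAL_5_PERCENT)  # 18 True
```

A statistic above the critical value only says that the three contact distributions are unlikely to be samples of one shared distribution; what that means for l-, ol-, and al- is a separate argument.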
The Conclusion
For me, I simply can’t see anything systematic or language-like about the comparisons between any of the three columns. When their contact tables are so different, what actual evidence is there that l-, ol-, and al- are all presenting the same (right-facing) linguistic context? Personally, I simply can’t see any.
My conclusion from the above is therefore that l-, ol- and al- are (without any real doubt at all) three different tokens, i.e. they are standing in for three different underlying entities.
Thanks Nick, your thinking is pretty clear: [l, ol, al] have different rightward distribution, so they must represent different ‘units’. I do disagree, as I’m sure you expected, but I appreciate the hypothesis you’re stating.
I can only point out that for words starting [lk] and [olk], and for words starting [lch] and [olch], their ten most common types have 80% overlap. That is, the paired versions of words (with or without [o]) have similar relative frequencies. In both pairs, the same four words are at the top of the list:
[lk]: lkaiin, lkeey, lkeedy, lkain
[olk]: olkeedy, olkeey, olkain, olkaiin
[lch]: lchedy, lchey, lchdy, lcheey
[olch]: olchedy, olchey, olcheey, olchdy
I would propose that the ‘root’ word starting with [l] has an [o] added to the start to create words beginning [ol] (or [a] for [al], but I’m less certain about that). Is there any test we can use to prove or disprove either hypothesis?
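The top-of-list comparison above can be sketched in a few lines of Python: strip the leading [o] from each [ol-] word and see which [l-] words it matches. Only the four words quoted for each list are used here, since the full top-ten lists are not given, and the function name is purely illustrative.

```python
lk = ["lkaiin", "lkeey", "lkeedy", "lkain"]
olk = ["olkeedy", "olkeey", "olkain", "olkaiin"]
lch = ["lchedy", "lchey", "lchdy", "lcheey"]
olch = ["olchedy", "olchey", "olcheey", "olchdy"]

def shared_types(plain, prefixed, prefix="o"):
    """Return the words in `plain` that also occur in `prefixed` minus its prefix."""
    stripped = {w[len(prefix):] for w in prefixed if w.startswith(prefix)}
    return [w for w in plain if w in stripped]

shared_lk = shared_types(lk, olk)
shared_lch = shared_types(lch, olch)
print(len(shared_lk), len(shared_lch))  # 4 4
```

The same function applied to full top-ten lists would give the overlap figure directly, as the number of shared types divided by ten.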
Emma May Smith: I’m not asking you to agree here, I’m just trying to communicate my evidence, reasoning and argument as completely and as clearly as I can. My point wasn’t that it disproves all linguistic takes on l- / ol- / al- words, but rather that the evidence I discussed didn’t seem to match with those same kinds of linguistic takes on l- / ol- / al- words. Not a proof of ‘cipherness’, then, but a lack of proof of ‘linguisticness’.
I now need to consider the structure of the argument you’re (kindly) presenting in return in much more depth, to see whether it genuinely has the force you think it does. Statistics-based arguments are so notoriously easy to get subtly wrong (and on so many different levels) that it very often pays to take a closer look to see if all the pieces join together in the way the proposer believes they do.
Here’s an l/ol related page from 2010: http://ciphermysteries.com/2010/09/26/new-voynich-ab-hypothesis
Nick: I greatly appreciate you publishing work like this because it helps me to understand how the contact tables you have discussed before might be useful in untangling the VM text. Although it is definitely more work, at least for me the point of this would be clearer with some control data. You state you’re trying to see patterns of “languageness” or “cipherness” and I understand how “cipherness” might be tough to come by 🙂 but certainly “languageness” could be done? For example, although I definitely could be misunderstanding this exercise, would a contact table for .h, .th, & .sh vs some group of letters (vowels?) in a similar size of English text be a possible control? Obviously the line start issues, etc. of the VM are confounding, but it would be helpful to me to see what pattern would be expected to be seen that you are not seeing. Or maybe such a thing is just not possible . . .
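As a rough illustration of what such an English control could look like, here is a minimal Python sketch that builds the same kind of contact table for word-initial (but not line-initial) h-, th-, and sh-. The function name and the toy text are made up for the example, and a real control would need a corpus of comparable size to the VM text.

```python
import re
from collections import Counter

def english_contacts(text, prefixes=("h", "th", "sh")):
    """Count the letter following each prefix in non-line-initial words."""
    counts = {prefix: Counter() for prefix in prefixes}
    for line in text.lower().splitlines():
        words = re.findall(r"[a-z]+", line)
        for word in words[1:]:            # mirror the "not line-initial" rule
            for prefix in prefixes:
                if word.startswith(prefix):
                    rest = word[len(prefix):]
                    # "(-)" marks a word that consists of the prefix alone
                    counts[prefix][rest[0] if rest else "(-)"] += 1
                    break
    return counts

# Toy text only
sample_text = "she sells the shells\nhe said that this horse shied"
contacts = english_contacts(sample_text)
```

In the toy text, the line-initial "she" and "he" are skipped, th- is followed once each by e, a, and i, sh- by e and i, and h- by o.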
Those silly manuscripts to foil the Jesuits. I cracked the Beale papers way back in 1982. Come on…some kinda reward here?!
I would like to direct you and other VM researchers to my blog, where I explain the meaning of the words ‘l’, ‘ol’ and ‘al’ and the differences between them. In the Slovenian dialectal language of the 15th century (and still to this day), ‘l’ stands for ‘le’, which means ‘only’. It was often written together with the word that followed. The words OL and AL were phonetic dialectal words for ‘if, but’. The word OL also meant OIL.
I would also like to point out that O was also a prefix (this was suggested by many VM researchers): in Slovenian, the prefix O- indicates a finished action. For example, LEK in Slovenian means ‘healing remedy’ and is the root word for the verb ‘to heal’. LEKAM was the old Slovenian word for ‘I heal’. OLEKAM implies a positive outcome: not just trying to heal, but removing the illness. Because the E in the word LEK was a half-sound, it was not written. However, in the VM the missing vowel is often indicated by the space after OL, so the exact meaning has to be determined by reading the words in context, since OLKAM and OL KAM could be two different words (‘I heal’ or ‘but where’).
To make matters more complicated, there is also another grammatical form for the word ‘healing remedy’, which in Italian writing convention would be spelled LEC. Adding the grammatical ending for the first person singular, present tense, gives LEC-IM. C in this case is pronounced as soft Č (ch), so that the word LCH-IM is in fact the same as LEKAM. There are several other Slovenian words that fall under this grammatical pattern, such as TEK (TEKAM or TEČEM) and REK (REKAM or REČEM).
In the Slovenian language, PO (EVA qo) is also a prefix that similarly indicates a finished action.
Also, because the Slovenian language is highly inflective, different grammatical endings could be applied to the root word.
For more detailed information, please go to my blog
http://voynichslovenianmystery.com/
Not sure where to put this.
Nick and all,
Did you know that ‘Voynichese’ has made it into the Omniglot site?
The script used is one devised by Erik Olsen. It doesn’t have the ‘feel’ of Voynich script. Very heavy on the verticals etc., but what do people think of his system?
Omniglot com and use the A-Z index.