26 Sep 2010

New Voynich A/B hypothesis…

More than 30 years ago, ex-US military codebreaker Prescott Currier was looking at the Voynich Manuscript, when he noticed not only that the handwriting changed (though he was uncertain how many different scribes were involved), but also that the language itself (or, more precisely, the rules governing how Voynichese letters meshed with each other) changed. He called the two major Voynichese ‘dialects’ thus identified “A” and “B” (though it turns out that quite a few pages are subtly intermediate between A and B).

Hence one large shadow hanging over any discussion of Voynichese is the issue of why such a clearly constructed language / system as Currier A (which was almost certainly written before Currier B) needed to be modified to make Currier B. After all, as Jerry Pournelle used to say every couple of months in Byte magazine, “if it ain’t broke, don’t fix it”, surely?

And yet it seems that the Voynich’s author did fix it: so, might the presence of statistical differences be a clue that Currier A was in some way broken? To me, this implies that we should try to quantify and model the differences between A and B pages, so that we can see what aspects of A were modified to make B pages, just in case this exposes some subtle weakness of the A language. Basically, what flaws in the A language were the A→B hacks trying to cover up?

As part of this whole process, I’ve recently been looking closely at the ‘l’ character in EVA transcriptions of the Voynich Manuscript, and what the different treatment of ‘l’ characters on A and B pages might be able to tell us. It’s well-known that ‘l’ is very commonly preceded both by ‘o’ and by ‘a’ – but does this behaviour change much between A pages and B pages?

According to my online JavaScript analysis tool:-

In A pages, ‘l’ is preceded by ‘o’ 72.7% of the time, and is preceded by ‘a’ 22.9% of the time.
In B pages, ‘l’ is preceded by ‘o’ 43.7% of the time, and is preceded by ‘a’ 29.0% of the time.
Freestanding ‘l’ (i.e. ‘l’s not preceded by ‘a’ or ‘o’) occur 118 times in A pages, but 1706 times in B pages.
‘ol’ usually appears preceded by a space (97% of the time in A pages, 96% of the time in B pages).
Freestanding ‘l’ usually appears preceded by a space (90% in A, 95% in B).
The summed frequency of ‘ol’ and freestanding ‘l’ remains roughly the same (5.1% in A, 4.7% in B).
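A minimal Python sketch of the kind of tally listed above, assuming the EVA transcription has already been split into A-page and B-page strings with words separated by single spaces. This is not the online JavaScript tool itself: the function name l_context_stats and the space-separated input format are assumptions, and the exact figures will vary with the transcription used.

```python
from collections import Counter


def l_context_stats(eva_text: str) -> dict:
    """Tally what precedes each 'l' in a run of EVA text.

    Assumes words are separated by single spaces (convert EVA '.'
    separators first). "Freestanding" means an 'l' not preceded by
    'a' or 'o', as defined in the post.
    """
    text = " " + eva_text  # leading pad: an initial 'l' counts as space-preceded
    before = Counter()      # character immediately preceding each 'l'
    ol_total = ol_after_space = 0
    free_total = free_after_space = 0

    for i in range(1, len(text)):
        if text[i] != "l":
            continue
        prev = text[i - 1]
        before[prev] += 1
        if prev == "o":
            ol_total += 1
            # i >= 2 is guaranteed here because text[0] is the pad space
            if text[i - 2] == " ":
                ol_after_space += 1
        elif prev != "a":
            free_total += 1
            if prev == " ":
                free_after_space += 1

    n_l = sum(before.values())

    def pct(part: int, whole: int) -> float:
        return 100.0 * part / whole if whole else 0.0

    return {
        "pct_after_o": pct(before["o"], n_l),
        "pct_after_a": pct(before["a"], n_l),
        "freestanding_l": free_total,
        "pct_ol_after_space": pct(ol_after_space, ol_total),
        "pct_free_after_space": pct(free_after_space, free_total),
    }
```

On the toy string "ol dal l ol shey rl" it reports 40% of 'l's after 'o', 20% after 'a', two freestanding 'l's, every 'ol' space-preceded and half of the freestanding 'l's space-preceded. Running it separately over A-page and B-page text gives the kind of comparison listed above.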

What is most interesting about this to me is that it seems to be saying that ‘ol’ and freestanding ‘l’ function in very similar ways, but in the transition from A to B, freestanding ‘l’ seems to have replaced ‘ol’ in about 37.5% of cases. That is, it seems to me that ‘ol’ and ‘l’ (when not preceded by ‘a’) might well represent exactly the same token: which is to say that, al’s aside, ol = l.

So, according to my current forensic reconstruction, ol and al were verbose tokens in the A pages, but because ol appeared so often (4.57%) in A pages (thus bloating the size of the ciphertext), the author finessed this in B pages. By replacing many ol’s with l, ol’s percentage went down to 2.67% while freestanding l went up to 1.66% in B (relative to 0.27% in A).
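One rough way to sanity-check the "about 37.5%" figure is to ask what fraction of the combined ‘ol’ + freestanding ‘l’ slot is taken by bare ‘l’ in each dialect, using the percentages in the paragraph above. This is an illustrative back-of-envelope calculation, not the method used in the post; bare_l_share is a hypothetical helper name.

```python
def bare_l_share(ol_pct: float, free_l_pct: float) -> float:
    """Fraction of the combined 'ol' + freestanding-'l' slot filled by bare 'l'."""
    return free_l_pct / (ol_pct + free_l_pct)


# Percentages quoted in the post
share_a = bare_l_share(4.57, 0.27)
share_b = bare_l_share(2.67, 1.66)
print(f"A: {share_a:.1%}, B: {share_b:.1%}")
```

This prints about 5.6% for A and 38.3% for B, close to the post's figure. Note that these pairs sum to 4.84% and 4.33%, slightly different from the 5.1% and 4.7% summed figures in the list above, so the two sets of percentages may have been computed over different denominators.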

I’m pretty sure that Glen Claston’s concern about the bloating effect of verbose cipher was shared by the VMs’ author, and that at least some of the changes between A and B were done in order to tighten up the output. Why else fix it if it wasn’t broken?

Posted in: Voynich Manuscript ⋅ Tagged: Glen Claston

31 thoughts on “New Voynich A/B hypothesis…”

Rich SantaColoma on September 28, 2010 at 3:35 am said:

There are of course systems in which the exact same plain text can be enciphered/encoded in many different ways, at the whim of the encipherer, without endangering the plain text or giving a decipherer any latitude in their work. If different people were using such a system, their individuality might show in the results. In such a case, if Mr. A and Mr. B were both working on the same project (even with similar plaintext content, and so similar counts), the results may be identifiable as each of theirs, as in the Voynich. I never thought that the A/B effect was necessarily evidence that the content of the A and B parts was (significantly) different, or that different systems were necessarily used on each part.
nickpelling on September 28, 2010 at 7:46 am said:

Rich: in my opinion, the old suggestion that a putative Mr A and Mr B might have worked on the VMs in parallel simply doesn’t match the codicological, palaeographic and cryptologic evidence. But it would take a fairly substantial blog post to outline the evidence and reasoning involved (Q9 springs to mind), so I’ll have to come back to that another time. All the same, the quick version is: the existence of pages that are mid-way between A and B (and their position in the sequence of pages) imply that the cipher system evolved from A into B via some intermediate stages.
Elmar on September 28, 2010 at 8:01 am said:

Hi guys,

I have a slightly different take on it. While I agree with Rich that there seem to be some “soft” aspects to the encipherment (i.e. leeway in the modus operandi/key being used), the fact that there is a transitional phase between Currier A and B seems to me to speak against a team of writers of the VM as well as against a sudden “revision” of the enciphering algorithm. (Unless the algorithm was constantly refined in small steps.)

My view is rather that the encipherment from the start allowed variations in the algorithm (much like grammar allows different sentence compositions, all of which are “correct”.) The author then chose the options he “liked” best, got used to, or which simply turned out to be the most “practicable.”

But, of course, this is just gut feeling.
nickpelling on September 28, 2010 at 9:28 am said:

Elmar: all fair enough points, thanks – but given that A words and B words differ in such a large number of ways, I think there is more evidence of an evolution in the system from A to B rather than simply a matter of general leeway in enciphering choice (even though, like you, I suspect this was probably a feature of the system anyway).
rene zandbergen on October 4, 2010 at 12:54 pm said:

What Currier did not point out (unless he did and I forgot about it), but is now well known and could be very important, is that the B-language pages are much more verbose than the A-language pages.

The only real comparison area is the main herbal section. We don’t know what the Bio and stars area would have looked like in A language.

Anyway, if there is a transition (which I also tend to believe) from A to B, it is a transition that makes the result more verbose.

Two people working at the same time is not excluded by the intermediate form, in my opinion. They could have started together, one on pharma and one on astro (which are similar), and one evolved in one direction (pharma -> Herbal A) and the other in the other direction (zodiac -> bio -> stars). This guy was ready first, and then had to help A with the last herbal pages…. Perhaps A did all the drawings….

Cheers, Rene
Goose on June 16, 2013 at 5:21 pm said:

Hi, I see that this is an old discussion, but I’ll try my luck anyway: could the difference between A and B reflect a change in the plaintext language? Say, for example, the author starts with a French plaintext, but moves to the south of France mid-manuscript, picks up Occitan slowly (hence the transitional parts where he’s mixing the two), and writes the remaining pages in Occitan? Two languages with similar roots, but clear differences?
SirHubert on October 9, 2015 at 1:59 pm said:

Goose:

I have wondered something similar: whether the manuscript might be some kind of compilation with sections written in two (or more) languages using an alphabet specially designed to represent two (or more) scripts.

It’s not difficult to come up with a hypothetical explanation of how a single enciphering system applied to two languages would produce more verbose results in one than the other. A simple example: imagine a simple Caesar-shift cipher, where you are required to add a single null after each consonant and three nulls after each vowel. Now imagine how that would look applied to a vowel-rich language like Italian on the one hand and to Arabic on the other, where only some vowels are written:

“Office” is “ufficio” in Italian and “maktab”, written m-k-t-b, in Arabic.

“Ufficio” is seven letters long, but enciphered according to the scheme above will look like this (ignoring the Caesar shift for the moment):

UxxxFxFxIxxxCxIxxxOxxx (22 characters)

“M-k-t-b” is four letters long in written Arabic, with two unwritten vowels, and will look like this:

MxKxTxBx (8 characters).

So in this case enciphered Italian is more than three times as long as the Italian plaintext, while the enciphered Arabic is only twice as long.
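SirHubert's toy scheme is easy to reproduce, which makes the length comparison concrete. Below is a minimal Python sketch under stated assumptions: the vowel set is taken to be a, e, i, o, u; the vowel/consonant decision is made on the plaintext letter before shifting; "x" is used as the null; and caesar_with_nulls is a hypothetical name.

```python
VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant


def caesar_with_nulls(plaintext: str, shift: int = 0, null: str = "x") -> str:
    """Caesar-shift each letter, then append 3 nulls after vowels and 1 after consonants.

    Vowel/consonant is decided on the plaintext letter; only a-z is kept.
    """
    out = []
    for ch in plaintext.lower():
        if not "a" <= ch <= "z":
            continue  # drop spaces, punctuation and non-ASCII letters
        shifted = chr((ord(ch) - ord("a") + shift) % 26 + ord("a")).upper()
        out.append(shifted + null * (3 if ch in VOWELS else 1))
    return "".join(out)


for word in ("ufficio", "mktb"):
    ct = caesar_with_nulls(word)
    print(word, ct, len(ct), len(ct) / len(word))
```

With shift=0 this reproduces the two strings above (22 and 8 characters), giving expansion ratios of about 3.14 and exactly 2.0.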

Other languages would, no doubt, fall somewhere in between.

Pure speculation, but it does show how a single enciphering system can produce more or less verbose ciphertexts according to the properties of the language and script being enciphered.
don of tallahassee on October 10, 2015 at 2:39 pm said:

Intermediate pages, part way between Currier A and B, are referred to in this thread.

Can anyone give a list of which are which, which are A, which are intermediate and which are B?

All I can find is a list of A & B.

Is there any way to further determine which pages are mostly A but slightly intermediate, which are fully intermediate, and which are intermediate but mostly B, thus illustrating more definitively what may be a gradual transition between A and B (or B and A), which is also referred to?

Thank you.

Don of Tallahassee
nickpelling on October 10, 2015 at 5:48 pm said:

Don: I don’t have a definitive list of these transitional pages, but I know it’s something Rene Zandbergen did compile – Rene, are you there? 🙂
Rene Zandbergen on October 11, 2015 at 7:20 am said:

🙂
It’s a bit tough to summarise very briefly. Currier saw two hands and two languages, and devised a rule to decide for each page in the MS whether the language was A or B. Applying this rule, each page then became either A or B, even though in some cases the choice was not that clear.
Currier’s full paper is somewhere on my web site. I would recommend this page:
http://www.voynich.nu/extras.html
to find it, and various other ‘special topics’ on my site.

The key figure of my own analysis is this one:
http://www.voynich.nu/extra/img/curvalang.gif
In this graph, which shows three sections through a cloud of points, each point is one page in the MS. The location of the point depends only on the text statistics. The colour depends on the type of illustration. Since the colours are not scattered all over the place, but are clustered, there is some sort of relationship between the text statistics and the drawings. This basically anticipates the Montemurro result.

Currier A is the red cloud. Currier B is magenta plus dark and light blue. The ‘connection’ is from the pharma, astro, cosmo and zodiac pages.
The ‘problem’, if we may call it that, is that both red and light blue (cyan) are herbal pages, so I cheated in the sense that I anticipated the result knowing that Herbal-A and Herbal-B text properties are very different. This is not highlighted in the Montemurro paper, and I think it undermines the conclusion to some extent.

There’s much more to be said, but also much more to be analysed yet. Using Eva, I would tentatively identify four character combinations, whose relative frequency may be used to identify the language or dialect.
First is “hol” which is high-frequency for A language.
Second is “eol” which is more typical for pharma.
Third is “eod” which is more typical for astro/cosmo.
Fourth is “ed”, which is typical for all B languages.
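Comparing pages on these four combinations starts with counting them. A minimal Python sketch, assuming each page is available as a plain EVA string and normalising counts per 1000 non-space characters; the normalisation, the name marker_rates and the idea of comparing per-page profiles are assumptions for illustration, not Rene's actual procedure, and no A/B thresholds are implied.

```python
MARKERS = ("hol", "eol", "eod", "ed")


def marker_rates(page_eva: str, per: int = 1000) -> dict:
    """Count each marker per `per` non-space EVA characters on one page.

    str.count() counts non-overlapping occurrences, which is fine for
    these four strings since none can overlap with itself.
    """
    n_chars = sum(1 for c in page_eva if not c.isspace())
    if n_chars == 0:
        return {m: 0.0 for m in MARKERS}
    return {m: per * page_eva.count(m) / n_chars for m in MARKERS}
```

Running this over every page yields a small four-number profile per page, which could then be plotted or clustered much like the text-statistics cloud described above.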

Not sure what it means yet.
There’s an interesting page: f57r (the recto of the famous four times repeating cycle diagram).
It has two paragraphs.
The first paragraph is full of “type-3” words and has no other. The second paragraph is full of “type-4” words and has no other.
Could this be the Rosetta stone of the Voynich MS? (The same text in two ‘languages’).

May the real Champollion stand up…..
D.N.O'Donovan on October 11, 2015 at 2:08 pm said:

Folio 57r is extremely problematic. I don’t doubt that the membrane is 15thC but have substantial doubts as to whether the diagram is.

I am inclined to agree with the conclusions of Tiltman and Friedman that the text is not in cipher, and to agree in general with the sort of model offered by Don.
In fact, most of the best original insights appear to have been ignored, and their authors driven away by personal insults, and ‘gang bashing’. I do feel sorry for you Rene, that so many of the worst offenders are those most desperate to have your approval – a most invidious position.
nickpelling on October 11, 2015 at 2:56 pm said:

Diane: sorry to hear that the diplomacy classes aren’t working, your comments are still just as full of ad hominems against both the living and the dead as they ever were.
TigerOfDarkness on October 11, 2015 at 4:35 pm said:

Diane – do you think the diagram is older or younger than 15thC?
boyfriend , Champollion,,. :-) on October 11, 2015 at 7:21 pm said:

Friend + Rene.

“Not sure what it means yet”? Folio 57r.
“May the real Champollion stand up…”? René and friend: I will look at folio 57r, and I will write to you what is written there.

Otherwise, as I wrote to you 🙂, you were right about the cinematograph. On the page, he writes instructions: detailed instructions for producing a simple apparatus that gives motion to the images of women.
D.N.O'Donovan on October 12, 2015 at 2:32 am said:

TigerOfDarkness,
I suspect it might be much later – there is an uncanny similarity between the way the figures are drawn – especially their limbs – and drawings in one of Kircher’s works which are just as (very, very) bad, and in precisely the same way. Thing is, that drawing of Kircher’s is obviously gained from some Hindu source, or at least some source with direct access to Hindu imagery, so what this might mean is that both come from the same source, ultimately.
There’s also the fact that the ratios of the central, asymmetrical sort of ‘island’ thing are so precise. (Rich Santacoloma noticed, too, that the diagram’s structure was very calculated, and relied on use of compass etc., and that there are actually two centres for that diagram, and perhaps three) The mindset and approach to drawing is utterly different from that in other sections. Some argument might be made for similar skills in other astronomical diagrams, but then you still have the problem of a very different drawing style.
I’m not in a position to test the possibility, but if I were, I should see if the inks on that folio aren’t of a later composition. Maybe even Kircher’s era. It isn’t one of the folios which McCrone was asked to test, sadly.
D.N.O'Donovan on October 12, 2015 at 2:35 am said:

Nick,
You’re right! I have been taking your blog as my model of diplomatic interactions, and your way of addressing your more loyal readers as my model for the English way of scholarly address.

I do wish SirHubert, or better still “Sam G” had a blog to which I might subscribe. Ah well.
D.N.O'Donovan on October 12, 2015 at 2:44 am said:

Should I take it that among the educated of England, to speak of a professor of Linguistics, and a fellow researcher into this manuscript as “an idiot” is not considered offensive, or an ad hominem?

Is it an ad hominem to feel that the person so described deserves an apology, if not from the attacker, then from anyone decent enough to object?

The ad hominems in Voynich studies may be expressed as mildly as saying that someone “has an agenda” – a remark once addressed to me about you, and to which I objected immediately.
It may be as vaguely nasty a slur as suggesting that one’s research should be ignored because one is “out to make a name” for oneself.

But when the upshot of it is to try and prevent other people studying another person’s research without the cloud of pre-emptive bias, it is still the worst sort of ad-hominem.

At least you don’t go for the under-handed, action-at-second-hand approach, which I find just sleazy.

Neither do I.
D.N.O'Donovan on October 12, 2015 at 2:49 am said:

So long, Dr. Pelling.
I hope your “Curse” has a long and successful print-run.
nickpelling on October 12, 2015 at 5:46 am said:

Diane: if a linguistics professor repeatedly does idiotic things, I prefer to keep my words simple and call him an idiot. As is the case here. And when those idiotic things become so highly rated by Google that they become a global portal (if not a vestibule) to an entire study area, then that is nothing short of a farce. I only wish more people had the courage and intellectual bravery to speak out against such arrant, glib, pathetic nonsense.
D.N.O'Donovan on October 12, 2015 at 6:11 am said:

I suppose I may still buy a copy of your book?
nickpelling on October 12, 2015 at 6:21 am said:

Diane: I’m going to have to buy yours, so why not?
boyfriend , Champollion,,. :-) on October 12, 2015 at 11:29 am said:

Dear colleague Diana.

Mr. Nick is right, of course, in what he writes about S.B. Colleague Bax writes crap about the manuscript. And that’s great. And of course, he is not alone. Your dear colleague also writes stupidities about the Voynich manuscript. Colleague, read the instructions for translation, which are written on page 116 of the manuscript. Then you will certainly know that I am writing the truth.
The only one who has a chance to break the cipher is the champion John Pelling. You do not have a chance.
Mr. Pelling knows what homophonic substitution means. You can just write your text and space technical terms. That is all. And this is very little.
As for the actual cipher, friends: in the Czech Republic there are a great many manuscripts encrypted in a similar way.
Everyone should be aware of this and take it into account.

Finding hidden meaning behind the literal text and outer form is very characteristic of medieval thought and literary creation.
James R. Pannozzi on October 12, 2015 at 2:42 pm said:

This idea of pursuing the differential between Voynich dialects, or languages is intriguing.

Side note: there was a long-since-forgotten tool I used to use to compare different versions of source code; its creators released it to the public domain in 2001. It was the best text comparison tool I ever used, but I dropped it since I use Macs now and it was, of all things, a DOS app. The last thing the creators did was extend its file name capability past the DOS 8.3 limitation. It ran fine in a DOS box on XP or in Wine under various Linux implementations. It was called “DELTA”, but I don’t know if you could even find the thing out there now. It would be ideal for comparing EVA strings that were close but had some differences: its utility was in its colorization capabilities, which made the differentiated characters stand out, an eye-saving as well as thought-saving advantage.
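For anyone wanting a modern stand-in, here is a minimal Python sketch of character-level comparison of near-identical EVA strings. It is not DELTA: it uses the standard library's difflib.SequenceMatcher and marks differences with brackets rather than colour; the function name mark_differences and the bracket notation are assumptions.

```python
import difflib


def mark_differences(a: str, b: str) -> str:
    """Show string b relative to a, bracketing every differing run.

    [x->y] = replaced, [-x] = deleted from a, [+y] = inserted in b.
    """
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(b[j1:j2])
        elif tag == "replace":
            parts.append(f"[{a[i1:i2]}->{b[j1:j2]}]")
        elif tag == "delete":
            parts.append(f"[-{a[i1:i2]}]")
        else:  # "insert"
            parts.append(f"[+{b[j1:j2]}]")
    return "".join(parts)


print(mark_differences("chol daiin", "shol daiin"))
```

This prints "[c->s]hol daiin"; swapping the brackets for terminal colour codes would get closer to the colourised view described above.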
SirHubert on October 12, 2015 at 5:23 pm said:

Nick: is there any way that you can restore the comments Stephen Bax posted on your site back in 2013 or whenever it was?

I remember some of them really quite clearly, and they were not very edifying. I don’t think I’ve seen anyone else post anything quite as academically objectionable on this blog. At least, that is how I remember it.

I don’t know the man (at least I don’t think I do), and I don’t know you either beyond your blog and book. But I think if people were able to read that exchange and see what was said, it would explain much.

Personally, I think it’s a great shame that an otherwise interesting series of exchanges about Currier languages and dialects has ended up in a series of posts about people being abusive to one another. I do wish things didn’t have to be like that.

Diane: thank you, but I have absolutely no worthwhile or original or interesting observations about the manuscript which I’d presume to inflict on humanity in blog form. Should this change, I’ll let you know 🙂
nickpelling on October 12, 2015 at 6:25 pm said:

SirHubert: deleted they are, and stay that way they will – I’ve had to read some disgusting trollbait over the years, but the oily smirk that was plastered over his capped the lot of them. Sorry to any passing sociologists, but they’re not coming back.

But are you comfortable that Bax has somehow contrived – thanks to endless reposting on trashy believe-it-or-don’t websites – to become the academic face of the Voynich? How low we have descended, crashing through the basement floor of Farce into somewhere far worse. 🙁
boyfriend , Champollion,,. :-) on October 12, 2015 at 7:46 pm said:

For Zandbergen .
So I looked at folio 57r.
(First, I have to say that the entire manuscript is written as a rebus (riddle).)
I’ll show you the beginning of the text, the upper section.

As written (handwriting):

Pocco oHccq odan.

Substitution into Czech (old language):

Poslo oN ččí ztín.
———————————————————–

Meaning in today’s language:

Posle on čí stín .
———————————————————–

Meaning: Posle ty, čí je stín. (Czech language)

Meaning in English:
O messenger of those whose shadow is.
————————————————————-

The author writes and illustrates what the code is (shadow).
The page also says which letters of the text are removed.
Also, it says that the text is read from right to left as well.

Two names are given in the text (Eliška + Hus).
The third name cannot be written: it is a new (secret) name of Jan Hus, under which Master Jan lived for many years after he returned from exile. The same is written in the Constance chronicle.

I have already written out the gematria substitution, so you can check it very easily.
D.N.O'Donovan on October 13, 2015 at 1:20 pm said:

SirHubert,
Those exchanges seem to have happened during my year away. Perhaps you are right in saying that one would better understand the intensity of Nick’s feelings if those exchanges could still be read.

I appreciate Bax’ site, and especially as it was in earlier months.

For a while it provided a very open environment for discussion of the manuscript, where at last one could present some of the evidence against the “central European” story without being immediately hounded and flamed off wherever-it-was.

I’m very glad we now have a second person as the “face” of Voynich studies. Now perhaps if I send a research query to a library, I won’t have to wait six weeks, and then find in response – after sending the query again – that I was referred to Rene Zandbergen, “who should have received the email”.

I do not believe that even that email could have reached its destination, because I never did receive any response.

So I can see a number of advantages to having two people receive equal publicity, as “Mr Voynich” even if neither is able to read a word of the thing and neither has any real idea who made it or where, or why.

🙂
nickpelling on October 13, 2015 at 2:42 pm said:

Diane: good luck with that second face.
Rene Zandbergen on October 13, 2015 at 6:55 pm said:

Diane, how about stopping the bickering about people and concentrating on the MS for a change.
D.N.O'Donovan on October 14, 2015 at 4:05 pm said:

Rene,
That is precisely what I have been asking you to do for the past seven years.

Instead, I was subjected from the beginning to slurs such as that I was “out to make a name” or that iconographic analysis shouldn’t be taken seriously. I will not list here all the similarly derogatory remarks offered when evidence is offered by others against your theory: I cannot think of any other than ad hominem responses, vague and unproven, but the aim of which is to ensure that the information itself is paid no attention. This one’s codicological analysis is “unnecessarily complicated”; that one has “an agenda” – someone else “doesn’t have a balanced view” or hasn’t seen the manuscript in the flesh… and so on ad infinitum… none of it having anything to do with the evidence, or the manuscript.

While I’ve never seen you try to prevent your adherents flaming off any and all opposition to your ideas, you have invariably relied on one or six gallant souls to shred anyone who dares suggest you are mistaken on any point, while you tactfully remove yourself to the parlour.

For six years, while I had responsibilities to various people, including students (two of whom received the same treatment on Rich’s mailing list), I refrained from starting strife. I’ve seen everyone, including your current side-kick, subjected to denigration and non-specific dismissive noises and now I feel free to state my objections plainly.

It is typical that not once in the past seven years have you paid any serious attention to the solid work done by anyone who does not conform to your fantasy of an all-Latin, central European provenance.

Perhaps you might yourself begin considering the fair amount of solid work – all of it about the manuscript – which has been stolidly ignored by your group, but which is generally considered to contribute substantially to our understanding of this manuscript. I only wish that Dana Scott’s work, and Stolfi’s, and the opinions of Tiltman, Friedman and Tony, as well as Julian Bunn’s statistical work, and Don of Tallahassee’s brilliantly original insight into the possible reason for the text’s structure were granted the fair hearing that, in the normal way, they would be given.

Perhaps we just have to wait for the old guard to retire, and finally lay to rest the lingering belief in an all-Latin, European-culture cipher text?
Rick A. Roberts on October 15, 2015 at 6:00 am said:

Nick,
I am trying to contact you about KRYPTOS Part 4. My work has 97 characters and I believe that the character immediately before KRYPTOS Part 4 is a “?” for the end of KRYPTOS Part 3. My deciphering reads, ” A KEY YOU KNOW WE ALL FEEL FALL WAS A TIME YOU WOULD NEVER BOOK OFF FOR BERLIN TIMER CLOCK IS NOT A CLOCK YOU DO NOT USE A KEY “. My solution has exactly 97 characters and contains BERLIN and CLOCK in it. What do you think of my work on KRYPTOS Part 4? Thank you in advance for your thoughts on my solution.