No, I’m not blogging about a Joe Cornish / John Boyega medieval mash-up, but about applying my block paradigm attack approach to the Voynich Manuscript’s Quire 20.

This is a “known plaintext attack”: in non-crypto-bro English, it means ‘working out whether a given text is (somehow) the plaintext of the last section of the Voynich Manuscript’, even though we can’t read a word of the latter.

To do this, we’d need a pretty strong candidate text: in fact, we’d want one with broadly the right kind of structure, broadly the right kind of length, and which (preferably) might well have been considered a “trade secret” circa 1400-1450. But which we now have a copy of.

Since 2014 or so, my #1 candidate has been Jehan le Begue’s 1431 collection of colour-related recipes, which famously appeared in Mary Philadelphia Merrifield (1849). These recipes are in Latin and in French (plus Merrifield’s sons’ English translation). Jehan le Begue helpfully notes that most were copied from Giovanni Alcherio’s collection of colour-related recipes (le Begue added more of his own in French), and even lists the sources that Alcherio listed. And so I’m arguably slightly more interested in Alcherio’s collection than le Begue’s.

However, this kind of begs the question: how can you be sure of the structure of an enciphered text? Making this even harder is the fact that there is convincing evidence that many of the Voynich Manuscript’s bifolios have ended up shuffled around (and according to no obvious pattern).

Hence this post starts with what we know about Quire 20, before moving on to Giovanni Alcherio’s collection of recipes. As always, wrangling all the individual pieces into one place is extraordinarily time-consuming, so that’s what this first post concentrates on doing – Part 2 will try to bring all these together to do the actual attack. (Which is arguably 10x harder. But you have to start somewhere.)

Notes on Quire 20

From my numerous blog posts on Q20 (starting in 2010), I have floated a number of tentative conclusions about how Q20 ‘works’, some, none, or all of which may be useful in this context:

  • Tailed paragraph stars may well be a steganographic ‘y’, short for ‘ytem’ (i.e. a bullet point)
  • Tail-less paragraph stars may have been added to make the tailed paragraph stars less obvious
  • Q20 was probably originally two separate quires/gatherings, that were later shuffled together
  • f105r (with the ornate gallows at top left) was probably the start of Q20A
    • I would be unsurprised if the Voynich ‘title’ at the bottom of f105r was Q20A’s book title
  • f103r was probably the start of Q20B (i.e. with f116v as its last page)
    • The stars on f103 and f116 are almost all tail-less
  • The right margin gap on f112 is probably a copy of a vellum tear in the document it was copied from
    • (I covered this in Curse 2006)
  • Elmar Vogt flagged an empty-full star-colour pattern, which f103, f104, and f108 didn’t conform to
    • I wondered whether that implied f103, f104, and f108 were originally bound together
  • Tim Tattrie pointed out (among other things) links between words on f104r and f108v
  • Rene Zandbergen pointed out in 2016 that many of the stars on f111r look to be fake

TL;DR – even though earlier Voynich researchers usually thought of Q20 as a single ‘thing’, it instead seems to have started life as two or more separate ‘books’. There also seems to be a category difference between tailed star paragraphs and non-tailed star paragraphs, with the former possibly denoting the start of an item.

Quire 20 Item / Paragraph Structure

Note that f109 and f110 appear to have been the two halves of a central bifolio that got removed when the manuscript was a couple of centuries old or so (say, after 1600 but probably before 1700). (However, there’s currently no obvious reason to presume that it was a central bifolio in the original (‘alpha’) bifolio nesting order.)

  • f103r: 18 x no-tail, then 1 x odd “top tail”
  • f103v: 14 x no-tail
  • f104r: 13 x tail
  • f104v: 13 x tail
  • f105r: fancy gallows, then 9 large tails
  • f105v: 10 x tail
  • f106r: 15 x tail (#3 has a tiny ‘child’ star)
  • f106v: 14 x tail
  • f107r: 15 x tail (#11 has “…” next to it)
  • f107v: 15 x tail
  • f108r: 16 x tail
  • f108v: 16 x tail (note that #7-#8 & #11-#16 seem fake, #10-#16 seem to be a single paragraph)
  • f111r: 17 x tail (note that #2-#12 & #14 seem fake, #1-#12 seem to be a single paragraph)
  • f111v: tail, no-tail, 2 x tail, 4 x no-tail, tail, 5 x no-tail, tail, 4 x no-tail (#2-#8, #10, #17 seem fake)
  • f112r: 7 x tail, no-tail, 4 x tail
  • f112v: 5 x tail, no-tail, tail, no-tail, 4 x tail, no-tail
  • f113r: 16 x tail
  • f113v: 15 x tail
  • f114r: 13 x tail
  • f114v: 8 x tail, no-tail, 3 x tail (#5 seems emphasized)
  • f115r: tail, no-tail, 11 x tail
  • f115v: 13 x tail
  • f116r: 10 x no-tail, followed by two large unstarred paragraphs (like a colophon)
  • f116v: (end-page)

We can also rearrange these same lines by bifolio (rather than by sequentially numbered folio):

  • f103-f116 bifolio
    • f103r: 18 x no-tail, then 1 x odd “top tail”
    • f103v: 14 x no-tail
    • f116r: 10 x no-tail, followed by two large unstarred paragraphs (like a colophon)
    • f116v: (end-page)
  • f104-f115 bifolio
    • f104r: 13 x tail
    • f104v: 13 x tail
    • f115r: tail, no-tail, 11 x tail
    • f115v: 13 x tail
  • f105-f114 bifolio
    • f105r: fancy gallows, then 9 large tails
    • f105v: 10 x tail
    • f114r: 13 x tail
    • f114v: 8 x tail, no-tail, 3 x tail (#5 seems emphasized)
  • f106-f113 bifolio
    • f106r: 15 x tail (3rd star has a tiny ‘child’ star)
    • f106v: 14 x tail
    • f113r: 16 x tail
    • f113v: 15 x tail
  • f107-f112 bifolio
    • f107r: 15 x tail (#11 has “…” next to it)
    • f107v: 15 x tail
    • f112r: 7 x tail, no-tail, 4 x tail
    • f112v: 5 x tail, no-tail, tail, no-tail, 4 x tail, no-tail
  • f108-f111 bifolio
    • f108r: 16 x tail
    • f108v: 16 x tail (note that #7-#8 & #11-#16 seem fake, #10-#16 seem to be a single paragraph)
    • f111r: 17 x tail (note that #2-#12 & #14 seem fake, #1-#12 seem to be a single paragraph)
    • f111v: tail, no-tail, 2 x tail, 4 x no-tail, tail, 5 x no-tail, tail, 4 x no-tail (#2-#8, #10, #17 seem fake)
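The regrouping above follows a simple rule: in a single nested gathering running from f103 to f116, the two folio numbers on each bifolio add up to 219. Here is a minimal Python sketch of that pairing. The function and constant names are my own, and it assumes the current binding order, which (as noted above) may well not be the original one.

```python
# Sketch: pair up the folios of Q20 as currently bound (f103 to f116).
# Assumes one nested gathering; all names here are illustrative only.
FIRST_FOLIO = 103
LAST_FOLIO = 116
MISSING = {109, 110}  # the removed central bifolio


def conjugate(folio: int) -> int:
    """Return the folio on the other half of the same bifolio."""
    return FIRST_FOLIO + LAST_FOLIO - folio


def bifolios() -> list[tuple[int, int]]:
    """List the surviving bifolios, outermost first, as (left, right) pairs."""
    pairs = []
    for left in range(FIRST_FOLIO, (FIRST_FOLIO + LAST_FOLIO) // 2 + 1):
        right = conjugate(left)
        if left in MISSING or right in MISSING:
            continue  # skip the bifolio that is no longer there
        pairs.append((left, right))
    return pairs


for left, right in bifolios():
    print(f"f{left}-f{right} bifolio")
```

Testing a different candidate nesting order would mean replacing this arithmetic with an explicit list of pairs.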

Commentary: as with the Herbal bifolios, there are often unexpected consistencies to be found between the contents of two folios where they are part of the same (attached) bifolio. The most obvious example of this is f108v, f111r and f111v, which all have large paragraphs and what appear to be fake stars. Yet f108r and the first five paragraphs of f108v seem to be quite different (they’re more ‘metronomic’, small paras regularly followed by other small paras). I can’t help but wonder whether there is a change in recipe ‘style’ part way down f108v: and also whether the f108-f111 bifolio may have originally been the central bifolio of a quire / gathering.

It also seems that we have three categories of starred paragraph to wrestle with: starred ‘item’ paragraphs, non-starred paragraphs, and starred non-paragraphs (i.e. fake stars, which may or may not have a tail). So I think we have to be very much on our toes when trying to draw inferences about stars.

More generally, there are occasional changes in ‘tempo’, e.g. when dense small-para pages with tiny tight stars (such as f111v) get followed by not-so-dense pages with larger paras and fewer stars (such as f112r). These give me the strong impression that we’re not looking at a single, uniformly-structured list of items, but rather at several different kinds of item (i.e. with different text styles) that have ended up interleaved. Moreover, the presence of fake-looking stars (as flagged by Rene) looks to me as though the author may have been trying to conceal some aspects of the very structure I’m trying to discern. But maybe this is a good sign: as Sherlock Holmes once said, “The game is afoot! Not a word!”

Jehan le Begue Bibliography

The manuscript itself (Lat. 6741) is in the BnF, and is accessible online here. It’s marked “Ex Libri Lud[ovico] Martelli Rx 1587”. It starts with a long table of synonyms, often with alternating red-blue capitals, e.g. part of fol. 2r looks like this:

This is transcribed on Merrifield p.18, but just so you can get acclimatised to the writing, I reworked it below to include the (entirely typical) early 15th century scribal abbreviations visible above:

  • [A]zurium vel lazuriu[m] est color ; aliter celestis vel celes[-]
    tinus, aliter blauccus, a[li]t[er] pers[us], et a[li]t[er] ethere[us] dic[itur].
  • [A]uru[m] est nobilius metallu[m] croceu[m] colore[m] habens et
    tenuatur in petulis, quo carentes utunt[ur] stanno
    attenuato et colorito colore croceo et in petulis tenuato.
  • [A]rgentu[m] est nob[i]le metallu[m] album colore[m] habens, quo
    qui caret utitur ejus loco de d[i]cto stanno tenuato, non colorito.
  • [A]uripigmentum est color croceus qui al[i]t[er] arsicon dicit[ur]

This is followed by:

  • Experimenta de coloribus
  • Experimenta diversa alia quam de coloribus
  • Liber Theophili admirabilis et doctissimi magistri de omni scientia picturae artis
  • Liber Magistri Petri de Sancto Audemaro de coloribus faciendis
  • Eraclii sapientissimi viri liber primus […]
  • De coloribus ad pingendum capitula scripta et notata a Johanne Archerio seu Alcherio anno Domini 1398 […]
  • Capitula de coloribus ad illuminandum libros ab eodem Archerio sive Alcherio scripta et notata anno 1398 […]
  • Aultres receptes en Latin et en Francois per Magistrum Johannem dir Le Begue […]

Mark Clarke’s (2001) “The Art of All Colours” p.101 includes a half-page description of Lat 6741, and also notes a complete 19th century transcription in BL MS Add. 27,459. The most important reference (crazily omitted from the BnF description) is the major part of Volume 1 of Mary Philadelphia Merrifield (1849), who not only transcribed 6741 (adding a voluminous introduction), but also had her sons translate it into English (“except [for the] Theophilus portion”, Clarke points out). Clarke also notes that a transcription appears in van Acker (1972) “Petri Pictoris Carmina”, pp.143-198 and 242-246. A more recent edition was by Inès Villela-Petit (1995), presumably in her PhD dissertation (which I haven’t seen).

For accessible sources on Alcherio / Alcherius, I’d heartily recommend:

  • “The Recipe Collection of Johannes Alcherius and the Painting Materials Used in Manuscript Illumination in France and Northern Italy, c. 1380-1420”, by Nancy Turner
  • “Copies, Reworkings and Renewals in Late Medieval Recipe Books”, by Inès Villela-Petit, and translated by Jilleen Nadolny. (Available on academia.edu.)

Navigating Lat 6741

The recipes in Lat 6741 have been sequentially numbered (with occasional gaps) in the left margin, which gives anyone discussing them a helpful starting point. For now, I’m going to restrict this discussion to those 118 recipes that Jehan le Begue copied from Alcherio (or else this would end up insanely large):

  • Experimenta de coloribus
    • 1-46: Latin. Copied by Alcherio in 1409 from “an unbound [quire] lent me by Brother Dionysius […] at Milan.” Note that these recipes use an alchemical-sounding code for colours: “Sol” is for gold (i.e. yellow), “Luna” for silver (“the rust of which is azure”), “Mars” for iron (“the rust of which is violet”), “Jupiter” for tin, “Venus” for copper or brass (“the rust of which is green”), and “Saturn” for lead (“the rust of which is a white colour”).
    • 47-88: Latin. Copied by Alcherio from a second unbound [quire] lent by Brother Dionysius. #48 onwards is “Experimenta diversa alia quam de coloribus”.
    • 89-99: French. Copied by Alcherio from recipes lent to him in Bologna by an embroiderer called Theodore of Flanders, who had in turn procured them in London.
    • 100-116: Italian. Copied by Alcherio in Bologna in 1410 from a book of Magister Johannes de Modena. (Jehan le Begue copied these, and then had a friend translate them into Latin.)
    • 117: Latin. Copied by Alcherio in Venice in 1410, from Michelino di Vesuccio, “the most excellent painter among all the painters of the world”.
    • 118: Latin. Copied by Alcherio in Paris in 1410, from Master Johannes de […something…]

This is also because these 118 recipes seem to have the highest “trade secret” rating of all the recipes given by le Begue: most of the rest were either centuries old or French recipes added by le Begue himself.

Recipes #1 to #46 (fol 22r to fol. 27v)

Diving straight in, recipe #1 looks like this:

Note the recipe number in the left margin, and the title of the section / book embedded in the top line in red. Merrifield transcribes this recipe on her p.47, which (reconstructing the abbreviations above) would look like this:

1. [N]ota q[uod] auree Experimenta de coloribus
li[tte]re scribu[n]t[ur] sic, cu[m] ista aqua ; accipe sulphur vivu[m] et
corticem int[er]iorem mali granati, alum[i]nis, saltis, et de plu[-]
via auri, tantu[m] q[uan]tu[m] vis, et aqua[m] g[u]mmi liquide, et modi[-]
cu[m] de croco, et misce et scribe.

Note that my line counts in Lat 6741 for #1 to #46 (to the nearest half-line) are:
4.5 4.5 3.5 3 5 6.5 11 2 7 17 13 19 11 14 5 3.5 13 4.5 19.5 6.5 5 9.5 3 7.5 5.5 4.5 9 4 6 4.5 11.5 6 10.5 3.5 5.5 6.5 6.5 6 10 8 8.5 6 5.5 11 6.5 6
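To compare these counts against candidate paragraph sequences in Q20, it helps to turn them into a scale-free ‘length profile’. Below is a minimal Python sketch that does this for the 46 counts above. The function names are hypothetical, and normalising by the total is just one possible choice, not a settled part of the block paradigm method.

```python
# Line counts for recipes #1 to #46, copied from the list above
# (to the nearest half-line).
LINE_COUNTS = [
    4.5, 4.5, 3.5, 3, 5, 6.5, 11, 2, 7, 17,
    13, 19, 11, 14, 5, 3.5, 13, 4.5, 19.5, 6.5,
    5, 9.5, 3, 7.5, 5.5, 4.5, 9, 4, 6, 4.5,
    11.5, 6, 10.5, 3.5, 5.5, 6.5, 6.5, 6, 10, 8,
    8.5, 6, 5.5, 11, 6.5, 6,
]


def length_profile(counts):
    """Express each recipe's length as a share of the block's total length."""
    total = sum(counts)
    return [c / total for c in counts]


def longest(counts, how_many=3):
    """Return (recipe number, line count) for the longest recipes, longest first."""
    numbered = list(enumerate(counts, start=1))
    return sorted(numbered, key=lambda pair: pair[1], reverse=True)[:how_many]


print("recipes:", len(LINE_COUNTS), "total lines:", sum(LINE_COUNTS))
print("longest:", longest(LINE_COUNTS))
print("profile (first five):", [round(p, 3) for p in length_profile(LINE_COUNTS)[:5]])
```

Unusually long recipes (such as #19 here) are the kind of landmark you would hope to see echoed in the paragraph lengths of a matching Voynichese block.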

Recipes #47 to #88

My line counts for #47 to #88 (note that le Begue has some numbering gaps) are:

20 4.5 5 3.5 9 5.5 8 4 4 15 9 4 12 9.5 13.5 4 8.5 7 16 4.5 4 (#69 missing) 2 (#71 missing) (#72 missing) (#73 missing) (#74 missing) 5.5 3 1.5 (#79 missing) 3.5 5.5 4 8 8 9.5 3 4 7 3 2

Recipes #89 to #99

These are in French, and have a noticeably different format, with a header preceding each recipe:

Merrifield transcribes this (p.85) as follows:

89. Pour faire l’eau noire. – Prenez une pinte de l’yaue de dessoulz la meule sur quoy on meult les courtesaulx, et la mettes sur le feu, et gettez ung voire de vin aigre, et ii onces de galles, et prenez demie onche d’alon, et une onche de coperose, et le faitez tant boulir, qu’il apetice du tiers, et puis le laissier reposer un jour.

Recipes #100 to #116

These were originally written in Italian, but were translated into Latin by a certain friend of Jehan le Begue.

Recipe #117

25 lines long

Recipe #118

122 lines long, which – compared to the rest of the recipes – is a bit of a monster.

I hope everyone who attended the Voynich Conference 2022 hosted online by the University of Malta enjoyed the presentations and the Q&As.

In Lisa Fagin Davis’ final presentation, she mentioned her recent theory that p/f were in fact ke/te: and mentioned that she’d thought this up, but then found it on Cipher Mysteries. If you want to see the original page I put up in September 2020 suggesting this idea (along with her comment near the bottom), it’s right here, along with the August 2020 page where I started exploring the behaviour of single-leg gallows.

There’s an additional aspect to the set of gallows/e/ch groupings I discussed in 2020, which is that you can usefully compare the (parsed) ch:chch ratio in the text as a whole (which is 10616:18 (0.17%)) both to the (parsed) ratios of strikethrough gallows preceded by ch…

  • ckh:chckh = 634:242 = 38.17%
  • cth:chcth = 766:139 = 18.15%
  • cph:chcph = 185:27 = 14.59%
  • cfh:chcfh = 58:15 = 25.86%

…as well as to the (parsed) ratios of strikethrough gallows followed by ch:

  • ckh:ckhch = 871:5 = 0.57%
  • cth:cthch = 902:3 = 0.33%
  • cph:cphch = 211:1 = 0.47%
  • cfh:cfhch = 73:0 = 0%

This, too, is a strikingly asymmetric result; and would seem to suggest that the ch:chch ratio is of much the same (tiny) size as the c<gallows>h:c<gallows>hch ratios, yet completely unlike the c<gallows>h:chc<gallows>h ratios.

I would take this as reasonably good support for the idea that c<gallows>h is actually a visual proxy (and it doesn’t really matter whether this is for scribal, cryptographic, or steganographic reasons) for <gallows>ch, because Voynichese seems to want to avoid c<gallows>hch almost exactly as much as it wants to avoid chch.
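For anyone wanting to check the arithmetic, here is a short Python sketch that recomputes the percentages from the counts quoted above. The counts come straight from this post; the dictionary layout and function name are just my own illustration.

```python
# Counts quoted above, stored as (instances without ch, instances with ch).
BASELINE = (10616, 18)  # ch : chch in the text as a whole

PRECEDED_BY_CH = {  # e.g. ckh : chckh
    "ckh": (634, 242),
    "cth": (766, 139),
    "cph": (185, 27),
    "cfh": (58, 15),
}

FOLLOWED_BY_CH = {  # e.g. ckh : ckhch
    "ckh": (871, 5),
    "cth": (902, 3),
    "cph": (211, 1),
    "cfh": (73, 0),
}


def pct(without: int, with_ch: int) -> float:
    """Second count as a percentage of the first, to two decimal places."""
    return round(100 * with_ch / without, 2)


print(f"ch:chch = {pct(*BASELINE)}%")
for glyph, counts in PRECEDED_BY_CH.items():
    print(f"{glyph}:ch{glyph} = {pct(*counts)}%")
for glyph, counts in FOLLOWED_BY_CH.items():
    print(f"{glyph}:{glyph}ch = {pct(*counts)}%")
```

Note that each pair of counts for a given glyph sums to the same total in both tables (e.g. 634 + 242 = 871 + 5 = 876 for ckh), which is a handy sanity check on the parsing.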

Perhaps combining this result with the pe/fe result (and other “forbidden” Voynichese combinations) might be the start of something really positive…

Like (hopefully a fair few) other Voynicheros, I’ve ponied up my 50 euros for the 2022 online Voynich Conference being hosted by the University of Malta in the next few days.

One of the fields in the application form asked for my university or institution: I put down “Cipher Mysteries”, on the grounds that it has ended up a bit of a cipher institution. 🤔 But mainly to make myself laugh. 😁

Will Malta reveal something incredible, awe-inspiring, unexpected, shocking, or amazing about the Voynich Manuscript, in the way academic conferences in novels and films have primed everyone to believe? Actually… maybe, sort of. But not in the way Dan Brown and his overexcited ilk like to portray.

20+ years ago, I remember trying really hard – with almost zero success, it has to be said – to persuade anyone that the Voynich Manuscript wasn’t some kooky fake cooked up by Dr John Dee (back then the fairly dominant opinion), but a genuine historical artifact worthy of close, careful study.

Well, from the 2022 conference’s participant list and programme of papers, it seems that that aspect of my struggle back then has at least borne fruit. It’s now a serious business.

But will there be The Big Breakthrough? You know, the introvert outsider’s slide that shyly reveals The Secret Cipher Key we’ve all long dreamed of? Cue clunks round the world as Voynichero jaws collectively hit the floor.

(*snort* Not a hope, sorry.) But with so many smart, insightful, observant researchers all trying to move forward in broadly the same way, who’s to say that something won’t emerge from it all?

Perhaps it won’t be something showy (or even immediately obvious), but even a tiny step forward would feel Heaven-sent. So let’s just pray a little, hein?

I’ve mentioned Nicolas Fabri de Peiresc on Cipher Mysteries in the past, though mostly in connection with his extensive “Republic of Letters” correspondence, thought to contain somewhere between 10,000 and 14,000 letters. This was because, around 2008, I spent some time wondering whether there might be (hitherto unnoticed) mentions of the Voynich Manuscript in European scientific correspondence networks. A recent email from Diane O’Donovan brought Peiresc back to the front of my mind.

As far as the timing of the Voynich Manuscript’s possible (but sadly not yet certain) sale to Emperor Rudolf II, I’ve long felt it must have happened after 1600 (because there was no mention in Thaddaeus Hagecius ab Hayek’s letters), before 1612 (when Rudolf died), and probably before 1610 (roughly when Rudolf’s brother Matthias took control). If I had to pick a single year, I’d pick 1609, but that’s ultimately no more than an educated guess (yes, the same one that once hurled me down a Rosicrucian rabbit-hole).

Peiresc was a very early telescope owner (in 1610), and probably the first to observe the Orion Nebula (though he didn’t actually stake a claim to this discovery at the time): so he was certainly active at the right sort of time. There’s an accessible description of Peiresc’s astronomical activities in Seymour L. Chapin’s “The Astronomical Activities of Nicolas Claude Fabri de Peiresc”, Isis, Vol. 48, No. 1 (Mar., 1957), pp. 13-29, (on JSTOR), through which we can see his wide scientific-minded range of astronomical interests, such as tracking the Jovian moons, producing a detailed engraving of the Moon’s surface, and using eclipses to determine longitudinal differences.

Peiresc’s Letters

As Hatch points out, Peiresc’s letters are strongly centred on a small number of key correspondents in Paris and Rome: and so the 10,000+ size of the corpus is perhaps a little bit flattering as to the broader range of his correspondents. Yet he plainly did correspond with astronomers (later in life, he stood up very strongly for Galileo, for example), and so it is far from impossible that there is a passing mention of the Voynich Manuscript there.

Unfortunately, I have yet to find an online list of Peiresc’s correspondents (I did see a somewhat unhelpful map that vaguely implied that some were in Prague, or at least Bohemia), so I can’t easily compile a list of Peiresc’s astronomy-related letters, as I had initially hoped to do. (Indeed, the intersection of ‘astronomy’ & ‘Prague’ would probably yield a very short list of letters to examine.)

Note that Hatch’s chapter “Peiresc As Correspondent: The Republic of Letters & the ‘Geography of Ideas’” (in Science Unbound, Chapter 2, ed. B. Dolan, Umeå, 1998) seems like it could be promising in this regard, but I haven’t yet seen it.

Peiresc’s Papers

Yet Peiresc had another legacy: his papers. Though he published almost nothing in his lifetime, he constantly made notes on everything he heard and read: and these papers comprised around 60,000 pages at his death, which Gassendi then assiduously ground his way through for two years (to write Peiresc’s biography).

Yet it seems to me that articles on Peiresc tend to be written by people who have carefully selected an achievable thematic subset (e.g. Rubens, astronomy, etc) of his letters to work with (though I don’t believe that his letters have all been published yet) – almost none seems to be informed by his papers.

Might there be some Voynich Manuscript mention in Peiresc’s papers? I don’t know how well these have been indexed (has there ever been an index?), and this post is merely a brief research note – so please let me know if you have a good (probably French!) source describing the contents of Peiresc’s papers!

Here’s a suggestion for a Voynich Manuscript paper that I think might well be revealing: taking raking IR images of f116v. But why would anyone want to do that?

Multispectral imaging

Since about 2006, I’ve been encouraging people to take multispectral images of the Voynich Manuscript, i.e. to capture images of the manuscript at a wide variety of wavelengths, so not just visible light.

My interest here is seeing if there are technical ways we can separate out the codicological layers that make up f116v. To my eyes, there seem to be two or three different hands at play there, so it would make sense if we could at least partially figure out what the original layer there looked like (before the other layer was placed on top, I guess at least a century later).

And in fact one group did attempt multispectral scanning, though with only a limited set of wavelengths, and without reaching any firm conclusions. (They seem not to have published their results, though I did once stumble across some of their test images lying around on the Beinecke’s webserver.)

The Zen of seeing nothing

Interestingly, one of that group’s images of f116v was taken at 940nm (“MB940IR”), which is an infrared wavelength (hence “IR”). This revealed… nothing. But in what I think is potentially an interesting way.

Here’s what it looks like (hopefully you remember the michitonese at the top of f116v):

Main banks Transmissive

That’s right! At 940nm, the text is invisible. Which is, of course, totally useless for normal imaging. For why on earth would you want to image something at a wavelength where you can’t see any detail?

Raking Light

The interesting thing about this is that one kind of imaging where you’d want the text itself to be as invisible as possible is when you’re doing raking illumination, i.e. where you shine an illuminating light at a very shallow angle, almost parallel to the surface. At the edges of penstrokes (if you’re looking really closely at high-ish magnification), you should be able to use this to see the shadows cast by the edges of the indentations left by the original quill pen.

And so I’ve long wondered whether it might be possible to use a 940nm filter (and a non-LED light source) and a microscope / camera on a stand to try to image the depth of the penstrokes in the words on f116v. (You’d also need to use an imaging device with the RG/GB Bayer filter flipped off the top of the image sensor; or a specialist b&w imaging sensor; or an old-fashioned film camera, horror of horrors!)

What this might tell us

Is this possible? I think it is. But might it really be able to help us separate out the two or more hands I believe are layered in f116v? Though I can’t prove it, I strongly suspect it might well be.

Why? Because vellum hardens over time. In the first few years or so after manufacture, I’m sure that vellum offers a lithe and supple writing support, that would actually be quite nice to write on. However, fast forward from then to a century or so later, and that same piece of vellum is going to be harder, drier, more rigid, slippier, scrapier – in short, much less fun to write on.

And as a result, I strongly suspect that if there are two significantly time-separated codicological layers on f116v, then they should show very different writing indentation styles. And so my hope is that taking raking IR images might possibly help us visualise at least some of the layering that’s going on on f116v, because I reckon each of these 2+ hands should have its own indentation style.

Will this actually work? I’m quietly confident it will, but… even so, I’d have to admit that it’s a bit of a lottery. Yet it’s probably something that many should be able to test without a lot of fuss or expense. Does anyone want to give this a go? Sounds to me like there should be a good paper to be had there from learning from the experience, even if nothing solid emerges about the Voynich Manuscript.

Anyone who spends time looking at Voynichese should quickly see that, rare characters aside, its glyphs fall into several different “families” / patterns:

  • q[o]
  • e/ch/sh
  • aiin/aiir
  • ar/or/al/ol/am/om
  • d/y
  • …and the four “gallows characters” k/t/f/p.

The members of these families not only look alike, they often also function alike: it’s very much the case that glyphs within these families either group together (e.g. y/dy) or replace each other (e.g. e/ee/eee/ch/sh).

For me, one of the most enigmatic glyph pairs is the gallows pair EVA k and EVA t. Rather than be seduced by their similarities, my suggestion here is to use statistics to try to tease their two behaviours apart. It may sound trivial, but how do EVA k and EVA t differ; and what do those differences tell us?

The raw numbers

Putting strikethrough gallows (e.g. EVA ckh) to one side for the moment, the raw k/t instance frequencies for my preferred three subcorpora are:

  • Herbal A: (k 3.83%) (t 3.28%)
  • Q13: (k 5.38%) (t 2.27%)
  • Q20: (k 5.19%) (t 2.76%)

Clearly, the ratio of k:t is much higher on Currier B pages than on Currier A pages. Even if we discount the super-common Currier B words qokey, qokeey, qokedy, qokeedy, qokaiin, a large disparity between k and t still remains:

  • Q13: (k 4.3%) (t 2.46%)
  • Q20: (k 4.58%) (t 2.89%)

In fact, this k:t ratio only approaches (rough) parity with the Herbal A k:t ratio if we first discount every single word beginning with qok- in Currier B:

  • Q13: (k 2.71%) (t 2.41%)
  • Q20: (k 3.57%) (t 2.86%)

So there seems to be a hint here that removing all the qok- words may move Currier B’s statistics a lot closer to Currier A’s statistics. Note that the raw qok/qot ratios are quite different in Herbal A and Q13/Q20 (qok is particularly strong in Q13), suggesting that “qok” in Herbal A has a ‘natural’ meaning and “qok” in Q13/Q20 has a different, far more emphasised (and possibly special) meaning, reflecting the high instance counts for qok- words in Currier B pages:

  • Herbal A: (qok 0.79%) (qot 0.68%)
  • Q13: (qok 3.04%) (qot 0.74%)
  • Q20: (qok 1.84%) (qot 0.70%)

Difference between ok/yk and ot/yt

If we put ckh, cth and all qok- words to one side, the numbers for ok/yk and ot/yt are also intriguing:

  • Herbal A: (ok 1.38%) (ot 1.31%) (yk 0.51%) (yt 0.48%)
  • Q13: (ok 1.07%) (ot 0.91%) (yk 0.17%) (yt 0.12%)
  • Q20: (ok 1.53%) (ot 1.47%) (yk 0.19%) (yt 0.14%)

What I find interesting here is that the ok:ot and yk:yt ratios are just about identical with the k:t ratios from Herbal A. Consequently, I suspect that whatever k and t are expressing in Currier A, they are – once you go past the qok-related stuff in Currier B – probably expressing the same thing in Currier B.

As always, there are many possible reasons why the k instance count and the t instance count should follow a single ratio: but I’m consciously trying not to get caught up in those kinds of details here. The fact that k-counts are consistently that little bit higher than t-counts in several different contexts is a good enough result to be starting from here.
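To make the comparison concrete, here is a small Python sketch that turns the percentages quoted above into ratios, side by side. The percentages are copied from this post; the stage labels and the layout of the table are my own shorthand.

```python
# Percentages quoted above, stored as (first glyph group %, second glyph group %).
K_VS_T = {
    "raw": {"Herbal A": (3.83, 3.28), "Q13": (5.38, 2.27), "Q20": (5.19, 2.76)},
    "minus five common qok- words": {"Q13": (4.3, 2.46), "Q20": (4.58, 2.89)},
    "minus all qok- words": {"Q13": (2.71, 2.41), "Q20": (3.57, 2.86)},
}

OK_VS_OT = {"Herbal A": (1.38, 1.31), "Q13": (1.07, 0.91), "Q20": (1.53, 1.47)}
YK_VS_YT = {"Herbal A": (0.51, 0.48), "Q13": (0.17, 0.12), "Q20": (0.19, 0.14)}


def ratio(first: float, second: float) -> float:
    """Return first/second rounded to two decimal places."""
    return round(first / second, 2)


for stage, rows in K_VS_T.items():
    for corpus, (k, t) in rows.items():
        print(f"k:t  {stage:<30} {corpus:<9} {ratio(k, t)}")

for label, rows in (("ok:ot", OK_VS_OT), ("yk:yt", YK_VS_YT)):
    for corpus, (a, b) in rows.items():
        print(f"{label} {corpus:<9} {ratio(a, b)}")
```

Bear in mind that the yk/yt percentages are small, so their ratios are noisier than the k:t and ok:ot ones.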

Might something have been added here?

From the above, I can’t help but wonder whether EVA qok- words in Currier B pages might be part of a specific mechanism that was added to the basic Currier A system.

Specifically, I’m wondering whether EVA qok- might be the Currier B mechanism for signalling the start of a number or numeral? This isn’t a fully-formed theory yet, but I thought I’d float this idea regardless. Something to think about, certainly.

As a further speculation, might EVA qok- be the B addition for cardinal numbers (1, 2, 3, etc) and EVA qot- be the B addition for ordinal numbers (1st, 2nd, 3rd, etc)? It’s something I don’t remember seeing suggested anywhere. (Please correct me if I’m wrong!)

So: do I think there’s room for an interesting paper on EVA k/t? Yes I do!

It’s well-known that the distribution of Voynichese page-initial (and indeed paragraph-initial) glyphs is, unlike the rest of the text, strongly dominated by gallows characters. But what is less widely known is that something really fishy is going on with the distribution of all the other line-initial glyphs too.

As far as I know, nobody has yet given this behaviour the in-depth attention it properly deserves, which is why I think it would make a good subject for a paper. Though it perhaps needs a catchier name than “Line-Initial But Not Paragraph-Initial Glyph” (LIBNPIG) statistics (so please feel free to come up with a better name or acronym).

Though you might reasonably ask: isn’t this just another side of the whole constellation of LAAFU (“line as a functional unit”) behaviours?

Well, yes and no. “LAAFU” is a shorthand mainly used by some Voynich researchers to signal their despair at the unknowableness of why certain glyphs seem to ‘prefer’ different positions within a line. So yes, LIBNPIG behaviour is a kind of LAAFU behaviour: but no, that doesn’t mean it can’t be understood. (Or at least carefully quantified and tortured on a statistical rack.)

LIBNPIG Observations

How do we know that something funky is going on with LIBNPIGs?

LIBNPIG ‘tells’ are perhaps most visible in Q20 (Quire #20). For example, even though EVA daiin is common in Currier A pages (you may recall that it’s one of the ‘Big Three’ A-words – daiin / chol / chor), it’s far less common in Currier B pages: however, when it does occur in Q20, it is frequently in a LIBNPIG position. In fact, this is true of all word-initial EVA d- words in Q20, which you can see here (scroll to the bottom).

Similarly, if you look at EVA s- words (ignoring sh- words, which is a particularly annoying EVA artifact, *sigh*) in Q20, you should also see that these appear far more often line-initially than they should.

Is that all? No. The same is true of EVA y- words in Q20 too, but this pattern is additionally true in Herbal B pages. Note that this also seems to be true of some Herbal A pages, but EVA y- words in Herbal A appear to work quite differently to my eyes. (Though I’d advise looking for yourself and forming your own opinion.)

Curiously, even though paragraph-initial words so strongly favour gallows characters, LIBNPIG words seem to abhor gallows characters, a behaviour which is in itself quite suggestive and/or mysterious.

Conversely, if you go looking for LIBNPIG EVA ch- and sh- words, I believe you’re far more likely to instead find them clustering at the second word on a line. Note that Emma May Smith (with Marco Ponzi) looked at this back in 2017, though more from a word-based perspective (even though the first two words on a line in Q20 are often fairly odd-looking). The concern for me is more that these behaviours mean that Voynich word dictionaries (and indeed all word analyses) based on line-initial words are unreliable.

So, what is going on in Q20 (in particular) that is making LIBNPIG words prefer d- / s- / y- so much? I guess this really is the starting point of the paper I’m suggesting here.

Vertical keys?

The notion that the first column of glyphs might have some kind of special meaning is far from new. In fact, there is evidence suggesting this in the manuscript itself on page f66r, where you can clearly see a column of glyphs (though admittedly there is also a column of freestanding words to its left). This is a curious item to find in a manuscript.

But might all (or, at least, many) pages of Voynichese text contain vertical keys inserted as a single line-initial glyph at the start of lines? Philip Neal speculated about this possibility many years ago, causing me to (occasionally) refer to these as “vertical Neal keys”. A vertical key might conceivably be used for many things, such as inserting an (enciphered) page title, or even a folio number or page number: though it’s easy to argue that the relatively narrow range of glyphs we see appearing here probably rules this out.

In “The Curse of the Voynich” (2006), I speculated instead that a glyph inserted at the start of a line might form part of some kind of transposition cipher. The suggestion there was that a second glyph (say, a k-gallows) might act as a token to use the glyph (or some function of that glyph) inserted at the start of the same line. This would be a fairly simple crypto ‘hack’ that would make codebreakers’ jobs difficult.

There are many other possible accounts one can devise. For example, it’s possible that the first glyph on a non-paragraph-initial line might function as a kind of catchword, to link the end of one line with the start of the next. Alternatively, it might be telling the reader how to join the text at the end of the preceding line with the text at the start of the current line. Or it might have some kind of crypto token function (e.g. selecting a dictionary). Or it might be a numbering scheme. Or it might be a marker for some funky line transposition scheme. Or a null. Or… one of a hundred other things (if not more).

If all these speculations seem somewhat ungrounded, it’s almost certainly because the basic groundwork to build a sensible discussion of LIBNPIG behaviour upon hasn’t yet been done. Which is your job. 🙂

LIBNPIG Groundwork

What needs doing? For a start, you’d need to build up a solid statistical comparison of paragraph-initial glyphs and LIBNPIG glyphs, along with paragraph-second glyphs and LSBNPS (line-second-but-not-paragraph-second) glyphs, for paragraph text in each of Herbal A, Herbal B, Q13 and Q20 (I would suggest).
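As a starting point for that comparison, here is a minimal Python sketch that separates paragraph-initial glyphs from LIBNPIG glyphs. It assumes you have already split your transcription into paragraphs and lines, with EVA words separated by "."; the function names, the list of multi-character glyphs, and the toy lines are all illustrative assumptions rather than anything taken from a real transcription file.

```python
from collections import Counter

# EVA strings usually read as a single glyph (an assumption of this sketch;
# extend or empty the tuple to suit your own parsing).
MULTI_CHAR_GLYPHS = ("cth", "ckh", "cph", "cfh", "ch", "sh")

def first_glyph(line):
    """Return the first glyph of an EVA line, keeping ch/sh etc. together."""
    text = line.strip()
    for glyph in MULTI_CHAR_GLYPHS:
        if text.startswith(glyph):
            return glyph
    return text[0]

def initial_glyph_counts(paragraphs):
    """Count paragraph-initial glyphs and LIBNPIG glyphs separately.

    `paragraphs` is a list of paragraphs, each a list of EVA line strings.
    """
    para_initial, libnpig = Counter(), Counter()
    for lines in paragraphs:
        for line_no, line in enumerate(lines):
            if not line.strip():
                continue  # skip blank lines
            bucket = para_initial if line_no == 0 else libnpig
            bucket[first_glyph(line)] += 1
    return para_initial, libnpig

# Invented toy lines, NOT real transcription data.
toy = [
    ["tchedy.qokeey.dal", "daiin.shedy.qokal", "sain.chey.ol", "shol.dy"],
    ["pol.chedy.okeey", "ychey.dal.aiin", "dair.ol.chedy"],
]
para_initial, libnpig = initial_glyph_counts(toy)
print("paragraph-initial:", dict(para_initial))
print("LIBNPIG:", dict(libnpig))
```

The same loop could be pointed at the second glyph of each line to get the paragraph-second and LSBNPS counts, and run separately per section (Herbal A, Herbal B, Q13, Q20) so that the four tables can be compared side by side.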

With those results in hand, there are some basic hypotheses you might want to try testing:

  • Is there any statistical correlation between a LIBNPIG glyph and the glyph immediately following it? Oddly, it seems that nobody has yet tried to test this – yet if there isn’t (as visually seems to be the case), then I think it’s safe to say that something is provably wrong with all naive text readings.
  • Is there a correlation between a LIBNPIG glyph and the previous line’s end-glyph?
  • Is there a correlation between a LIBNPIG glyph and the following word’s start glyph?
  • Do paragraph-initial second words behave the same way as LIBNPIG second words?
  • Might LIBNPIG glyphs simply be nulls? Might they be chosen just to look nice? Or do they have some genuinely meaningful content?
  • How does all this work for paragraph text in each of the major sections of the Voynich Manuscript? e.g. Herbal A, Herbal B, Q13, Q20
  • (I’m sure you can devise plenty of your own hypotheses here!)
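The first hypothesis in the list above (is a LIBNPIG glyph correlated with the glyph that follows it?) can be approached with an ordinary contingency-table test. The sketch below computes a Pearson chi-squared statistic by hand using only the standard library. It treats each EVA character as one glyph, which is a simplifying assumption, and the helper names are hypothetical. With the small counts typical of a single page the statistic is unreliable, so you would want to pool lines across a whole section before reading anything into it.

```python
from collections import Counter

def chi_squared(pairs):
    """Pearson chi-squared statistic for a list of (first, second) pairs.

    Returns (statistic, degrees_of_freedom). A statistic near zero means
    the second glyph looks independent of the first.
    """
    n = len(pairs)
    joint = Counter(pairs)
    firsts = Counter(a for a, _ in pairs)
    seconds = Counter(b for _, b in pairs)
    stat = 0.0
    for a, row_total in firsts.items():
        for b, col_total in seconds.items():
            expected = row_total * col_total / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    dof = (len(firsts) - 1) * (len(seconds) - 1)
    return stat, dof

def libnpig_pairs(paragraphs):
    """Collect (line-initial glyph, next glyph) for non-paragraph-initial lines.

    Lines whose first word is a single character are skipped, because the
    next character would then be the '.' word separator.
    """
    pairs = []
    for lines in paragraphs:
        for line in lines[1:]:
            text = line.strip()
            if len(text) >= 2 and text[1] != ".":
                pairs.append((text[0], text[1]))
    return pairs

# Invented toy paragraph, NOT real transcription data.
toy = [["tchedy.qokeey", "daiin.ol", "sain.chey", "dol.dy", "sor.aiin"]]
stat, dof = chi_squared(libnpig_pairs(toy))
print(stat, dof)
```

To turn the statistic into a p-value you would compare it against a chi-squared distribution with `dof` degrees of freedom. If SciPy is available, `scipy.stats.chi2_contingency` does the whole calculation from a table of counts. Running the same test on word-initial glyph pairs taken from mid-line words would give a baseline for what "normal" Voynichese correlation looks like.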

Ultimately, what we would like to know is what LIBNPIG behaviours tell us about how the starts of Voynichese lines have to be parsed – for if there is no statistical correlation between a line-initial glyph and the glyph following it, this cannot be a language behaviour.

Even though we can all see numerous LAAFU behaviours, it seems that few Voynich researchers have yet accepted them solidly enough to affect the way they actually think about Voynichese. But perhaps it is time that this changed: and perhaps LIBNPIG will be the thing that causes them to change how they think.

Here’s a second paper suggestion for the virtual Voynich conference being held later this year: this focuses on creatively visualising the differences between Currier A and Currier B.

A vs B, what?

“Currier A” and “Currier B” are the names Voynich researchers use to denote the main two categories of Voynichese text, in honour of Prescott Currier, the WWII American codebreaker who first made the distinction between the two visible in the 1970s.

Currier himself called the two types of Voynichese “A” and “B”, and described them as “languages”, even though he was aware some people might well misinterpret the term. (Spoiler alert: yes, many people did.) He didn’t do this with a specific theory about the manuscript’s text: it’s essentially an observation that the text on different pages works in very different ways.

Crucially, he identified a series of Voynich glyph groupings that appeared in one “language” but not the other: thanks to the availability of transcriptions, further research in the half century since has identified numerous other patterns and textual behaviours that Currier himself would agree are A/B “tells”.

Interesting vs Insightful

But… this is kind of missing the point of what Voynich researchers should be trying to do. The observation that A and B differ is certainly interesting, but it’s not really insightful: by which I mean the fact that there is a difference doesn’t cast much of a light on what kind of difference that difference is.

For example, if A and B are (say) dialects of the same underlying language (as many people simply believe without proof – though to be fair, the two do share many, many features), then we should really be able to find a way to map between the two. Yet when I tried to do this, I had no obvious luck.

Similarly, if A and B are expressions of entirely different (plaintext) languages, the two should really not have so many glyph structures in common. Yet they plainly do.

Complicating things further is the fact that A and B themselves are simplifications of a much more nuanced position. Rene Zandbergen has suggested that there seem to be a number of intermediate stages between “pure” A and “pure” B, which has been taken by some as evidence that the Voynich writing system “evolved” over time. Glen Claston (Tim Rayhel) was adamant that he could largely reconstruct the order of the pages based on the development of the writing system (basically, as it morphed from A to B).

Others have suggested yet more nuanced accounts: for example, I proposed in “The Curse of the Voynich” (2006) that part of the Voynichese writing system might well use a “verbose cipher” mechanism, where groups of glyphs (such as EVA ol / or / al / ar / aiin / qo / ee / eee / etc) encipher single letters in the plaintext. This would imply that many of the glyph structures shared between A & B are simply artifacts of what cryptologists call the “covertext”: and hence if we want to look at the differences between A and B in a meaningful way, we would have to specifically look beneath the covertext – something which I suspect few Voynich researchers have traditionally done.

Types of Account

As a result, the A/B division sits atop many types of account for the nature of what A and B share, e.g.

  • a shared language
  • a shared linguistic heritage
  • a shared verbose cipher, etc

It also rests upon many different accounts of what A and B ultimately are, e.g.:

  • two related lost / private languages
  • a single evolving orthography wrapped around a lost / private language
  • a single evolving language
  • a single evolving shorthand / cipher system, etc

The difficulty with all of these accounts is that they are often held more for ideological or quasi-religious reasons (i.e. as points of faith, or as assumed start-points) than as “strong hypotheses weakly held”. The uncomfortable truth is that, as far as I know, nobody has yet tried to map out the chains of logical argumentation that move forwards from observational evidence / data to these accounts. Researchers almost always move in the reverse direction, i.e. from account to the evidence, rather than from evidence to explanation.

And when the primary mode of debate is arguing backwards, nobody normally gets anywhere. This seems to be a long-standing difficulty with cipher mysteries (particularly when treasure hunters get involved).

EVA as a Research Template

If Voynich researchers are so heavily invested in a given type of account (e.g. Baxian linguistic accounts, autocopying accounts, etc), how can we ever make progress? Fortunately, we do have a workable template in the success of EVA.

The problem researchers faced was that, historically, different transcriptions of the Voynich were built on very specific readings of Voynichese: the transcriber’s assumptions about how Voynichese worked became necessarily embedded in their transcription. If you were then trying to work with that transcription but disagreed with the transcriber’s assumptions, it would be very frustrating indeed.

EVA was instead designed as a stroke-based alphabet, to try to capture what was on the page without first imposing a heavy-duty model of how it ought to work on top of it. Though EVA too had problems (some more annoying than others), it provided a great way for researchers to collaborate about Voynichese despite their ideological differences about how the Voynichese strokes should be parsed.

With the A/B division, the key component that seems to be missing is a collaborative way of talking about the functional differences between A and B. And so I think the challenge boils down to this: how can we talk about the functional differences between Currier A and Currier B while remaining account-neutral?

Visualising the Differences

To my mind, the primary thing that seems to be missing is a way of visualising the functional differences between A and B. Various types of visualisation strategies suggest themselves:

  • Contact tables (e.g. which glyph follows which other glyph), both for normal parsing styles and for verbose parsing groupings – this is a centuries-old codebreaking hack
  • Model dramatisation (e.g. internal word structure model diagrams, showing the transition probabilities between parsed glyphs or parsed groups of glyphs)
  • Category dramatisation (e.g. highlighting text according to its “A-ness” or its “B-ness”)

My suspicion has long been that ‘raw’ glyph contact tables will probably not prove very helpful: this is because these would not show any difference between “qo-” contacts and “o-” contacts (because they both seem like “o-” to contact tables). So even if you don’t “buy in” to a full-on verbose cipher layer, I expect you would need some kind of glyph pre-grouping for contact tables to not get lost in the noise.
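To make the pre-grouping point concrete, here is a small Python sketch of a contact table built on top of a greedy longest-match parser. The group list is taken from the verbose-cipher candidates mentioned earlier (ol / or / al / ar / aiin / qo / ee / eee) plus ch and sh; treating exactly these as units, and parsing greedily from left to right, are assumptions of the sketch rather than an established parsing scheme.

```python
from collections import Counter

# Candidate glyph groups, longest first so that "eee" wins over "ee".
GROUPS = sorted(
    ["aiin", "eee", "ee", "qo", "ol", "or", "al", "ar", "ch", "sh"],
    key=len,
    reverse=True,
)

def parse_word(word, groups=GROUPS):
    """Split an EVA word into tokens, preferring the longest matching group."""
    tokens = []
    i = 0
    while i < len(word):
        for group in groups:
            if word.startswith(group, i):
                tokens.append(group)
                i += len(group)
                break
        else:
            # No group matched here: fall back to a single EVA character.
            tokens.append(word[i])
            i += 1
    return tokens

def contact_table(words, groups=GROUPS):
    """Count how often each token is immediately followed by each other token."""
    table = Counter()
    for word in words:
        tokens = parse_word(word, groups)
        table.update(zip(tokens, tokens[1:]))
    return table

words = ["qokeey", "okeey"]
grouped = contact_table(words)
raw = contact_table(words, groups=[])  # plain character-by-character contacts
print(grouped[("qo", "k")], grouped[("o", "k")])
print(raw[("o", "k")])
```

With grouping, the "qo-" contact and the "o-" contact land in different cells; with the raw table they are merged into a single "o then k" count, which is exactly the loss of information described above. Building one grouped table for A text and one for B text (each normalised to row proportions) would then give two grids that can be compared cell by cell or drawn as heatmaps.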

You can use whatever visualisation strategies / techniques you like: but bear in mind the kind of things we would collectively like to take away from this visualisation:

  • How can someone who doesn’t grasp all the nuances of Voynichese ‘get’ A-ness and B-ness?
  • How do A-ness and B-ness “flow” into each other / evolve?
  • Are there sections of B that are still basically A?
  • How similar are “common section A” pages to “common section B” pages?
  • Is there any relationship between A-ness / B-ness and the different scribal hands? etc

Problems to Overcome

There are a number of technical hurdles that need jumping over before you can design a proper analysis:

  • Possibilism
  • Normalising A vs B
  • First glyphs on lines
  • Working with spaces
  • Corpus choice

Historically, too much argumentation has gone into “possibilism”, i.e. considering a glyph pattern to be “shared” because it appears at least once in both A and B: but if a given pattern occurs (say) ten times more often in B than A, then the fact that it appears at all in A would be particularly weak evidence that it is sharing the same thing in both A and B. In fact, I’m sure that there are plenty of statistical disparities between A and B to work with: so it would be unwise to limit any study purely to features that appear in one but not the other.
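One way to move from "does it appear in both?" to "how differently does it appear?" is to compare rates rather than presence. The sketch below computes a smoothed log2 rate ratio for every word seen in either corpus. The additive smoothing constant and the function name are arbitrary choices made for illustration, and the word lists are invented.

```python
import math
from collections import Counter

def log2_rate_ratios(a_words, b_words, smoothing=0.5):
    """Return {word: log2(rate in B / rate in A)} with additive smoothing.

    Positive values mean the word is relatively more common in B,
    negative values mean it is relatively more common in A.
    """
    a_counts, b_counts = Counter(a_words), Counter(b_words)
    vocab = set(a_counts) | set(b_counts)
    a_total = len(a_words) + smoothing * len(vocab)
    b_total = len(b_words) + smoothing * len(vocab)
    ratios = {}
    for word in vocab:
        rate_a = (a_counts[word] + smoothing) / a_total
        rate_b = (b_counts[word] + smoothing) / b_total
        ratios[word] = math.log2(rate_b / rate_a)
    return ratios

# Invented toy corpora, NOT real word counts.
a_words = ["daiin"] * 10 + ["chol"] * 10
b_words = ["daiin"] * 1 + ["qokeey"] * 19
for word, ratio in sorted(log2_rate_ratios(a_words, b_words).items()):
    print(word, round(ratio, 2))
```

In this toy case "daiin" is technically shared, yet its ratio is strongly negative, which is the kind of disparity that a presence/absence test would hide. The same function works on parsed glyph groups or on word-internal patterns if you feed it those instead of whole words.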

There is also a problem with normalising A text with B text. Even though there seems to be a significant band of common ground between the two, a small number of high-frequency common words might be distorting the overall statistics, e.g. EVA daiin / chol / chor in A pages and EVA qokey / qokeey / qol in B pages. I suspect that these (or groups similar to them) would need to be removed (or their effect reduced) in order to normalise the two sets of statistics to better identify their common ground.

Note that I am deeply suspicious of statistics that rely on the first glyph of each line. For example, even though EVA daiin appears in both A and B pages, there are some B pages where it appears primarily as the first word on different lines (e.g. f103v, f108v, f113v, all in Q20). So I think there is good reason to suspect that the first letter of all lines is (in some not-yet-properly-defined way) unreliable and should not be used to contribute to overall statistics. (Dealing properly with that would require a paper on its own… to be covered in a separate post).

Working with spaces (specifically half-spaces) is a problem: because of ambiguities in the text (which may be deliberate, from scribal arbitrariness, from transcriber arbitrariness, etc), Voynich transcription is far from an exact science. My suggested mitigation would be to avoid working with sections that have uncertain spacing and labels.

Finally: because of labelese, astro labels and pharma labels, corpus choice is also problematic. Personally, I would recommend limiting analysis of A pages to Herbal A only, and B pages to Q13 and Q20 (and preferably keeping those separate). There is probably as much to be learnt from analysing the differences between Q13’s B text and Q20’s B text as from the net differences between A and B.

If you hadn’t already heard, a Voynich Manuscript-themed virtual conference has recently been announced for 30th November to 1st December 2022: and its organisers have put out a call for papers.

Me, I have at least twenty ideas for topics, all of which I think could/should/would move the state of research forward. But my plan is actually to write up as many of them as I can in posts here, and let people freely take them to develop as their own, or (my preference) to form impromptu collaborations (via the comments section here, or via a thread on voynich.ninja, whatever works for you) to jointly pitch to the organisers.

I’ll start with what I think is the most obvious topic: DNA gathering analysis. I’ll explain how this works…

Quires vs Gatherings

Though some people like to oppose it, by 2022 Voynich researchers really should have fully accepted the idea that many of the Voynich’s bifolios have, over the centuries, ended up in a different nesting/facing order to their original nesting/facing order. There is so much supporting evidence that points towards this, not least of which is the arbitrary & confused interleaving of Herbal A and Herbal B bifolios.

Consequently, there is essentially zero doubt that the Voynich Manuscript is not in its original ‘alpha’ state. Moreover, good codicological evidence suggests that the original alpha state was not (bound) quires but instead (unbound) gatherings, because the quire numbering seems to have been added after an intermediate shuffling stage.

The big codicological challenge, then, is to work out how bifolios were originally grouped together (into gatherings), and how bifolios within each gathering were nested – i.e. the original ‘alpha’ state of the Voynich Manuscript.

Yet without being able to decrypt its text, we have only secondary clues to work with, such as tiny (and often contested) contact transfers. And because many of the (heavy) paint contact transfers (such as the heavy blue colour) seem to have happened much later in the manuscript’s lifetime, many of the contact transfers probably don’t tell us anything about the original state of the manuscript.

In Chapter 4 “Jumbled Jigsaws” (pp.51-71) of my (2006) book “The Curse of the Voynich”, I did my best to use a whole range of types of clue to reconstruct parts of the original folio nesting/facing order. Even so, this was always an uphill struggle, simply because we collectively had no properly solid physical forensic evidence to move this forward in what you might consider a systematic way.

From Gatherings to Vellum Sheets

However, a completely different way of looking at a manuscript is purely in terms of its material production: how were the pages in a gathering made up?

If a vellum manuscript is not a palimpsest (i.e. using previously-used vellum that has been scraped clean), it would typically have started as a large vellum sheet, which would then have been folded down and cut with a knife or shears or early scissors into the desired form. Given the unusual foldout super-wide folios we see in the Voynich Manuscript, I suspect there is almost no chance that these sheets were pre-cut.

As such, the normal process (e.g. for book-like sections) would have been to fold a sheet in half, then in half again, and then cut along the edges (leaving the gutter fold edge intact) to form a small eight-page gathering. This is almost certainly what happened when the Voynich Manuscript was made, i.e. it was built up over time using a series of eight-page gatherings, each from a single sheet.

It’s also important to remember that vellum was never cheap (and it took most of the fifteenth century for paper to become anything less than a luxury item too). Hence even larger fold-out sheets would not have been immune from this financial pressure: so where possible, whatever remained of a vellum sheet after a foldout had been removed would typically have been used as a bifolio.

The reason this is important is that where bifolios of a gathering were formed from a single sheet of vellum, they would all necessarily share the same DNA. And so this is where the science-y bit comes in.

Enter the DNA Dragon

Essentially, if you can take a DNA swab (and who in the world hasn’t now done this?) of each of the Voynich Manuscript’s bifolios, you should be able to match them together. There is then a very high probability that these matches would – in almost all cases – tell you what the original gatherings were.
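Once a lab has reported which pairs of bifolios appear to come from the same animal, turning those pairwise matches into candidate gatherings is a simple grouping problem. The sketch below uses a small union-find structure to do this. The bifolio labels and the match list are entirely made up for illustration, and real DNA results would come with uncertainty that this sketch ignores.

```python
def group_bifolios(bifolios, matches):
    """Group bifolios into candidate sheets from pairwise same-animal matches.

    `bifolios` is a list of labels; `matches` is a list of (label, label)
    pairs. Returns a sorted list of sorted groups.
    """
    parent = {b: b for b in bifolios}

    def find(x):
        # Follow parent links to the group's representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for b in bifolios:
        groups.setdefault(find(b), []).append(b)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical labels and matches, purely for illustration.
labels = ["b1", "b2", "b3", "b4", "b5"]
matches = [("b1", "b3"), ("b3", "b4")]
print(group_bifolios(labels, matches))
```

Here b1, b3 and b4 end up in one group even though b1 and b4 were never directly compared, because both matched b3. Groups containing more bifolios than a single sheet could plausibly yield would be a flag that the same animal supplied more than one sheet, or that a match is spurious.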

The collection procedure appears – from this 2017 New Scientist article – to be painfully simple: identify the least handled (and text-free and paint-free) parts of each bifolio, and use a rubber eraser to take a small amount of DNA from the surface. Other researchers (most famously Timothy Stinson) are trying to build up horizontal macro-collections of medieval vellum DNA: but because the Voynich Manuscript is not (yet) readable, a micro-collection of the DNA in its bifolios would offer a very different analytical ‘turn’.

Though DNA has famously been used for many types of forensic analysis (there are entire television channels devoted to this), determining the original gathering order of an enciphered manuscript is not yet – as far as I know – one of them. But it could be!

Finally: once the gatherings have been matched, close examination (typically microscopic) to determine the hair / flesh side of each bifolio should help further reduce the possible number of facing permutations within each gathering. Remember, the normal practice throughout the history of vellum was that a folded gathering or quire would almost always end up in a flesh-facing-flesh and hair-facing-hair state.

Why is this Important?

As far as understanding the codicology of an otherwise unreadable document goes, DNA gathering matching would be hugely important: it would give clarity on the construction sequence of every single section of the Voynich Manuscript. This, in turn, would cast a revealing light on contentious issues of document construction and sectioning that have bedeviled researchers for years.

This would include not only the relationship of Herbal A bifolios to Herbal B bifolios (a debate going at least back to Prescott Currier), but also the more modern debates about Q13A vs Q13B, Q20A vs Q20B, and the relationship between Herbal A and the various Pharma A pages.

The biggest winners from reconstructing the manuscript’s alpha state would be researchers looking to find meaning and structure in the text. As it is, they’re trying to infer patterns from a document that appears to have been arbitrarily shuffled multiple times in its history. Along these lines, there’s a chance we might be able to use this to uncover a block-level match between a section and an external (unencrypted) text, which is something I have long proposed as a possible way in to the cipher system.

There is also a strong likelihood that folio numbers might well be encrypted (e.g. in the top line of text) – historically, many complicated cipher systems have been decrypted by first identifying their underlying number system, so this too is an entirely possible direct outcome of this kind of research. It would additionally make sense for anyone trying to understand the different scribal hands to be able to situate those contributions relative to the manuscript’s alpha state rather than to its final (omega) state.

In those few sections where we have already been able to reconstruct the manuscript’s alpha state (e.g. Q9), we have uncovered additional symmetries and patterns that were not obviously visible in the shuffled state. Imagine how much more we would be able to uncover if we could reconstruct the alpha state of the entire manuscript!

So… Why Haven’t You Done This Already, Nick?

I’ve been trying for years, really I have. And through that time this basic proposal has received a ton of negativity and push-back from otherwise smart people (who I think really should have known better).

But the times they are (always) a-changing, so maybe it’s now the right time for someone else completely to try knocking at broadly this same door. And if they do, perhaps they’ll find it already open and waiting for them. A moment’s thought should highlight that there’s certainly a great deal – in fact, an almost uniquely large amount – of new, basic stuff to be learnt about the Voynich Manuscript’s construction here.

Yet at the same time I would caution that if you look at the list of proposed topic areas for the conference, this kind of physical analysis doesn’t really fit the organisers’ submission model at all. After first submitting a 1-2-page abstract by 30th June 2022, allowing only five weeks after acceptance (20th July 2022) to write a 5-9 page paper seems a bit hasty and superficial, as if the organisers aren’t actually expecting anybody to submit anything particularly worthwhile. But perhaps they have their specific reasons, what do I know?

(But then again, maybe you’d be best off phoning your aunt who works at the History Channel and get an in with a TV documentary-making company. If film-makers can squeeze nine series out of “The Curse of Oak Island”, you’d have thought they’d be all over this like a rash, right? Right?)

If Voynichese isn’t meaningless (and good luck to those who believe it is, that’s a fight you’ll have to fight without me), what language(s) is/are its plaintext written in?

Thinking about this recently, what struck me was how unsystematic (and unsatisfactory) most Voynich language presentations are. For example, discussions of Currier A and Currier B (the two major Voynichese language ‘styles’) typically seem to start too far along, by assuming what the relationship between A and B is before they even begin. So… how about we discuss what that relationship is, and what evidence we have?

Big questions about Currier A and Currier B

The specific differences between Currier A and B form a topic I’ve gone over many times, such as in this 2013 post and more recently in this 2019 post. And the idea that somehow the A ‘system’ evolved into/from the B ‘system’ is something that many researchers have discussed, e.g. Tim Rayhel [Glen Claston] had very strong views on this. Similarly, Rene Zandbergen has perhaps worked hardest to establish that there’s more of a technical spectrum between A and B. Rene has also noted that in some ways B seems to be a more verbose version of A: yet at the same time it is abundantly true that the two also behave in sharply different ways.

So I thought it might help to ask the most important questions about A and B in a more systematic way:

  • Did A precede B, or did B precede A?
  • Are A and B encoding/enciphering two different plaintext languages, or a single plaintext language?
  • Do A pages exhibit internal evolution? If so, can we order A pages according to that evolution?
  • Do B pages exhibit internal evolution? If so, can we order B pages according to that evolution?
  • Might the differences between groups of A pages simply be down to their different topics / contents?
  • Might the differences between groups of B pages simply be down to their different topics / contents?
  • Even though Q13 is Currier B, do language differences separate Q13A pages from Q13B pages?
  • Even though Q20 is Currier B, do language differences separate Q20A pages from Q20B pages?
  • If A and B encipher different languages, was the enciphering system designed primarily for A or for B?
  • If A and B encipher a single language, are all the differences just down to scribal choice?
  • In A and B pages, is there any way to tell whether or not the first letter of a line is real or fake?

To try to explore these difficult (yet fundamental) questions, I’ll now look at a couple of specific behaviours that sharply differ between A and B, to see what those differences seem to tell us about these questions.

The two different daiin behaviours

If you pick out a normal-looking A page (say, f21v, which has a small amount of text accompanying a herbal drawing), you’ll see not only lots of “daiin” instances (six on f21v, two of which are a “daiin daiin” adjacent pair), but also odaiin, chodaiin, todaiin, cholchaiin, sheaiin and kchochaiin. These -aiin instances are located all over the page, as you would expect of words in a normal text.

But if you then go to a normal-looking B page (say, f103v, which is far more text-heavy than f21v) we see eight instances of daiin, six of which are on the left-hand edge (and none of which is on the first line of a paragraph).

Personally, I find these two different behaviours (one text-like, the other LAAFU-like) very hard to reconcile with the oft-floated idea that A and B are two sides of a single coin. This B-behaviour seems to imply that “aiin” (which, as Currier pointed out, is a common B word) is being modified with a “d-” line-initial prefix on B pages, thus making “daiin” an even rarer word in B pages than it might at first appear.

Or maybe there’s some other exotic LAAFU explanation I haven’t yet grasped here. (But I don’t think so.)
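The contrast between the two daiin behaviours is easy to quantify once a page has been split into paragraphs and lines. The Python sketch below tallies where a target word falls: at the start of a paragraph, at the start of a later line, or anywhere else. It assumes "." as the word separator, and the toy page is invented rather than copied from f21v or f103v.

```python
from collections import Counter

def position_profile(paragraphs, target="daiin"):
    """Tally where `target` occurs within a list of paragraphs.

    Each paragraph is a list of EVA lines with words separated by '.'.
    """
    profile = Counter()
    for lines in paragraphs:
        for line_no, line in enumerate(lines):
            for word_no, word in enumerate(line.strip().split(".")):
                if word != target:
                    continue
                if word_no == 0 and line_no == 0:
                    profile["paragraph_initial"] += 1
                elif word_no == 0:
                    profile["line_initial_not_paragraph_initial"] += 1
                else:
                    profile["elsewhere"] += 1
    return profile

# Invented toy page, NOT a real folio.
toy_page = [
    ["pchedy.qokeey.daiin", "daiin.chedy.qokal", "daiin.okeey", "okal.daiin.ol"],
    ["daiin.shedy", "daiin.qokeey"],
]
print(dict(position_profile(toy_page)))
```

Running this per page and comparing the share of line-initial hits against the share you would expect from the number of words per line would show which pages behave in the text-like way and which in the LAAFU-like way.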

The two different -ed- behaviours

Rene Zandbergen’s observation that -ed- is rare in A pages (particularly Herbal A pages) but extremely common in B pages is also very hard to square with the idea that A and B are basically the same thing. I’d certainly agree that in early Herbal A pages, the two instances in the Takahashi transcription (f8r and f11r) both seem like scribal errors in the original rather than systematic -ed- examples.

Things get a little more complicated as you look further in to other A pages: f27v, f51r and f52r look like they have genuine -ed- instances (though the one on f56r looks to me like a scribal slip), while f65v has four -ed- instances. The astronomy section (A) has many more -ed- instances, as does the zodiac section (A), though the pharma section (A) is closer to the density of the Herbal A section.

So, if you were to use the ed-density to try to trace out the evolution of the A pages, I suspect you’d probably conclude that the order they were constructed in was: Herbal A, Pharma A, Astro A, Zodiac A. And then you’d probably conclude that the B pages (which have extraordinarily heavy ed-density throughout) were written after the A pages.
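The ordering argument above can be sketched in a few lines of Python: compute the fraction of words containing "ed" in each section, then sort. The section names follow the text, but the word lists are invented placeholders, and "fraction of words containing ed" is only one of several reasonable ways to define ed-density.

```python
def ed_density(words):
    """Fraction of words that contain the EVA sequence 'ed'."""
    if not words:
        return 0.0
    return sum("ed" in word for word in words) / len(words)

def order_by_ed_density(sections):
    """Return section names sorted from lowest to highest ed-density."""
    return sorted(sections, key=lambda name: ed_density(sections[name]))

# Invented placeholder word lists, NOT real section data.
toy_sections = {
    "Q20 (B)": ["qokedy", "chedy", "shedy", "daiin"],
    "Herbal A": ["daiin", "chol", "chor", "shol"],
    "Astro A": ["chedy", "daiin", "okal", "otedy"],
}
for name in order_by_ed_density(toy_sections):
    print(name, ed_density(toy_sections[name]))
```

With real transcription data you would want to run this per page as well as per section, since a section average can hide a handful of pages that carry most of the -ed- instances.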

Evolution of a system

To my eyes, the changing way that -ed- appears in the A pages suggests that what we are glimpsing here is the evolution of a system, where new features are gradually introduced and diffused into practice. I further believe that this also implies the A pages were constructed before the B pages. Yet the huge step change in -ed- usage between A and B pages suggests to me that something quite different is going on in B pages.

Similarly, the vastly different ways that daiin appears in A and B pages (position-independent in A, position-dependent in B) also suggest to me that something very different is going on in B pages.

So, what is going on in B pages? Though this margin is far too small for me to come to a definitive conclusion, it currently seems to me to be in some way a combination of things. While the system itself definitely seems to have step-changed from A to B (which I think the daiin A/B behaviour argues for), I can’t yet rule out the possibility that this change in system may well have been driven by a change in plaintext language in B pages.

If you know of any Voynichese behaviours that you think help to illuminate, illustrate, or answer any questions on the list above, please leave a comment below, thanks!