Seeing the Voynich Manuscript for the first time is quite an intimidating experience: you’re looking at something which is so uncertain in so many different ways – how should you try to “read” it?

In general, when you look at a page of text, you do two different types of reading: (1) you work out how everything is laid out (you navigate the page) and (2) you read what is contained within it (you read the text). In computer science terms, you could describe the layout conventions and text conventions as having two quite separate ‘grammars’.

For instance, if you picked up a Hungarian newspaper, I would predict that you would stand a good chance of being able to work out its structure, even though you may not be able to understand a single word. It’s perfectly reasonable, then, to be able to navigate a page without being able to read it.

What’s not widely known about the Voynich Manuscript is that researchers have identified many of the navigational elements that structure the text (even though they cannot actually read them). I thought it might be helpful to post about these (oh, and I’m getting emails mildly berating me for posting too much about the wrong ‘v’, i.e. that it’s not “Vampire News”).

As a practical example, let’s look at the very first page of the manuscript proper: this has the name “f1r” (which means “the recto [front] side of folio [double-sided page] #1”). You may also see this referred to as “f001r” (some people use this naming style so that their image files get sorted nicely), or even as “1006076.sid” (this is the Beinecke Rare Book & Manuscript Library’s internal database reference for the high-resolution scan of f1r, which they store as a kind of highly compressed image). This is what f1r looks like:-

Note that the green splodges aren’t actually part of the page itself – they’re green leaves painted onto the reverse side of the folio (that is, on f1v, “folio #1 verso [back]”) that happen to be visible through the vellum. I’ll leave the issue of whether this is because the paint is too thick or the vellum is too thin to another day…

If we use a tricky colour filter written by Jon Grove (more on it here), we can make a passable attempt at removing the green splodges: and if we then bump up the contrast to make everything a little clearer, we can get a revised image of f1r:-


Red areas: these form the first four paragraphs of the text. These often start with one of four large vertical characters (known as “gallows characters”), and appear to have been written from top-left down to bottom-right, as you would English, French, Latin etc.

Blue areas: these are known as “titles”, and are typically right-aligned words or short phrases added to the end of paragraphs. It has been proposed that the text contained in these might actually be section titles (which seems fairly reasonable). There’s a brief discussion on this by (a differently spelled!) John Grove here, who first suggested the term.

Yellow area: this is a cipher key arranged vertically down the right hand side of the page that someone has written in (and only partially filled before giving up) in a 16th century hand. Though a bit indistinct, you can still make out “a b c d e” at the top left and a few other letters besides.

Bright green areas: these odd shapes appear nowhere else, and are generally referred to as “weirdoes” (for want of a better name). Interestingly, these are picked out in bright red: f67r2 is the only other place with red text that I can think of (the page that was originally on the front of what is now Quire 9).

Dull green area: this is where the earliest proven owner wrote his signature (something like “Jacobus de Tepenecz, Prag”, though it is very hard to make out), which a subsequent owner appears to have (quite literally) scrubbed off the page (if you look carefully, you can see what appear to be two or more watermarks at the edges of the area). The question of why someone would want to do this is a matter for another day…

Pink area: hidden in the top right corner next to some wormholes and the folio number (“1”, in a sixteenth century hand) is a very faint picture, possibly of a bird. Surprisingly, this subtle piece of marginalia doesn’t appear in GC’s otherwise-very-good gallery of Voynich marginalia: so here’s an enhanced picture of it so you can see what I’m talking about:-

So, even if we can’t yet read f1r’s text, can we navigate its layout? I believe we can! From the presence of red text, I’m fairly certain it was the first page of a quire: and from the signature and weathering, I don’t see any reason to think this was ever bound anywhere apart from at the front of the manuscript. This leads me to predict that the set of four paragraphs forms an index to the manuscript as a whole, and so very probably describes four separate “books” or “works”, where the “title” (appended to the end of the paragraph) is indeed the title of that book.

If you were looking for cribs to crack the titles 🙂 , my best guess is that the first book (section) is a herbal, the second book is on the stars (astronomy and astrology), the third book is on water, while the fourth book comprises recipes and secrets. I also suspect that this index page was composed about three-quarters of the way through the project, and that the (really quite strange) Herbal-B pages were added in a subsequent phase. But, once again, that’s another story entirely…

I’ve spent a long time (though “far too long” probably covers it better) hunting down obscured fragments of text in the Voynich Manuscript: so my Spidey-sense tingled almost uncontrollably when I saw a claim for hidden text on f1v in the “Marginal Writing” picture gallery on Glen Claston’s Voynich Central.

I’d never heard of this before: just to be sure, I checked Reuben Ogburn’s 2004 page on “Writing in and around plant illustrations” in case it had slipped in there, but no sign. If you run this through Jon Grove’s colour separator filter, you can see that the brown ink used for the drawing and the brown paint used to fill it in are very slightly different: in the image below, the white area is where the filter thinks the overpainting happened.

But is there writing beneath? If you squint at the topmost image here long enough, you can start to make out something that might almost be writing. But if you filter it slightly differently, I think the answer emerges: the “signal” (below) appears to be not writing, but only compression artefacts from the MrSID wavelet encoding. Sorry, guys: false alarm! (Though next time I’m at the Beinecke, I’ll have another quick look, just to be completely sure…)

Back in 1991, sardonic linguist Jacques Guy concocted a deliberately false theory about the Voynich, “to demonstrate how the absurd can be dressed in sensible garb”. His “Chinese Hypothesis” had Marco Polo bringing back two Chinese scholars to Venice, who wrote down their encyclopaedic knowledge into a book in some semi-improvised European script… you guessed it, Voynichese. He never believed his pet canard for a moment: it was a rhetorical gesture to the interpretative folly – which I call “the curse” – that surrounds the study of the manuscript.

But then in 1997, Brazilian computer science professor Jorge Stolfi pointed out that, actually, Voynichese as transcribed does share a lot of statistical properties with Mandarin Chinese texts. Though technically true, the problem is not its stats, but rather that the Voynich Manuscript is (with very little doubt) a fifteenth century European cultural artefact. Stats only indicate correlation, not causation: so all Stolfi’s results really say is that the Voynich Manuscript transcription correlates moderately well with certain Mandarin Chinese transcriptions. But lifting the abstracted text out of its codicological and stylistic contexts can easily give rise to the kind of plucking fallacy Gordon Rugg’s work suffers from. Is the statistical similarity Stolfi found in the texts themselves, or in the methodology used to design the two transcriptions? I suspect it may well be the latter: the map is not the territory.

So why am I so fascinated by the news that some indecipherable Chinese texts have recently been found? They don’t look anything like Voynichese (and why should they?): but they do look like a pictographic script not entirely dissimilar to Chinese. Their finder, 38-year-old Zhou Yongle, suspects they might be written by the Tujia, a large ethnic minority in mainland China which has a spoken language but (as far as anyone knew) no written one. For what it’s worth, Wikipedia asserts that Tujia is a Tibeto-Burman language with some similarities to Yi: but – come on – you’d have to be a pretty h4rdc0re linguist to know or care what that means.

No: what I find intriguing is that these texts do look precisely like the kind of cultural artefacts you would expect, with (real) Chinese annotations and marginalia. If Jacques wants a proper historical linguistic puzzle to get his teeth into, then this would surely be exactly the right kind of thing for him: honestly, where’s the fun in devising a Sokal-like hoax aimed at self-mystificating Voynichologists, when they’re already more than capable of tying themselves in knots over essentially nothing?

Of course, we mustn’t forget the possibility that Zhou Yongle may (for whatever reason) have faked these unreadable documents. You may not have heard of the huge “paper tiger” scandal in China recently over photos of the South China tiger, believed to have been faked by hunter Zhou Zhenglong; or indeed the whole issue of the 1421 (1418/1763) map hoaxery, as ably deconstructed by Geoff Wade et al. Were all three simply ‘Made In China’? It’s a good question…

It’s a nice historical detective story, one kicked off by John Dee, Frances Yates’ favourite Elizabethan ‘magus’ (though I personally suspect Dee’s ‘magic’ was probably less ‘magickal’ than it might appear), when he claimed to have told an angel that his “great and long desyre hath byn to be hable to read those tables of Soyga”. Dee lost his precious copy of the “Book of Soyga” (but then managed to find it again): when subsequently Elias Ashmole owned it, he noted that its incipit (starting words) was “Aldaraia sive Soyga vocor…”.

However, since Ashmole’s day it was thought to have joined the serried, densely-stacked ranks of long-disappeared books and manuscripts, in the “blue-tinted gloom” of some mythical, subterranean library not unlike the “Cemetery of Forgotten Books” in Carlos Ruiz Zafón’s novel “The Shadow of the Wind” (2004)…

Fast-forward 400 years to 1994, and what do you know? Just like rush hour buses, two copies of the “Book of Soyga” turn up at once, both found by Deborah Harkness. Rather than searching for “Soyga”, she searched for its “Aldaraia…” incipit: which is, of course, what you were supposed to do (in the bad old days before the Internet).

It is a strange, transitional document, neither properly medieval (the text has few references to authority) nor properly Renaissance. There are some mysterious books referenced, such as the Liber Sipal and the Liber Munob: readers of my book “The Curse of the Voynich” may recognize these as simple back-to-front anagrams (Sipal = Lapis [stone], Munob = Bonum [Good], Retap Retson = Pater Noster [our Father]). In fact, Soyga itself is Agyos [saint] backwards.

But what was the secret hidden behind the 36 mysterious “tables of Soyga” that had vexed John Dee so? 36×36 square grids filled with oddly patterned letters, they look like some kind of unknown cryptographic structure. Might they hold a big secret, or might they (like many of Trithemius’ concealed texts) just be nonsense, a succession of quick brown foxes endlessly jumping over lazy dogs?

  • oyoyoyoyoyoyoyoyoyoyoyoyoyoyoyoyoyoy
    rkfaqtyoyoyoyoyoyoyoyoyoyoyoyoyoyoyo
    rxxqnkoyoyoyoyoyoyoyoyoyoyoyoyoyoyoy
    azzsxbqtyoyoyoyoyoyoyoyoyoyoyoyoyoyo
    sheimasddtguoyoyoyoyoyoyoyoyoyoyoyoy
    eyuaoiismspkfaqtyoyoyoyoyoyoyoyoyoyo
    enlxflfudzrxxqnkoyoyoyoyoyoyoyoyoyoy
    sxcahqczfbtfzsxbqtyoyoyoyoyoyoyoyoyo
    azepxhheurgmyknqnkoyoyoyoyoyoyoyoyoy
    rlbriyzycuyddpotxbqtyoyoyoyoyoyoyoyo
    ryrezabirhdiszeknqnkoyoyoyoyoyoyoyoy
    ogzgfceztqalpntsxhssyoyoyoyoyoyoyoyo
    opnxxsnodxqhuekknykkoyoyoyoyoyoyoyoy
    rcqsfueesfsqrqgqrossyoyoyoyoyoyoyoyo
    roauxmdkkxkhyhmpzqphdtgtguoyoyoyoyoy
    aqxmudiamubkoqifbszktdmspkfaqtyoyoyo
    sazoesrmlrnaqnzhgabmsmlpeahfsddtguoy
    ………………………………
    (etc)

Jim Reeds, one of the great historical code-breakers of modern times, stepped forward unto the breach to see what he could make of these strange tables: he transcribed them, ran a few tests, and (thank heavens) worked out the three-stage algorithm with which they were generated.

Stage 1: fill in the 36-high left-hand column (the first letter of each row in the table above) with a six-letter codeword (such as ‘orrase’ for table #5, ‘Leo’) followed by its reverse anagram (‘esarro’), and then repeat them both two more times.

Stage 2: fill each of the 35 remaining elements in the top line in turn with ((W + f(W)) modulo 23), where W = the element to the West, ie the preceding element. The basic letter numbering is straightforward (a = 1, b = 2, c = 3, … u = 20, x = 21, y = 22, and z = 23, so a result of 0 has to be read as 23, ie ‘z’), but the funny f(W) function is a bit arbitrary and strange:-

  • x f(x) x f(x) x f(x) x f(x)
    a…2, g…6, n..14, t…8
    b…2, h…5, o…8, u..15
    c…3, i..14, p..13, x..15
    d…5, k..15, q..20, y..15
    e..14, l..20, r..11, z…2
    f…2, m..22, s…8

Stage 3: fill each row in turn with ((N + f(W)) modulo 23), where N = the element to the North, ie the element above the current element.

For example, if you try Stage 2 out on ‘o’, (W + f(W)) modulo 23 = (14 + 8) modulo 23 = 22 = ‘y’, while (22 + 15) modulo 23 = 14 = ‘o’, which is why you get all the “yoyo”s in the table above.
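For anyone who would like to try the three stages for themselves, here is a minimal Python sketch of them as described above. The names (`ALPHABET`, `F`, `soyga_table`) are mine rather than Jim Reeds’, and it assumes that a result of 0 modulo 23 is written as ‘z’, which is what the excerpt of the table appears to require.

```python
# 23-letter alphabet used by the tables: a = 1, b = 2, ... x = 21, y = 22, z = 23
ALPHABET = "abcdefghiklmnopqrstuxyz"

# f(x) values in alphabet order, copied from the f(x) table above
F_VALUES = [2, 2, 3, 5, 14, 2, 6, 5, 14, 15, 20, 22,
            14, 8, 13, 20, 11, 8, 8, 15, 15, 15, 2]
F = dict(zip(ALPHABET, F_VALUES))


def to_number(letter):
    """Letter -> 1..23."""
    return ALPHABET.index(letter) + 1


def to_letter(number):
    """Number -> letter, modulo 23 (assumption: 0 is written as 'z')."""
    return ALPHABET[(number - 1) % 23]


def soyga_table(codeword):
    """Build a 36x36 table (a list of 36 strings) from a six-letter codeword."""
    # Stage 1: codeword + its reverse, repeated three times, down the left edge
    left_column = (codeword + codeword[::-1]) * 3
    size = len(left_column)
    table = []
    for row_index, first_letter in enumerate(left_column):
        row = [first_letter]
        for col in range(1, size):
            west = row[col - 1]
            if row_index == 0:
                # Stage 2: the top row has nothing above it, so use W + f(W)
                base = to_number(west)
            else:
                # Stage 3: every other row uses N + f(W)
                base = to_number(table[row_index - 1][col])
            row.append(to_letter(base + F[west]))
        table.append("".join(row))
    return table


if __name__ == "__main__":
    for line in soyga_table("orrase")[:5]:
        print(line)
```

Run with the codeword ‘orrase’, the first five rows this produces agree with the first five rows of the excerpt above; I haven’t checked it against every row, and (as noted below) the manuscript tables themselves contain slips that a clean algorithm will not reproduce.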

And there (bar the inevitable miscalculations of something so darn fiddly, as well as all the inevitable scribal copying mistakes) you have it: the information in the Soyga tables is no more than the repeated left-hand column keyword, plus a rather wonky algorithm.

You can read Jim Reeds’ paper here: a full version (with diagrams) appeared in the pricey (but interesting) book John Dee: Interdisciplinary Studies in English Renaissance Thought (2006). The End.

Except… where exactly did that funny f(x) table come from? Was that just, errrm, magicked out of the air? Jim Reeds never comments, never remarks, never speculates: effectively, he just says ‘here it is, this is how it is’. But perhaps this f(x) sequence is in itself some kind of monoalphabetic or offsetting cipher to hide the originator’s name: Jim is bound to have thought of this, so let’s look at it ourselves:-

  • 1.2.3.4..5.6.7.8..9.10.11.12.13.14.15.16.17.18.19.20.21.22.23
    2.2.3.5.14.2.6.5.14.15.20.22.14..8.13.20.11..8..8.15.15.15..2

If we discount the doubled “2” at the start and the “8 15 15 15 2” run at the end as probable padding (keeping one “2” and one “8”), we can see that “14” appears three times, and “5 14” twice. Hmm: might “14” be a vowel?

  • 2 3 5 14 2 6 5 14 15 20 22 14 8 13 20 11 8
  • a b d n a e d n o t x n g m t k g
  • b c e o b f e o p u y o h n u l h
  • c d f p c g f p q x z p i o x m i
  • d e g q d h g q r y a q k p y n k
  • e f h r e i h r s z b r l q z o l
  • f g i s f k i s t a c s m r a p m
  • g h k t g l k t u b d t n s b q n
  • h i l u h m l u x c e u o t c r o
  • i k m x i n m x y d f x p u d s p
  • k l n y k o n y z e g y q x e t q
  • l m o z l p o z a f h z r y f u r
  • m n p a m q p a b g i a s z g x s
  • n o q b n r q b c h k b t a h y t
  • o p r c o s r c d i l c u b i z u
  • p q s d p t s d e k m d x c k a x
  • q r t e q u t e f l n e y d l b y
  • r s u f r x u f g m o f z e m c z
  • s t x g s y x g h n p g a f n d a
  • t u y h t z y h i o q h b g o e b
  • u x z i u a z i k p r i c h p f c
  • x y a k x b a k l q s k d i q g d
  • y z b l y c b l m r t l e k r h e
  • z a c m z d c m n s u m f l s i f

Nope, sorry: the only word-like entities here are “tondean”, “catsik”, and “zikprich”, none of which look particularly promising. This looks like a dead end… unless you happen to know better? 😉
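If you’d like to re-check the shift table (or hunt through it for words I missed), it can be regenerated with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the same 23-letter alphabet as the tables, with the first row starting one step below the plain a = 1 numbering (which is how the table above happens to be laid out).

```python
from collections import Counter

# Same 23-letter alphabet as the Soyga tables (no j, v or w)
ALPHABET = "abcdefghiklmnopqrstuxyz"

# The f(x) sequence with the suspected padding trimmed, as in the table above
F_TRIMMED = [2, 3, 5, 14, 2, 6, 5, 14, 15, 20, 22, 14, 8, 13, 20, 11, 8]


def shift_rows(numbers, alphabet=ALPHABET):
    """Return one string per possible shift, in the same order as the table."""
    size = len(alphabet)
    rows = []
    # shift = -1 gives the first row (2 -> a); shift = 0 is the plain numbering
    for shift in range(-1, size - 1):
        rows.append("".join(alphabet[(n - 1 + shift) % size] for n in numbers))
    return rows


def find_words(rows, words):
    """List (row index, word, direction) wherever a word occurs in a row."""
    hits = []
    for index, row in enumerate(rows):
        for word in words:
            if word in row:
                hits.append((index, word, "forwards"))
            if word in row[::-1]:
                hits.append((index, word, "backwards"))
    return hits


if __name__ == "__main__":
    rows = shift_rows(F_TRIMMED)
    print(Counter(F_TRIMMED).most_common(3))  # how often each number occurs
    for hit in find_words(rows, ["tondean", "catsik", "zikprich"]):
        print(hit)
```

Note that two of the three “word-like entities” only show up when a row is read backwards, which is why `find_words` checks both directions.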

A final note. Jim remarks that one of the manuscripts has apparently been proofread, with “f[letter]” marks (ie fa, fb, fc, etc); and surmises that the “f” stands for “falso” (meaning false), with the second letter the suggested correction. What is interesting (and may not have been noted before) is that in the Voynich Manuscript, there’s a piece of marginalia that follows this same pattern. On f2v, just above the second paragraph (which starts “kchor…”) there’s a “fa” note in a darker ink. Was this a proof-reading mark by the original author (it’s in a different ink, so this is perhaps unlikely): or possibly a comment by a later code-breaker that the word / paragraph somehow seems “falso” or inconsistent? “kchor” appears quite a few times (20 or so), so both attempted explanations seem a bit odd. Something to think about, anyway…

Another Voynich-inspired (I’m not yet sure whether or not “Voynich-themed” might be putting it a bit strongly) novel to add to the ever-fattening Big Fat List. Australian writer Matt Rubinstein’s novel was called “A Little Rain on Thursday” (the picture is from f75r) when it was published last June in Oz by Text Publishing: it appeared here last July (published by Quercus) under the title “Vellum”. Amazon Marketplace has copies for £1.98 + £2.75 UK p&p: I’ve ordered one & will post a review here ASAP. It doesn’t appear to have any evil Jesuit priests in it, which has to be A Very Good Thing Indeed.

What’s sort of appealing (well – to me, at least) is the way he casually slips the words “marginalia” and “forensic” into the cover blurb. However, this may well be a weakness, given that to keep him fed and watered in writerland, his book has to sell to a large number of non-Voynicheros, to whom such things are usually fairly alien (even if they do watch CSI).

Oh, and the stuff in the story about the manuscript decipherer being obsessive may also have alienated him from passing VMs-ologists. We’re not obsessive, I tell you: we count the number of stars on each section of each page for scientific reasons, damnit! Errrrrrrrrrm…

…maybe he’s got a point. Oh well… :-((((