As promised (though a little later than planned), here’s the transcript of the second IM session I ran at the 2009 Voynich Summer Camp in Budapest. Not quite as meaty as the first IM session, but some OK stuff in there all the same. Enjoy!

[11:56:09] NP: Okeydokey, ready when you are
[11:56:18] vc: Okedykokedy
[11:56:27] NP: 🙂
[11:56:35] vc: We are.
[11:56:35] NP: I think that’s on f113r
[11:56:40] vc: …
[11:56:45] NP: 🙂
[11:56:55] NP: So… how has it all gone?
[11:57:12] NP: Tell me what you now think about the VMs that you didn’t before?
[11:57:27] vc: It should be simple.
[11:57:36] vc: The solution should be simple.
[11:57:41] NP: but…
[11:58:07] vc: But …
[11:58:33] vc: The verbose cipher still permits us a lot of possibilities.
[11:58:52] NP: Verbose cipher only gets you halfway there
[11:59:03] NP: But that’s still halfway more than anything else
[11:59:28] vc: We could synthesize a coding which is capable to produce the same statistical properties as the MS
[11:59:48] NP: Yup, that was (basically) Gordon Rugg’s 2004 paper
[11:59:58] vc: simple enough to do manually of course
[12:00:31] NP: The problem is one of duplicating all the local structural rules
[12:00:40] vc: Gordon’s generating gibberish by encoding gibberish
[12:01:06] NP: Basically
[12:01:25] vc: Yes, we suspect that the text contains real information in a natural language.
[12:01:30] vc: We tried this.
[12:02:06] NP: Rugg’s work requires a clever (pseudo-random) daemon to drive his grille thing… but he never specified how someone 500 years ago could generate random numbers (or even conceive of them)
[12:02:07] vc: We tried to encode for example the vulgata with our method
[12:02:10] NP: ok
[12:02:23] NP: into A or B?
[12:02:24] vc: throw dices I guess?
[12:02:26] vc: lol
[12:02:37] NP: only gives you 1-6 random
[12:02:48] vc: 3 dices
[12:02:52] vc: ect
[12:02:52] NP: two dice give you a probability curve
[12:02:56] NP: not flat
[12:03:02] vc: hmm
[12:03:06] vc: roulette wheel
[12:03:11] NP: Anachronistic
[12:03:19] vc: Ok. We use no random.
[12:03:23] NP: 🙂
[12:03:25] vc: our encoder is deterministic
[12:03:33] NP: Good!
[12:03:35] vc: that’s the point
[12:04:28] vc: We suspect that the “user” added some randomness in some of the aspects of the encoding, but this is not overwhelming
[12:04:49] NP: That’s right
[12:05:21] vc: We also picked out the A and B languages
[12:05:23] NP: Though some aspects (like space insertion into ororor-type strings) were more tactical and visual than random
[12:05:27] NP: Good!
[12:05:33] vc: with different methods
[12:05:52] vc: so we basically verified a lot of past results
[12:06:17] NP: Do you have a synthetic A paragraph you can cut and paste here?
[12:06:17] vc: After that, we decided to concentrate on the first 20 pages
[12:06:22] NP: Good!
[12:07:17] vc: for example, A languages uses ey or y at the end of the words, while B language uses edy instead
[12:07:51] vc: Synthetic sample… ok, just a minute
[12:08:29] NP: ey/y vs edy – Mark Perakh pointed this out too, and suggested that it meant B was less contracted than A. It also forms the core of Elias Schwerdtfeger’s “Biological Paradox”
[12:09:25] vc: Our results are largely independent – the guys didn’t know the past results
[12:09:54] NP: That’s ok. 🙂
[12:10:41] vc: nu stom huhoicpeey strifihuicom ristngngpeet pept suhors periet pescet sticpescom ichoey pt om icpeript
[12:11:17] NP: I hope that’s not EVA
[12:11:41] vc: Y, of course not
[12:12:08] vc: not close, but the whole thing started here when some of us tried out a method which produced some non-trivial statistics very similar to VMS
[12:12:43] NP: I’m certainly getting a partially-verbose vibe off this
[12:12:52] vc: the original:
[12:13:17] vc: haec sunt verba que locutus est
[12:13:18] vc: Moses
[12:13:40] NP: Ummm… that’s pretty verbose, then. 🙂
[12:14:04] vc: Again, a deterministic, static automaton.
[12:14:15] NP: Fair enough!
[12:15:09] NP: Sorry for asking a lecturer-style question, 🙂 but how has doing that affected how you look at Voynichese?
[12:16:03] vc: Sec
[12:16:49] vc: discussing 🙂
[12:17:38] vc: it’s a coded natural language text. We suspect that the language is Italian – from measured results.
[12:18:00] vc: That’s why we are very curious about your news!
[12:18:21] NP: Let’s finish your news first!
[12:18:38] vc: ok. Was that an answer for your question?
[12:19:02] NP: Pretty much – would you like to write it up informally to publish on the blog?
[12:19:55] NP: 1000 words should cover it 🙂
[12:21:18] NP: (you don’t need to write it now!)
[12:21:25] vc: We admit that we would like to work on our theory and method a bit before publishing it, because one of the important statistical feature doesn’t match
[12:21:31] vc: yet
[12:21:35] NP: 🙂
[12:21:52] NP: ok
[12:22:06] NP: that’s good
[12:22:23] NP: what else have you been thinking about and discussing during the week?
[12:22:35] NP: VMs-wise, that is 🙂
[12:22:42] vc: 🙂
[12:22:54] vc: haha, you got the point…
[12:23:02] NP: 🙂
[12:23:56] vc: We toyed with the idea that the astrological diagrams are so poorly rendered that they aren’t astrological diagrams. They are coder tools.
[12:24:10] NP: cipher wheels?
[12:24:22] vc: Kind of. Yes.
[12:24:35] NP: (that’s been suggested many times, though never with any rigour)
[12:24:36] vc: we also tried to identify some of the star names.
[12:24:47] NP: No chance at all
[12:25:01] NP: That is a cliff with a huge pile of broken ships beneath it
[12:25:21] NP: sadly
[12:25:27] vc: been there, done that, yes
[12:25:30] NP: 🙂
[12:26:22] vc: We also observed that the takeshi transcription becomes less reliable when the text is rotated or tilted.
[12:26:36] vc: The other places – it is quite good.
[12:26:45] NP: Yes, that’s a fair enough comment
[12:27:08] NP: A complete transcription has been done, but it hasn’t been released – very frustrating
[12:27:25] NP: (by the EVMT people, Gabriel Landini mainly)
[12:27:17] vc: Also we are not contented with some of the EVA transcription’s choices of the alphabet
[12:27:34] NP: the “sh” really sucks
[12:27:39] vc: YES
[12:27:45] NP: 🙁
[12:28:53] NP: Glen Claston’s transcription added stuff in, many people use that instead purely for its better “sh” handling
[12:29:26] vc: hmm, ok
[12:29:53] NP: In a lot of ways, though, who’s to say? A single ambiguous letter shouldn’t really be enough to destroy an entire decipherment attack
[12:30:04] NP: given that it’s not a pure polyalpha
[12:30:37] vc: of course
[12:30:54] NP: But analyses still don’t seem to get particularly close
[12:31:03] NP: Oh well
[12:31:23] vc: Analyses of whom
[12:31:24] vc: 🙂
[12:31:25] vc: ?
[12:31:29] vc: 😉
[12:31:35] NP: not yours, of course 😉
[12:32:32] NP: is that your week summarized, then?
[12:32:53] vc: Yes.
[12:33:16] NP: has it been fun? worthwhile? frustrating? dull?
[12:33:32] vc: All of them.
[12:33:34] NP: and would you do another next summer?
[12:33:57] vc: No need of it. Maybe with the rohonc codex
[12:34:00] vc: lol, of course
[12:34:13] NP: 🙂
[12:35:06] NP: I’m really pleased for you all – it sounds like you have managed to get a fairly clearheaded view of the VMs out of the whole process, and have had a bit of fun as well
[12:35:51] NP: Most VMs researchers get very tied up to a particular theory or evidence or way of looking at it – you have to keep a broader perspective to make progress here
[12:35:53] vc: let’s say two bits
[12:36:14] NP: “two bits of fun” 🙂
[12:36:21] NP: good
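[A quick aside on the dice exchange above: the sum of several dice genuinely isn’t flat, which is why dice make a poor source of the uniform random numbers a Rugg-style grille daemon would need. Here’s a tiny Python sketch (purely illustrative, not anything the campers used) that tabulates the distribution for three dice:]

```python
from itertools import product
from collections import Counter

# Tally every possible outcome of rolling three six-sided dice.
counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))
total = 6 ** 3  # 216 equally likely rolls

for value in sorted(counts):
    print(f"sum={value:2d}  p={counts[value] / total:.3f}  {'#' * counts[value]}")

# The distribution peaks around 10 and 11 and tails away towards 3 and 18:
# a bell-ish curve, not the flat spread a cipher clerk would want.
```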

[I then went into a long digression about the “Antonio of Florence”, about which I’ve already posted far too much to the blog… so –SNIP–]

[12:51:50] vc: ooo wait a sec…
[12:52:16] vc: Can we ask Philip Neal to post some some pages of a reference book he uses?
[12:52:42] vc: sorry about the redundancy
[12:53:02] NP: He’s a medieval Latin scholar by training, what kind of thing would you want?
[12:53:39] vc: about the alchemical herbals. Can we manage it later?
[12:53:45] vc: Please go on
[12:53:51] NP: Well.. that’s about it
[12:54:10] NP: Obviously I typed faster than I thought 🙂

[13:00:11] vc: What do you know? How much people is working on a voynich-deciphering automaton based on markov thingies and such?
[13:00:37] vc: So basically with the same hypotheses like ours?
[13:00:57] NP: The problem with markov models is that they will choke on verbose ciphers, where letters are polyvalent
[13:01:08] NP: Nobody in the literature seems to have picked this up
[13:01:24] vc: bad for them
[13:01:50] NP: Unless you pre-tokenize the stream, Markov model finders will just get very confused
[13:02:03] NP: and give you a linguist-friendly CVCV-style model
[13:02:11] NP: that is cryptographically wrong
[13:03:04] NP: perhaps “multi-functional” rather than “polyvalent”, I’m not sure :O
[13:04:23] NP: So, I’m not convinced that anyone who has applied Markov model-style analysis to the VMs has yet got anywhere
[13:04:29] NP: Which is a shame
[13:05:04] NP: But there you go
[13:05:25] vc: We hope.
[13:05:47] NP: 🙂

[13:06:24] NP: Right – I’ve got to go now (sadly)
[13:06:48] NP: I hope I’ve been a positive influence on your week and not too dogmatic
[13:07:09] vc: Why, of course
[13:07:16] NP: And that I’ve helped steer you in generally positive / constructive directions
[13:07:30] vc: Yes, indeed.
[13:07:35] NP: (Because there are plenty of blind alleys to explore)
[13:07:41] NP: (and to avoid)
[13:07:52] vc: VBI…
[13:07:52] vc: 🙂
[13:08:07] NP: Plenty of that to step in, yes
[13:08:14] NP: 🙂
[13:08:24] NP: And I don’t mean puddles
[13:09:42] vc: Well, thank you again for the ideas and the lots of information 🙂
[13:11:18] vc: Unfortunately semester starts in weeks, so we can’t keep working on this project
[13:12:04] vc: but as soon as we earn some results, we will definitely contact you
[13:12:15] NP: Excellent, looking forward to that
[13:12:54] NP: Well, it was very nice to meet you all – please feel free to subscribe to Cipher Mysteries by email or RSS (it’s free) so you can keep up with all the latest happenings.
[13:13:23] vc: ok 🙂
[13:13:57] NP: Best wishes, and see you all for the Rohonc week next summer 🙂
[13:14:04] NP: !!!!!
[13:14:11] vc: lol 🙂
[13:14:21] vc: that’s right! 😉
[13:15:16] NP: Excellent – gotta fly, ciao!
[13:15:36] vc: Best!
[13:15:37] vc: bye
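One afterword on the verbose-cipher point that came up twice in the chat: if each plaintext letter is enciphered as a small group of glyphs, then any analysis that treats single glyphs as the basic unit (letter-level Markov models included) ends up modelling halves of symbols rather than symbols. The toy Python sketch below (with an invented cipher table, emphatically not the campers’ method or a real Voynichese mapping) shows how different the statistics look once you pre-tokenize the stream into its verbose groups:

```python
import re
from collections import Counter

# A toy verbose cipher: each plaintext letter becomes a multi-glyph group.
# (Invented mapping, purely for illustration.)
TABLE = {"a": "ok", "e": "or", "m": "che", "n": "dy", "s": "ol", "t": "ar"}

def encipher(plaintext: str) -> str:
    return "".join(TABLE[c] for c in plaintext if c in TABLE)

ciphertext = encipher("tantamseamanseaten" * 10)

# Glyph-level bigram counts: this is what a naive Markov model "sees".
glyph_bigrams = Counter(ciphertext[i:i + 2] for i in range(len(ciphertext) - 1))

# Token-level counts, after splitting the stream back into verbose groups.
group_pattern = "|".join(sorted(TABLE.values(), key=len, reverse=True))
tokens = re.findall(group_pattern, ciphertext)
token_bigrams = Counter(zip(tokens, tokens[1:]))

print("glyph-level bigrams:", glyph_bigrams.most_common(5))
print("token-level bigrams:", token_bigrams.most_common(5))

# The glyph-level model mixes within-group pairs with pairs that straddle
# group boundaries, so it reflects the cipher's structure at least as much
# as the underlying language's; tokenizing first removes that confusion.
```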

According to this recent Wired article, Rajesh Rao, a computer scientist from the University of Washington, has run a Markov chain finder on the 1500-odd fragments of (the as-yet-undeciphered) Indus script – and has ‘discovered’ that it is “moderately ordered, just like spoken languages”.

Well, ain’t that something.

In a depressingly familiar echo of the ‘hoax’ debate over the Voynich Manuscript, the most important result is that it argues against Steve Farmer’s (2004) case that the Indus fragments were merely “political and religious symbols, i.e. not a language at all, but just odd visual propaganda of some sort”.

Language is a tricky, evolving, misunderstood, dynamic artefact that typically only has meaning within a very specific local context. The failure of linguists to “crack” the Indus fragments (all of which are very short) is no failure at all – we are massively disadvantaged by the passing millennia, and cannot easily trace the structure within the flow of ideas (the intellectual historian’s perennial hammer).

Having said that, what I read as Farmer’s basic idea – that researchers have for too long looked for a definitive script grammar as an indicator of advanced literacy – is an excellent point. And so the notion that Indus script analysts should perhaps instead be looking for some kind of arbitrary / non-formalized explanation (a confused model, rather than a complex one) is sensible. My opinion is that Farmer is overplaying his skeptical hand, and that the script is very probably communication (as opposed to mere decoration) – but is it written in something we would recognize as a language? Apparently not, I would say.

Incidentally, Indus script uses roughly 300-400 symbols (depending on how you count them), with the most frequent four symbols making up about 21% of the texts: inscriptions (many on potsherds, also known as ostraca) are all short, with an average length of only 4.6 symbols. All of which makes the script completely unlike known languages – but all the same, what is it?
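If you want to poke at figures like these yourself, they are all one-liners once you have the inscriptions as lists of symbol IDs. A minimal Python sketch (using a tiny made-up toy corpus, not the real Indus data):

```python
from collections import Counter

# Toy corpus: each inscription is a list of symbol IDs.
# (Invented numbers, purely for illustration.)
inscriptions = [
    [12, 7, 342, 7],
    [7, 99, 12],
    [342, 12, 7, 55, 7],
    [99, 7],
]

symbol_counts = Counter(s for line in inscriptions for s in line)
total_symbols = sum(symbol_counts.values())

n_distinct = len(symbol_counts)                  # "roughly 300-400" in the real corpus
top4_share = sum(c for _, c in symbol_counts.most_common(4)) / total_symbols
mean_length = total_symbols / len(inscriptions)  # about 4.6 for the real inscriptions

print(f"distinct symbols: {n_distinct}")
print(f"top-4 symbol share: {top4_share:.1%}")
print(f"mean inscription length: {mean_length:.1f}")
```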

Perhaps Rajesh Rao’s Markov models will reveal some kind of pointers towards its hidden structure, towards the truth – but as to Rao’s suggestion that they may well yield a “grammar”… I suspect not.

PS: Farmer cites Gabriel Landini & Rene Zandbergen’s paper (funny, that), though points out that Zipf’s Law is an ineffective tool for differentiating language-based texts from non-language-based texts. Just so you know…
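That last point is worth illustrating: even “monkeys at typewriters” text (random letters with a random chance of a space) produces a rank-frequency curve that decays in a broadly Zipf-like way, which is precisely why a decent Zipf fit on its own cannot tell language from non-language. A quick sketch:

```python
import random
from collections import Counter

random.seed(1)

# "Monkey text": random letters, with a roughly 1-in-9 chance of a word break.
alphabet = "abcdefgh"
text = "".join(random.choice(alphabet + " ") for _ in range(200_000))
ranks = Counter(text.split()).most_common()

# Zipf's law says frequency should fall off roughly as 1/rank.
for rank in (1, 2, 5, 10, 50, 100, 500):
    if rank <= len(ranks):
        word, freq = ranks[rank - 1]
        print(f"rank {rank:4d}: '{word}'  freq={freq}")

# Even this meaningless stream shows a broadly Zipf-like decay, so a good
# Zipf fit by itself says nothing about whether a text encodes language.
```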

While snooping around the (mostly empty) user subsites on Glen Claston’s Voynich Central, I came across a page by someone called Robin devoted solely to the Scorpio “Scorpion” page in the VMs. This has an unusual drawing of a scorpion (or salamander) at the centre, which I agree demands closer attention…

Voynich Manuscript f73r, detail of scorpion/salamander at centre of Scorpio zodiac circle

My first observation is that, while the paint in the 8-pointed star is very probably original, the green paint on the animal below is very likely an example of what is known as a “heavy painter” layer, probably added later. But what lies beneath that?

Luckily, there exists a tool for (at least partially) removing colour from pictures, based on a “colour deconvolution” algorithm originally devised (I believe) by Voynich researcher Gabriel Landini, and implemented as a Photoshop plugin by Voynich researcher Jon Grove. And so the first thing I wanted to do was to run Jon’s plugin, which should be simple enough (you’d have thought, anyway).
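As I understand it, the core trick (and this is only a rough numpy sketch of the general idea, not Jon Grove’s plugin or Gabriel Landini’s actual code) is to convert the image into optical density, unmix each pixel against a small set of reference ink/paint colours, and then rebuild the image with the unwanted colour’s contribution zeroed out:

```python
import numpy as np

def remove_colour(rgb_image, stain_colours, drop_index):
    """Crude colour-deconvolution sketch (illustrative only).

    rgb_image:     (H, W, 3) float array with values in [0, 1]
    stain_colours: three RGB triples describing the inks/paints expected
                   (e.g. brown ink, green paint, vellum background)
    drop_index:    which of the three reference colours to suppress
    """
    # Work in optical density, where ink contributions combine roughly linearly.
    od = -np.log10(np.clip(rgb_image, 1e-3, 1.0))

    # Stain matrix: one (normalised) row per reference colour, in OD space.
    stains = -np.log10(np.clip(np.asarray(stain_colours, dtype=float), 1e-3, 1.0))
    stains /= np.linalg.norm(stains, axis=1, keepdims=True)

    # Unmix: estimate per-pixel "concentrations" of each stain, zero the unwanted one.
    concentrations = od.reshape(-1, 3) @ np.linalg.pinv(stains)
    concentrations[:, drop_index] = 0.0

    # Rebuild the image from the remaining stains and convert back from OD.
    od_clean = concentrations @ stains
    return (10.0 ** -od_clean).reshape(rgb_image.shape)
```

In practice the three reference colours would be sampled from the scan itself, say a patch of the green paint, a patch of plain ink, and a patch of bare vellum.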

However… I’d bought a new PC earlier in the year and lost my (admittedly ancient) Adobe Photoshop installation CD, so Photoshop wasn’t an easy option. I also hadn’t yet re-installed Debabelizer Pro, another workhorse batch image processing programme from the beginning of time that I used to thrash to death when writing computer games. If not them, then what?

Well, like many people, I had the Gimp already installed, and so went looking for a <Photoshop .8bf plugin>-loading plugin for that: I found pspi and gimpuserfilter. However, the latter is only for Linux, while the former only handles a subset of .8bf files… apparently not including Jon Grove’s .8bf (I think he used the excellent FilterMeister to write it), because this didn’t work when I tried it.

For a pleasant change, Wikipedia now galloped to the rescue: its .8bf page suggested that Helicon Filter – a relatively little-known non-layered graphics app from the Ukraine – happily runs Photoshop plugins. I downloaded the free version, copied Jon Grove’s filter into the Plug-ins subdirectory, and it worked first time. Neat! Well… having said that, Helicon Filter is quite (read: “very”) idiosyncratic, and does take a bit of getting used to: but once you get the gist, it does do the job well, and is pleasantly swift.

And so (finally!) back to that VMs scorpion. What does lie beneath?

Voynich manuscript f73r detail, but with the green paint removed

And no, I wasn’t particularly expecting to find a bright blue line and a row of six or seven dots along its body either. Let’s use Jon’s plugin to try to remove the blue as well (and why not?):-

Voynich Manuscript f73r central detail, green and blue removed

Well, although this is admittedly not a hugely exact process, it looks to me as though the row of dots was in the original drawing. Several of the other zodiac pictures (Gemini, Leo, Virgo, Libra, Sagittarius) have what appears to be rather ‘raggedy’ blue paint, so it would be consistent if Scorpio had originally had a little bit of blue paint too, later overpainted by the heavy green paint.

And so my best guess is that the original picture was (like the others I listed above) fairly plain with just a light bit of raggedy blue paint added, and with a row of six or seven dots along its body. But what do the dots mean?

I strongly suspect that these dots represent a line of stars in the constellation of Scorpio. Pulling a handy copy of Peter Whitfield’s (1995) “The Mapping of the Heavens” down from my bookshelf, I found a couple of quick parallels. Firstly, in the image of Scorpio in Gallucci’s Theatrum Mundi (1588) on p.74 of Whitfield, there’s a nice clear row of six or seven stars. Also, p.44 has a picture of Bede’s “widely-used” De Signis Coeli (MS Laud 644, f.8v), in which Scorpio’s scorpion has 4 stars running in a line down its back: while p.45 has an image from a late Latin version of Ptolemy’s Almagest (BL Arundel MS 66, circa 1490, f.41) which also has a line of stars running down the scorpion’s back. A Scorpio scorpion copied from a 14th century manuscript by astrologer Andalo di Negro (BL MS Add. 23770, circa 1500, f.17v) similarly has a line of stars running down its spine.

In short, in all the years that we’ve been looking at the iconographic matches for the drawings at the centre of these zodiac diagrams, should we have instead been looking for steganographic matches for constellations of dots hidden in them?

Incidentally, another interesting thing about the Scorpio/Sagittarius folio is that the scribe changed his/her quill halfway through, which lets us reconstruct the order in which the text in those two pages was written.

Firstly, the circular rings of text and the nymphs were drawn for both the Scorpio and Sagittarius pages. The scribe then returned to the Scorpio page, and started adding the nymph labels for the two inner rings, (probably) going clockwise around from the 12 o’clock mark, filling in the labels for both circular rows of nymphs as he/she went. (Mysteriously, the scribe also added breasts to the nymphs during this second run). Then, when the quill was changed at around the 3 o’clock mark, the scribe carried on going, as you can see from the following image:-

Voynich manuscript f73r, label details (just to the right of centre)

What does all this mean? I don’t know for sure: but it’s nice to have even a moderate idea of how these pages were actually constructed, right? For what it’s worth, my guess is that these pages had a scribe #1 writing down the rings and the circular text first, before handing over to a scribe #2 to add the nymphs and stars: then, once those were drawn in, the pages were handed back to scribe #1 to add the labels (and, bizarrely, the breasts and probably some of the hair-styles too).

It’s a bit hard to explain why the author (who I suspect was also scribe #1) should have chosen this arrangement: the only sensible explanation I can think of is that perhaps there was a change in plan once scribe #1 saw the nymphs that had been drawn by scribe #2, and so decided to make them a little more elaborate. You have a better theory about this? Please feel free to tell us all! 🙂