A little bird (hi, Terri) told me about a flurry of activity on the Voynich mailing list prompted by some postings by Sean Palmer, who Cipher Mysteries readers may remember from his pages on Michitonese and the month names. Well, this time round he’s gone after a rather more ambitious target – the internal word structure of Voynichese.
Loosely building on Jorge Stolfi’s work on Voynichese word paradigms, Sean proposes a broadly inclusive Voynichese word generator:-
^ <---- i.e. start of a word
(q | y | [ktfp])* <---- i.e. zero or more instances of this group
(C | T | D | A | O)* <---- i.e. zero or more instances of this group
(y | m | g)? <---- i.e. 0 or 1 instances of this group
$ <---- i.e. end of a word
...where...
C = [cs][ktfp]*h*e* <---- i.e. basically (ch | sh | c-gallows-h) followed by 0 or more e's
T = [ktfp]+e* <---- i.e. gallows character followed by 0 or more e's
D = [dslr] <---- i.e. (d | s | l | r)
A = ai*n* <---- i.e. basically (a | an | ain | aiin | aiiin)
O = o
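To get a feel for how permissive this paradigm is, here is a minimal sketch that transliterates it directly into a Python regular expression. The sub-patterns are copied from the listing above; the names `SEAN` and `sean_accepts`, and the sample strings (a few EVA-style words plus some deliberately silly ones), are assumptions for illustration rather than anything from Sean's own postings.

```python
import re

# Sub-patterns copied from the definitions above.
C = r"[cs][ktfp]*h*e*"   # c or s, optional gallows, optional h's, optional e's
T = r"[ktfp]+e*"         # one or more gallows, then optional e's
D = r"[dslr]"
A = r"ai*n*"
O = r"o"

# prefix group* + core group* + optional final letter
SEAN = re.compile(rf"(?:q|y|[ktfp])*(?:{C}|{T}|{D}|{A}|{O})*(?:y|m|g)?")

def sean_accepts(word: str) -> bool:
    """True if the whole word fits the paradigm (hypothetical helper)."""
    return SEAN.fullmatch(word) is not None

# Illustrative inputs only: the last three are not claimed to be real words.
for w in ["daiin", "qokeedy", "chedy", "shol", "oooooo", "", "xyz"]:
    print(repr(w), sean_accepts(w))
```

Because every group is starred or optional, the empty string and arbitrary runs such as `oooooo` are accepted alongside the plausible-looking words. Only the string containing letters outside the alphabet is rejected.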
Sean says that his word paradigm accounts for 95% (later 97%) of Voynichese words, but I’d say that (just as Philip Neal points out in his reply) this is because it generates way too many words: what it gains in coverage, it loses in tightness (and more on this below).
Philip Neal’s own Voynichese word generator looks something like this:-
^
(d | k | l | p | r | s | t)?
(o | a)?
(l | r)?
(f | k | p | t)?
(sh | ch )?
(e | ee | eee | eeee)?
(d | cfh | ckh | cph | cth)?
(a | o) ?
(m | n | l | in | iin | iiin)?
(y)?
$
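The same kind of check can be sketched for Philip's paradigm, treating it as ten optional slots in a fixed order. The slot contents below are taken from the listing above (with the stray spaces dropped); the names `NEAL_SLOTS` and `neal_accepts` and the sample strings are assumptions for illustration.

```python
import re

# Ten optional slots, in order, as transcribed in the listing above.
NEAL_SLOTS = [
    ["d", "k", "l", "p", "r", "s", "t"],
    ["o", "a"],
    ["l", "r"],
    ["f", "k", "p", "t"],
    ["sh", "ch"],
    ["e", "ee", "eee", "eeee"],
    ["d", "cfh", "ckh", "cph", "cth"],
    ["a", "o"],
    ["m", "n", "l", "in", "iin", "iiin"],
    ["y"],
]

# Each slot becomes a non-capturing optional alternation: (?:a|b|c)?
NEAL = re.compile("".join("(?:" + "|".join(slot) + ")?" for slot in NEAL_SLOTS))

def neal_accepts(word: str) -> bool:
    """True if the whole word fits the slot sequence (hypothetical helper)."""
    return NEAL.fullmatch(word) is not None

# Illustrative inputs only.
for w in ["daiin", "chedy", "shol", "okeedy", "qokeedy", "oooooo"]:
    print(repr(w), neal_accepts(w))
```

As transcribed here, none of the slots contains a q, so `qokeedy` is rejected while `okeedy` passes, and a run like `oooooo` fails because only two slots can supply an o.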
Though this is *much* tighter than Sean’s, it still fails to nail the tail to the sail (I just made that up). By 2003, I’d convinced myself that the flavour of Voynichese wasn’t ever going to be satisfactorily captured by any sequential generator, so I tried defining an experimental Markov state-machine to give an ultra-tight word generator:-
It wasn’t by any means perfect (there are no p or f characters, for a start), but it was the kind of thing I’d expect a “properly tight” word paradigm to look like. But even this proved unsatisfactory, because that was about the time when I started seeing o / a / y as multivalent, by which I mean “performing different roles in different contexts”. Specifically:-
- Is the ‘o’ in ‘qo’ the same as the ‘o’ in ‘ol’ or ‘or’?
- Is the ‘a’ in ‘aiin’ the same as the ‘a’ in ‘al’ or the ‘a’ in ‘ar’?
- Is word-initial ‘y’ the same as word-terminal ‘y’?
Personally, I think the answer to all three of these questions is an emphatic ‘no’: and so for me it was the shortest of conceptual hops from there to seeing these as elements of a verbose cipher. Even if you disagree with me about the presence of verbose cipher in the system, I think satisfactorily accounting for o / a / y remains a problem for all proposed cipher systems, as these appear to be knitted-in to the overwhelming majority of glyph-level adjacency rules / structures.
Really, the test of a good word generator is not raw dictionary coverage but instance coverage (“tightness”), by which I mean “what percentage of a given paradigm’s generated words do the instances-as-observed make up”.
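The difference between the two measures is easy to see on a toy example. The sketch below assumes a paradigm small enough to enumerate as a set of strings; the function names, the two-slot toy paradigm and the five "observed" words are all made up for illustration and are not corpus statistics.

```python
from itertools import product

def coverage(generated: set, observed: set) -> float:
    """Share of observed words that the paradigm can generate."""
    return len(observed & generated) / len(observed)

def tightness(generated: set, observed: set) -> float:
    """Share of generated words that are actually observed."""
    return len(observed & generated) / len(generated)

# Toy paradigm (an assumption): (o | a)? followed by (l | r)?
toy_slots = [["", "o", "a"], ["", "l", "r"]]
generated = {"".join(parts) for parts in product(*toy_slots)}  # 9 strings, incl. ""

# Toy "observed" vocabulary (an assumption, not real counts).
observed = {"ol", "or", "al", "ar", "daiin"}

print(coverage(generated, observed))   # 0.8  (4 of 5 observed words generated)
print(tightness(generated, observed))  # 4 of 9 generated words are observed
```

Loosening the toy paradigm until it also produced `daiin` would push coverage to 100%, but every extra unobserved string it generated along the way would drag tightness down.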
Philip’s paradigm generates (8 x 3 x 3 x 5 x 3 x 5 x 6 x 3 x 7 x 2) = 1,360,800 possible words, while my four-column generator produces – errrrm – no more than 1192 (I think, please correct me if I’m wrong): by contrast, Sean’s generator is essentially infinite. OK, it’s true that each of the three is optimized around different ideas, so it’s probably not entirely fair to compare them like this. All the same (and particularly when you look at Currier A / B sections, labels, etc), I think that tightness will always be more revealing than coverage. And you can quote me on that! 😉
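For anyone who wants to check the arithmetic for Philip's paradigm, here is a minimal sketch that multiplies out the slot sizes (each slot's alternatives plus the "absent" option) and then enumerates the strings. The slot lists are copied from his listing above; the variable names are assumptions. Note that the product counts ways of filling the slots, so it is an upper bound on distinct words: a lone `d`, for instance, can come from either the first or the seventh slot, and the empty word is included.

```python
from itertools import product
from math import prod

# Philip Neal's ten optional slots, as transcribed earlier in the post.
slots = [
    ["d", "k", "l", "p", "r", "s", "t"],
    ["o", "a"],
    ["l", "r"],
    ["f", "k", "p", "t"],
    ["sh", "ch"],
    ["e", "ee", "eee", "eeee"],
    ["d", "cfh", "ckh", "cph", "cth"],
    ["a", "o"],
    ["m", "n", "l", "in", "iin", "iiin"],
    ["y"],
]

# Add the empty string to each slot to model the trailing "?".
options = [[""] + slot for slot in slots]

# 8 x 3 x 3 x 5 x 3 x 5 x 6 x 3 x 7 x 2
derivations = prod(len(opts) for opts in options)
print(derivations)  # 1360800

# Distinct strings: different slot fillings can spell the same word.
distinct = {"".join(parts) for parts in product(*options)}
print(len(distinct))
```

The enumeration builds roughly a million short strings in memory, which is fine for a one-off check.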