I posted here a few weeks ago about whether the Cisiojanus mnemonic might be in the Voynich zodiac labels, and also about a possible July Cisiojanus crib to look for. Since then I’ve been thinking quite a lot further about this whole topic, and so I thought it was time to post a summary of Voynich labelese, a topic that hasn’t (to my knowledge) yet been covered satisfactorily on the web or in print.
Voynich labelese
Voynich researchers often talk quite loosely about “labelese”, by which they normally mean the variant of the Voynichese ‘language’ that appears in labels, particularly the labels written beside the nymphs in the zodiac section. These seem to operate according to different rules from the rest of the Voynichese text: which is one of the reasons I tell people running tests on Voynichese that they should run them on one section of text at a time (say, Q20, Q13, or the Herbal A pages).
The Voynichese zodiac labels have numerous features that are extremely awkward to account for:
* a disproportionately large number of zodiac labels start with EVA ‘ot’ or ‘ok’. [One recurring suggestion here is that if these represent stars, then one or both of these EVA letter pairs might encipher “Al”, a common star-name prefix which basically means “the” in Arabic.]
* words starting EVA ‘yk-‘ are also more common in zodiac labels than elsewhere
* most (but not all) zodiac labels are surprisingly short.
* many – despite their short length – terminate with EVA ‘-y’.
* a good number of zodiac labels occur multiple times. [This perhaps argues against their obviously being unique names.]
* almost no zodiac labels start with EVA ‘qo-‘
* in many places, the zodiac labels exhibit a particularly strong ‘paired’ structure (e.g. on the Pisces f70v2 page, otolal = ot-ol-al, otaral, otalar, otalam, dolaram, okaram, etc), far more strongly than elsewhere
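The tallies behind these claims are easy to reproduce. As a minimal sketch, here is how one might count prefix and suffix frequencies in Python; the label list below is just a tiny sample of EVA strings quoted in this post, standing in for a proper transcription of the whole zodiac section.

```python
from collections import Counter

# Tiny sample of EVA zodiac labels quoted in this post; a real test
# would run over a full transcription of the zodiac section.
labels = ["otolal", "otaral", "otalar", "otalam", "dolaram",
          "okaram", "otaly", "oteeol", "otolchd", "cheary"]

prefixes = Counter(label[:2] for label in labels)
suffixes = Counter(label[-1] for label in labels)

print("start ot-:", prefixes["ot"])   # disproportionately common
print("start ok-:", prefixes["ok"])
print("start qo-:", prefixes["qo"])   # almost absent in labels
print("end   -y :", suffixes["y"])
```

Run over a full transcription, the same four counters would make the bullet points above directly checkable numbers rather than impressions.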
That is, even though the basic ‘writing system’ seems to be the same in the zodiac labels as elsewhere, there are a number of very good reasons to suspect that something quite different is going on here – though whether that is a different Voynich ‘language’ or a type of content that is radically different from everything else is hard to tell.
Either way, the point remains that we should treat understanding the zodiac labels as a separate challenge from understanding other parts of the Voynich manuscript: regardless of whether the differences are semantic, syntactic, or cryptographic, different rules seem to apply here.
Voynich zodiac month names
If you look at 15th century German Volkskalender manuscripts, you’ll notice that their calendars (listing local feasts and saints’ days) typically start on January 1st: and that in those calendars with a zodiac roundel, January is always associated with an Aquarius roundel. Modern astrologically / calendrically astute readers might well wonder why this would be so, because the Sun enters the first degree of Aquarius around 21st January each year: so in fact the Sun is instead travelling through Capricorn for most of January.
However, if you rewind your clock back to the fifteenth century, you would be using the Julian calendar, where the difference between the real length of the year and the calendrical length of the year had for centuries been causing the dates of the two systems to diverge. And so if we look at this image of the March calendar page from Österreichische Nationalbibliothek Cod. 3085 Han. (a Volkskalender B manuscript from 1475 that I was looking at yesterday), we can see the Sun entering Aries on 11th March (rightmost column):
Note also that some Volkskalender authors seem to have got this detail wrong. 🙁
All of which is interesting for the Voynich Manuscript, because the Voynich zodiac month names associate the following month with the zodiac sign, e.g. Pisces is associated with March, not February (as per the Volkskalender), etc. This suggests to me (though doubtless this has been pointed out before, as with everything to do with the Voynich) that the Voynich zodiac month name annotations may well have been added after 1582, when the Gregorian calendar reforms took place.
Voynich labelese revisited
There’s a further point about Voynich labelese which gets mentioned rarely (if at all): in the two places where the 30-element roundels are split into two 15-element halves (dark Aries and light Aries, and light Taurus and dark Taurus), the labels get longer.
This would seem to support the long-proposed observation that Voynich text seems to expand or contract to fit the available space. It also seems to support the late Mark Perakh’s conclusion (from the difference in word length between A and B pages) that some kind of word abbreviation is going on.
And at the same time, even this pattern isn’t completely clear: the dark Aries 15-element roundel has both long labels (“otalchy taramdy”, “oteoeey otal okealar”, “oteo alols araly”) and really short labels (“otaly”), whereas the light Aries has medium-sized labels, some of which are short despite there being a much larger space they could have extended into (“oteeol”, “otolchd”, “cheary”). Note also that the two Taurus 15-element roundels both follow the light Aries roundel in this general respect.
It would therefore seem that the most ‘linguistically’ telling individual page in the whole Voynich zodiac section is the dark Aries page. This is because even though it seems to use essentially the same Voynich labelese ‘language’ as the rest of the zodiac section, its labels are that much longer (or, perhaps, less subject to abbreviation than the other zodiac pages’ labels).
It is therefore an interesting (and very much open) question whether the ‘language’ of the longer dark Aries labels matches the ‘language’ of the circular text sequences on the same page. If so, we might be able to start answering the question of whether Voynich labels in general are written in the same style of Voynichese as the circular text sequences on their pages, though (with the exception of most of the dark Aries page) more abbreviated.
Speculation about ok- and ot-
When I wrote “The Curse of the Voynich”, I speculated that ok- / ot- / yk- / yt- might each verbosely encipher a specific letter or idea. For example, in the context of a calendar, we might now consider whether one or more of them might encipher the word “Saint” or “Saints”, a possibility that I hadn’t considered back in 2006.
Yet the more I now look at the Voynich zodiac pages, the more I wonder whether ok- and ot- have any extrinsic meaning at all. In information terms, the more frequently they occur, the more predictable they are, and so the less information they carry: and they certainly do occur very frequently indeed here.
And beyond a certain point, they contain so little information that they could contribute almost nothing to the semantic content, XXnot XXunlike XXadding XXtwo XXcrosses XXto XXthe XXstart XXof XXeach XXword.
So, putting yk- and yt- to one side for the moment, I’m now coming round to the idea that ok- and ot- might well be operating solely in some “meta domain” (e.g. perhaps selecting between one of two mapping alphabets or dictionaries), and that we would do well to consider all the ok-initial and ot-initial words separately, i.e. that they might present different sets of properties. And moreover, that the semantic content really lies in the remainder of the word, not in the ok- / ot- prefix prepended to it.
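The information-theoretic point above is easy to make concrete. In Shannon’s terms, a token that appears with probability p carries -log2(p) bits; the probabilities below are purely illustrative, not frequencies measured from the manuscript.

```python
import math

# Self-information of a token that occurs with probability p:
# the more predictable a prefix is, the fewer bits it carries.
def info_bits(p):
    return -math.log2(p)

# Illustrative probabilities only (not measured from the Voynich MS)
for p in (0.05, 0.25, 0.50, 0.80):
    print(f"prefix probability {p:.2f} -> {info_bits(p):.2f} bits")
```

A prefix occurring on four words in five carries roughly a third of a bit per occurrence: far too little to be doing much semantic work on its own.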
Something to think about, anyway.
Voynich abbreviation revisited
All of which raises another open question to do with abbreviation in the Voynich Manuscript. In most of the places where researchers such as Torsten Timm have invested a lot of time looking at sequences that ‘step’ from one Voynichese word to another (e.g. where ol changes to al), they have looked for sequences of words that fuzzily match one another.
Yet if there is abbreviation in play in the Voynich Manuscript, the two syntactic (or, arguably, orthographic) mechanisms that speak loudest for this are EVA -y and EVA -dy. If these both signify abbreviation by truncation in some way, then there is surely a strong case for looking for matches not by stepping glyph values, but by abbreviatory matches.
That is, might we do well to instead look for root-matching word sequences (e.g. where “otalchy taramdy” matches “otalcham tary”)? Given that Voynich labelese seems to mix not only labelese but abbreviation too, I suspect that trying to understand labelese without first understanding how Voynichese abbreviation works might well prove to be a waste of time. Just a thought.
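To show what such root-matching might look like in practice, here is a short sketch. It assumes (purely for illustration) that EVA -y and -dy mark truncation, and that a shorter root abbreviates a longer one when it is a prefix of it.

```python
# Treat EVA -dy and -y as possible truncation marks (an assumption),
# strip them, and compare the remaining roots.
def root(word):
    for suffix in ("dy", "y"):     # try the longer suffix first
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

def roots_match(a, b):
    ra, rb = root(a), root(b)
    # one root abbreviates the other if the shorter prefixes the longer
    return ra.startswith(rb) or rb.startswith(ra)

# The example pair from the post: "otalchy taramdy" vs "otalcham tary"
print(roots_match("otalchy", "otalcham"))  # True: "otalch" vs "otalcham"
print(roots_match("taramdy", "tary"))      # True: "taram" vs "tar"
```

The interesting experiment would then be to count how many adjacent label pairs match under this rule versus under glyph-stepping, across the whole zodiac section.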
Dark Aries, light Aries, and painting
As a final aside, if you find yourself looking at the dark Aries and light Aries images side by side, you may well notice that the two are painted quite differently:
To my mind, the most logical explanation for this is that the colourful painting on the light Aries was done at the start of a separate Quire 11 batch. That is, because Pisces and dark Aries appear at the end of the single long foldout sheet that makes up Quire 10, I suspect that they were originally folded left and so painted at the same time as f69r and f69v (which have broadly the same palette of blues and greens) – f70r1 and f70r2 may therefore well have been left folded inside (i.e. underneath Pisces / f70v2), and so were left untouched by the Quire 10 heavy painter. Quire 11 (which is also a single long foldout sheet, and contains light Aries, the Tauruses, etc) was quite probably painted separately and by a different ‘heavy painter’: moreover, this possibly suggests that the two quires may well not have been physically stitched together at that precise point.
Note that there is an ugly paint contact transfer between the two Aries halves (brown blobs travelling from right to left), but this looks to have been an accidental splodge (probably after stitching) rather than a sign that the two sides were painted while stitched together.
Judging by the script, I think it is not possible to date the zodiac month names to the second half of the 16th century: the script is certainly before 1500, and even the Beinecke 408 script looks more modern. It is much more likely that the ms. author was interested in or concerned with calendar reform; the time of B.408 is the time of the reform councils (Konstanz and Basel/Firenze/Ferrara) and their failed calendar reforms, which lots of people were interested in.
It is not likely that ok or ot are a cipher or abbreviation for saint / sanctus: the saints in calendars are called not sanctus but papa, episcopus, confessor, martyr, virgo, etc.
Btw, it is obvious that some kind of abbreviation is going on in B. 408; I would only like to remind you that in the Middle Ages abbreviation by contraction was much more common than abbreviation by truncation.
Helmut Winkler: I completely agree that the zodiac month hand appears to be fifteenth century, it’s in fact a point I’ve argued quite passionately about in the past.
Yet the link between Pisces and March is something that – as far as I have been able to determine – people didn’t make until the Gregorian calendar came into effect. So my underlying point was – as with so many Voynich-related things – that even something as apparently straightforward as the link between signs and months can hide historical complexity that would be easy to overlook. I would be a little surprised if the zodiac month hand writer was in any way connected with the Voynich except as a later owner, though.
As far as ot/ok go, I thought I made it reasonably clear in the post that I didn’t think it likely that these were cipher tokens for saint etc. It also depends which calendar you are looking at: many 15th century German calendars listed many more bishops than saints. 🙂
And finally: I also agree that abbreviation by contraction was more common than abbreviation by truncation. Yet in the case of the Voynich, we see EVA -y at the end of so many words that it cannot easily be a vowel, or even a consonant (except in some kind of artificial language): so it seems a reasonable guess that the shape denotes truncatio rather than contractio. If there are other explanations that are consistent with the specific detail of what we see, I’d be happy to hear them.
In my opinion, it is simply plants and their names. I am not thinking of scientific names, but the names he knew.
Reason:
I look at the sequence of the months.
There are double the number of nymphs, especially in Aries and Taurus. Not only does their number increase, their appearance changes too.
At the beginning of Aries they are still well dressed, in baskets or pots. This changes with Taurus, until they stand naked and alone.
Spring is exactly the time to think about plants. If I also add the 11 days that are still missing (the Gregorian calendar correction), then it fits perfectly.
And since the whole book is about plants anyway, it seems obvious to me.
Therefore, I cannot believe it is about holidays or other special days.
But that’s my opinion.
@Nick
What you see as “EVA -y” could be a shortcut for “small”. If one is familiar with German, this ending usually marks the diminutive form.
Blume / Blümchen, Gänseblume / Gänseblümchen, Rose / Röschen, etc.
It also works with Haus / Häuschen, Vogel / Vögelchen, etc.
It always works.
I do not know if it is the same in English.
re
“the link between Pisces and March” see e.g.
Brit.Lib. Egerton MS 3277 (second half of the fourteenth century) fol.1v.
http://www.bl.uk/manuscripts/Viewer.aspx?ref=egerton_ms_3277_fs001ar
A calendar’s months run by rote, but in terms of an agricultural calendar the constellation’s actual time of rising is more relevant.
Sorry – I don’t think I made it clear. The constellation is set in the middle of the month here, and in that calendar (the Bohun Psalter calendar) it is clear that the interest was in when the constellation appeared. In some cases, you find two constellations shown for the same month.
“the Voynich zodiac month name annotations may well have been added after 1582, when the Gregorian calendar reforms took place.”
That is a very interesting observation and a great piece of research. If the zodiac names were added decades after the ms was written, but no other ‘decodes’ then perhaps whoever wrote those names could not read the writing.
“in the case of the Voynich, we see EVA -y at the end of so many words that it cannot easily be a vowel, or even a consonant (except in some kind of artificial language): so it seems a reasonable guess that the shape denotes truncatio rather than contractio.”
I see it as contractio. EVA -y, if Latin, would be the suffix -cum. In ‘voynichese’ it appears to be used as ‘-um’.
I continue to use statistical analyses of letter frequencies to determine most probable expansions of the glyphs. It takes time because every projected expansion must be tested with all the ‘voynichese’ words to which it may apply.
initial y likely expands as ‘co’ or ‘con’.
Examples
ykoiin > colamus = strain / filter / purify
ykor > coloris = colour / complexion
ykol > coloque = place / location
ykoiin > colonus = cultivator / gardener
yly> coquum = cook
I read EVA-l as ‘qu’ before a vowel, else ‘que’. EVA-ol as ‘oque’ makes no sense, but ‘coque’ is applicable to cooking, boiling, heating as e.g. of herbs to extract oils. Also: ‘al’ seems to be an exception where ‘l’ = ‘qua’, so al > aqua = water, also spa, also urine. EVA-lol, if I am correct, is ‘quoque’.
I am trying to automate as much transcription as possible, so as to be objective, but that means spending a lot of time programming.
I frequently refer back to the illustrations to verify the transcription. It can be hard to distinguish some shapes, also the spaces need to be verified. The more I transcribe, the more context I have for the difficult parts.
Two more points: the words offset at the bottom of paragraphs appear to be very brief summaries and the writer’s Latin is likely to be a local variant.
dain os teody > sanis oesus (o)leosum = health use oil
otol daiin > oleoque sanus = health from oil
dChaiin > seramus = we begin (as e.g. a book)
Still working on ‘ydaraiShy’. It can be expanded and split in too many ways.
Your thoughts are welcomed.
@Patrick Lockerby
So far I agree with you, though where you read the prefix ‘que’ I would also consider the prefix ‘ex’, depending on which one he uses (sharp or round).
Like our t and f: small difference, great effect.
Nick: I am somewhat troubled by talk of the Voynich language being different all over the place.
If this is really the case then I think the author would have needed a separate guide to decipherment explaining how the different rules or features of the language operate in different parts of the manuscript, as doing this from memory would seem to be infeasible. Having two manuscripts (one the Voynich, the other the rules for deciphering it) would seem superfluous, as anyone wanting to read or write the Voynich would need both, probably kept in broadly the same place.
If there was no guide to reading the manuscript then one can only assume that the author would inevitably have forgotten many of the rules and so been unable to read his/her manuscript. I think the author would have realised this problem before writing 200 pages, so it is unlikely the author made this mistake.
On the basis of this I can only conclude that the apparent differences in Voynichese across the manuscript are not as great as some believe. If this is the case, my temptation is to explain it in part by the presence of null text, which could easily have muddied the waters, making the Voynichese seem more different than it is.
I do wonder about the parallel between null text and the idea of the manuscript being a hoax. Clearly “null text” is like “hoax text” though if only some of the text is null that means other parts of the text have meaning therefore one is not talking about a true hoax.
Mark: some researchers have tried to explain away the difference in Voynich ‘language’ between sections as being a consequence of the different vocabulary in those sections. But this falls down because we have Herbal A and Herbal B pages: if they are both simply herbals (and I must admit to having strong doubts, though I may be the only person to do so), then it would seem particularly strange if they were to require two different types of language.
Other researchers have proposed that one Voynich ‘language’ may have a Latin plaintext, another a different plaintext: but we would surely see very different word constructions in the two sections, because of different word-endings, grammar, structure, etc.
And so the different Voynich ‘languages’ remain a very problematic part of the Voynich mystery: the oddities of labelese form yet another confounding feature. :-/
Did Tycho Brahe add notes to the Voynich ?
xplor: currently, there is no evidence that Brahe, Kepler or even Tadeáš Hájek ( https://en.wikipedia.org/wiki/Tade%C3%A1%C5%A1_H%C3%A1jek ) ever saw the Voynich Manuscript, let alone added any notes to it.
Nick: To what extent do you think the author must have used some encryption tool? Purely as an example the simplest tool being something like a look up table.
Such as in our latin alphabet.
aa->do
ab->tp
etc.
i.e. a character pair lookup table.
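To make the look-up table idea concrete, here is a minimal sketch of the digraph substitution Mark describes; the table entries are invented for illustration, not a proposed Voynich key.

```python
# A minimal digraph look-up table cipher: each plaintext character
# pair maps to a cipher pair. Entries here are invented examples.
table = {"aa": "do", "ab": "tp", "ba": "ke", "bb": "ol"}

def encipher(plaintext):
    # pad to an even length so the text splits cleanly into pairs
    if len(plaintext) % 2:
        plaintext += "a"
    pairs = [plaintext[i:i + 2] for i in range(0, len(plaintext), 2)]
    return "".join(table[p] for p in pairs)

print(encipher("abba"))  # -> "tpke"
```

Note the memorisation cost this implies: for a 20-letter plaintext alphabet, a full digraph table has 400 entries, which bears directly on the question below of how many rules an author could realistically hold in memory.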
How many rules do we think the author could realistically commit to memory?
I suppose it is possible that the author had a facility for remembering a vast number of distinct rules so as to be able to read and write the manuscript at reasonable speed. However this seems to me to be less likely.
As an example, if the rules were so complex that it took the author 10 minutes or so just to read or write a single word, the manuscript would be unmanageable and impractical to use.
My attraction to the idea of null text or null characters is that it could account for a lot of the variation. The author could have inserted more null characters in some parts of the manuscript and different null text in other parts. I have long suspected that the repeated words, and adjacent words differing by one character, must be indicative of null text or possibly verbosity.
The author might have had a penchant for inserting some null text in some parts of the manuscript and just switched to other null text elsewhere.
Clearly differences can be accounted by varying vocabulary, however as you point out this does not seem enough.
If the manuscript were a hoax, which I don’t believe it is, obviously we could never prove it as such. Similarly the more null text or characters the harder it is as we are left wading through a swamp of nonsense in search of the slightest bit of meaning.
Still as far as I am concerned the jury is still out.
Mark: devising one fairly impractical encryption scheme and then rejecting it isn’t really strong grounds for rejecting the idea of encryption. 🙂
Nick: I think one can’t reject the idea of encryption; in fact that is an idea I am now confident in. As I don’t believe, for a variety of reasons that I need not go into here, that it can be a hoax. Similarly the case that we are dealing with an unencrypted language, now and for some time, has seemed unlikely to me, even if it is embedded in null text.
However I find it very hard to accept that the cipher is as complex as some seem to imply, as it would be absolutely unwieldy and impossible to work with unless the author was some kind of savant, which I don’t believe.
So my inclination is that the differences one sees may be in some instances due to different vocabulary, but not due to significantly or probably even mildly different cipher rules.
A smattering of null text and null characters and possibly more than one character corresponding to the same character could explain a lot. As an example in one part of the manuscript the author might frequently use null text “z£z” and elsewhere “q$q”, which could account for the noticeable differences.
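As a quick illustration of how section-specific nulls could make otherwise-identical text look different, here is a hedged sketch; the null tokens and message words below are invented examples, not claims about the manuscript.

```python
import random

# Pad a word sequence with a section-specific null word at random points.
def pad_with_nulls(words, null, rate=0.5, seed=1):
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    out = []
    for w in words:
        out.append(w)
        if rng.random() < rate:
            out.append(null)
    return out

message = ["otaly", "cheary", "okaram"]
section_a = pad_with_nulls(message, "zez")  # one section's null word
section_b = pad_with_nulls(message, "qsq")  # another section's null word
print(section_a)
print(section_b)

# A reader who knows the null word recovers the same message from both
recovered = [w for w in section_a if w != "zez"]
```

The two padded sections would score differently on word-frequency statistics even though the underlying message is identical, which is exactly the muddying effect being suggested.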
One thing of possible note, although not something I would be inclined towards, is that if there are different authors there would be different writing styles and vocabulary used even if the same cipher was applied; this could of course account for some variation.
I still find it hard to believe the author was such an idiot that he managed to produce a nonsense manuscript that he couldn’t read back by including far too many rules though it is possible.
So in conclusion I would caution against the idea that everything is so fundamentally complicated in the Voynich cipher.
Mark: I would tend to expect that the author could very nearly read the decryption off the page, and that almost no external knowledge (e.g. a cipher key) would be necessary to do this. But few other people to date have taken this kind of position.
Patrick: I actually meant the post-1582 dating speculation more as a provocation than a firm result. The pre-1500 dating suggested by the handwriting sits uncomfortably with the Pisces-March link, and I don’t yet know how to reconcile the two.
As with so much Voynich evidence, it’s a bit of a minefield: tread carefully!
There have always been two different ways of mapping the zodiac signs to the months.
One is related to the ‘largest overlap’. The Sun spends most time in zodiac sign ‘Z’ during month ‘M’. The other is concerned with the month in which the Sun enters the sign in question.
In the very interesting Byrhtferth diagrams from the 10th century (Oxford St John’s MS 17 or BL MS Harley 3667), Pisces is associated with March, Aries with April, etc., just as in the Voynich MS. This is the ‘largest overlap’ case.
Nick: I am inclined to agree. If this manuscript is intended for practical usage on a regular basis it ought to be something that the author could pick up and read without too much difficulty. It is possible that the intention was to read the manuscript infrequently only when specific information was required in which case a slower and more difficult process of reading the manuscript might be acceptable.
However if the author was carrying out recipes then one might imagine these would be done frequently. Nevertheless it is not completely obvious to me how often the author would refer to the manuscript for information. That is assuming that the author was not merely using the manuscript to store information in transit similar to your Averlino hypothesis in which case a more difficult process of encipherment and decipherment would be viable.
If he could read the manuscript easily was that because he had a really good memory for all the rules, well maybe I suppose. However my instincts are against that.
So what is the explanation? Well, the best I can come up with is either a certain amount of junk, or a transformation with a different simple input in each section of the manuscript, which would transform the appearance of much of the text in that section in a way that was easy for the author to apply, where presumably that input would be specified at the beginning of each section. I err in favour of the junk theory, but I can’t say that I am hugely confident in it.
Rene: in the tenth century, the calendar hadn’t yet “slid back” quite as much as it would do by the time of the introduction of the Gregorian calendar reforms, so back then each zodiac sign fell in about 50% of one month and 50% of the next.
With the further sliding that happened by 1450, the largest overlap then was with the preceding month (i.e. in the 15th century calendar shown, Pisces started on the 11th February). So the largest overlap after the 10th century or so was always with the preceding month: which is perhaps why the 10th century or so was the last calendrical chance for a ‘largest overlap’ following-month case. :-p
Mark: “A smattering of null text and null characters and possibly more than one character corresponding to the same character could explain a lot.” — that was essentially John Manly’s theory of the cipher (see http://www.as.up.krakow.pl/jvs/library/0-8-2008-05-23/nill_ring_binder.pdf).
This is a testable hypothesis, _assuming access to corpora of texts in relevant 15th (and possibly pre-15th Century) languages (including transcriptions that preserve use of abbreviations)_. This highlights the need to locate and assemble pointers to such resources as a tall pole in Voynich research.
Like any other cryptanalytic hypothesis, this has to explain the low 2nd order entropy, the observed “word” morphology, the observed correlations between glyphs across spaces, the vocabulary obeying Zipf’s law, and the binomial(ish) word length distribution — I think there are multiple items on that list that would be problematic for this theory.
Karl
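One of the measures Karl lists above, second-order entropy, is straightforward to compute. Here is a hedged sketch of the conditional glyph entropy H(next glyph | current glyph), run on a made-up EVA-like string rather than a real transcription; any serious use would feed it a full transcription file.

```python
import math
from collections import Counter

# Conditional (2nd-order) entropy of a glyph stream: how many bits of
# surprise the next glyph carries given the current one. Voynichese is
# reported to score unusually low on this for its alphabet size.
def conditional_entropy(text):
    bigrams = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    total = sum(bigrams.values())
    h = 0.0
    for (a, b), n in bigrams.items():
        p_ab = n / total              # P(current=a, next=b)
        p_b_given_a = n / firsts[a]   # P(next=b | current=a)
        h -= p_ab * math.log2(p_b_given_a)
    return h

# Invented EVA-like sample, purely to show the function running
print(round(conditional_entropy("otalaotalamotaralokaram"), 3))
```

Any candidate cipher hypothesis (null insertion included) can then be tested by generating text under that hypothesis and checking whether this number, the word-length distribution, and the Zipf fit all come out Voynichese-like.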
Karl: Thanks for your very interesting message.
I wonder to what extent each of the items you list are problematic. To me the idea of null text does not mean that it is random text.
For example, if I insert the words “empty”, “nothing” etc. into the text, or append the string “null” to the front of words at regular intervals, then this has the appearance of pattern where in fact there is no real underlying pattern. This is something I can naturally see myself doing if I were the author, rather than writing down random words. In fact, using repeated junk text as one sees fit seems more straightforward.
Of course, if you have junk text you still need to easily be able to distinguish between the junk and the meaningful text.
So I am not sure if or how the points that you mentioned go against my perspective on null text.
Nick, that’s true enough. Around 1000, it would be about 50/50, and one could rephrase ‘largest overlap’ by ‘where the Sun is at the beginning of the month’.
Whatever the motivation, it was certainly being used.
Another reason for the shift is that during the medieval centuries, the year began with Easter, not Christmas or New Year.
On the other hand, in some regions the financial year began in January when debts were settled.
Secular calendars sometimes began in December.
The sailing year began in March (or so) and as I’ve pointed out, the months in the Voynich calendar are those of the sailing year in the Mediterranean, with the non-sailing months being omitted.
It’s probably fruitless to say this again – but there is no necessary connection between a calendar and astrological calculations, and for medieval people – and especially illiterate people – the months of the year were signalled by the presence of one or another of the twelve constellations. That’s why they are carved into cathedrals and made into mosaics from such an early period, within the very precincts of those who most inveighed against the superstitious use of natural phenomena, including stars.
I really cannot understand the fixation on imposing astrology on a series of images whose only obvious connection to the months are the month-names written in that fifteenth-century hand. Not even the images in the centres conform to Latin zodiac types, nor do they form a zodiac series. It’s even a bit of a stretch to identify them with the months, except for those late inscriptions – after which we have to presume it was used as a calendar, but there is no evidence of any intention to use it for astrology. Same is true, actually, of most calendars before the popularist printed books which appear later than the ms was made.
Not that such quibbling about historical fact is likely to trouble many.. but if the calendar wasn’t intended to be a standard Latin liturgical roster sort of calendar, and is too early to be a popularist German astrological pastiche, then perhaps the labels are talking about some other system altogether. One possibility is equation between places and times. After all the figures called ‘nymphs’ are clearly related to the tyche figures in the ‘bathy-‘ section, and tyches were (so to speak) pre-Christian patron saints of specific places.
but I’ve said all this elsewhere… carry on..
Diane: the idea that this series of images is not a zodiac sequence is (as far as I can tell) yours and yours alone. I’d be interested to see the reasoning behind any other zodiac denialist theorists you happen to stumble upon.
One interesting observation is that all the word types together build a single network full of words that fuzzily match each other. The network exists not only because glyphs can be added and removed; it also exists because glyphs are replaced with similar ones. For instance, beside the word [chedy] not only do the words [chey], [chdy], [cheedy] and [cheody] exist, it is also possible to find words such as [shedy], [cheky] and [cheda]. Therefore abbreviatory matches are not enough to explain all the similar word types.
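This “one network” idea can be sketched directly: link word types whose Levenshtein distance is 1 (one glyph added, removed, or replaced). The vocabulary below is just the handful of words from the comment above; a real run would use the full transcription vocabulary.

```python
# Classic dynamic-programming Levenshtein (edit) distance
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# Sample word types from the comment above
words = ["chedy", "chey", "chdy", "cheedy", "cheody",
         "shedy", "cheky", "cheda"]
links = [(a, b) for i, a in enumerate(words)
         for b in words[i + 1:] if edit_distance(a, b) == 1]
print(links)
```

Counting such links per section (rather than over the whole manuscript) would be one way to test whether the network really is uniform or whether labelese occupies its own denser corner of it.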
Torsten: I don’t claim for a minute that abbreviatory matches do (or even could) explain everything odd about Voynichese. Rather, the real point of the post was that I think Voynich “labelese” (and in particular the Voynich zodiac labelese) should be taken more seriously on its own terms as a separate ‘dialect’ / ‘language’ / corpus, because understanding its nuances may well reveal much more about its specific dynamics than trying to do composite statistics on all the text at the same time.
Nick: The point was that there is only one network. Therefore it doesn’t matter if the statistics are built for the whole manuscript or only for a part of it. The result is always one network of similar words.
Only at the glyph level is it possible to describe rules like “almost no zodiac labels start with [qo-]” or “a disproportionately large number of zodiac labels start with [ot] or [ok]”. But at the word level there is always just one network of words similar to each other. Sometimes this network becomes more obvious, as in the sequence of labels [otaral] [otalar] [otalam] [dolaram] [okaram] on page f70v2.
Torsten: the notion that there is only one network even though different sections (Herbal A, Herbal B, zodiac labelese, Q13, etc) behave differently is your working hypothesis, it is not a proven thing.
In my opinion, the essential nature of “the network” remains unclear: much of it could very well be explained as properties of a covertext (e.g. the sequence of shapes implied by a verbose cipher) rather than by a single semantic network. That is, much of what you are concerned with could very well be an orthographic (fake-letter-expressive) network.
I suppose the unanswered question is this: why is there only a single network yet multiple languages?
Nick: You describe the VMS yourself this way. You didn’t say that a specific word type or feature is unique to the Zodiac section. Instead you make statements like “a disproportionately large number of zodiac labels start with EVA [ot-] or [ok-]” or “many words terminate with EVA [-y]”. This way you describe words sharing the same features, which can also occur in other sections of the manuscript. I call what you describe a network of similar words.
Torsten: we’re in agreement that the mode of writing is essentially the same across all Voynich sections. And we’re surely not in disagreement that the text in different sections behaves differently.
The key evidential difference between our positions seems to be that I think that there is a substantial divergence between the ways that individual sections work. It is true that ot- and ok- words appear all through the text, but it is also true that a feature doesn’t need to be unique to a section to make it significant, e.g. chol in Herbal A, etc.
About the EVA prefixes ot and ok: maybe the change is related to the person’s gender, one for male characters (father, abbot) and the other for female characters (mother).
Ruby: Mr + Mrs, eh? That’s a very good suggestion, thanks! 🙂
Alternatively, it is occasionally said that labels should not be plural words. I think that this isn’t necessarily a hard rule. Anyway, if the initial 4 is a plural marker, both its rare appearance in labels, and the appearance of words with and without it in the main text, would be explained.
Rene: in addition, the apparently strong linkage between qo- words and a preceding -y word has been mentioned many times. Hence it could be that qo- actually ‘pluralizes’ the preceding word, not the word of which it would seem to be the prefix. Something to consider, anyway. 😉
Nick: If you search for differences, each page of the manuscript is unique. For instance, the word type [chol] occurs 7 times on page f3r but there is not a single instance of this word type on page f5r. Does this mean that the two pages contain different dialects?
If you ask what the same two pages have in common, you will find that pages f3r and f5r share, for instance, the words [daiin], [cthy] and [cheor]. They also share that a good number of words occur multiple times, and that they contain particularly strong ‘paired’ structures (e.g. on page f3r [chol] beside [cthol] and [chor], and on page f5r the sequence [ckhoy cthey chey]). You have described similar features for the Zodiac section in this blog post.
The question is: should we focus on the differences between single pages, or should we focus on features common to all the pages?
Torsten: my preferred third option would be to build (and expand) on the work of Prescott Currier by focusing on the systematic differences between groups of pages that behave in manifestly different ways to each other.
There is clearly a functional layer of differences beneath “the network” and above the individual page: and the sooner we find a way to systematically determine what those page groupings and group differences are, the more constructive our conversations will become.
Nick: Currier clearly described local variations as part of a larger system. For instance he described statistical differences like “the symbol groups ‘chol’ and ‘chor’ are very high in A and often occur repeated; low in B”. Currier was clear about what he was describing: “These features are to be found generally in the other Sections of the manuscript although there are always local variations.”
Torsten: …and yet Currier is remembered not for pointing out the encompassing orthographic network, but for pointing out the two major page groupings, A and B.
For me, the existence of different functional behaviours of specific sections of the Voynich is axiomatic, whether Herbal A, labelese, or whatever. I just wish you could turn your sharp brain and eyes to consider this even hypothetically, rather than turn everything round into a discussion of the network. 🙁
Nick: There is a reason to use the illustrations to divide the manuscript into sections. From the text itself, no clear indications for dividing the manuscript into sections are known. It is up to you to demonstrate evidence for your hypothesis that there is different functional behaviour somewhere in the VMS.
It doesn’t matter what you compare in the VMS. You can compare two pages, two quires, two sections, or Currier A with Currier B. The outcome is always that there are differences and similarities. Both the differences and the similarities are part of the system. In other words, whenever you think you know what the rules are, each new page will prove you wrong. Therefore I find it far from surprising that you can describe some special features for the Zodiac section. If the Zodiac section behaved exactly like another section of the VMS, that would be the surprise.
Torsten: OK, so would I be right in saying that you believe that Prescott Currier had no significant evidence for his findings?
Nick: Currier’s observations and your interpretation of those observations are two different things. Please reread what Currier has written. He described significant statistical differences between what he called “language A” and “language B”. He also explicitly said that his “use of the word language is convenient, but it does not have the same connotations as it would have in normal use.” That is quite different from your hypothesis about different functional behaviours of specific sections.
The problem is that observations of the form “something is more common here than elsewhere” are not enough to back up your hypothesis about different functional behaviours of specific sections. If the functional behaviour of the VMS is a permanent shift, it is no surprise that some features are also more or less common in the Zodiac section.
Torsten: your interpretation of Currier’s findings is quite plainly different to just about everyone else’s. I know full well what he meant by ‘language’, thanks for asking.
Nick and ants.
Prescott Currier is wrong.
Torsten Timm.
Word ( daiin ) is wrong. no – d.
Word is read – 8 aiin.
8 is number. And means letters = F, P.
Otherwise, Eliška was 8 child.
Find out what the word means – aiin ( Ajin ). And you will be clear. The word is Jewish.
Nick: When you say that people running tests on Voynichese “should run them on one section of text at a time”: is there any plan or agenda to do that, applying the standard tests that have already been applied successfully to the whole text?
I would be really interested in hearing about the results for “labelese”.
Mark: even though I think Currier was right to flag A vs B, and for others to subdivide this further into Herbal A, Herbal B, Q13, Q20, etc, determining the cluster “boundaries” is something that different statistical methodologies yield different results for. I’m actively thinking about this at the moment, and in particular about the Voynich’s zodiac labelese, so expect more posts on this topic here. 🙂
Nick: Currier was comparing the parts of the manuscript known as Currier A and Currier B. He described the differences between these sections as “features to be found generally in the other sections of the manuscript although there are always local variations”. Rene Zandbergen noted on his website that “Herbal-A pages located after the biological section are rather different from the early Herbal-A pages”. Beside these observations, each page can have its own local variations. For instance, page f15v contains many [or] groups like [chor or oro]; and even though the symbol group [chor] is typical for Currier A, it can be missing on Currier A pages, as on page f5r.
You described in this blog post special features for the Zodiac section. In doing so you use phrases like “more common in zodiac labels than elsewhere”. The interesting point is that you describe your observations in a similar way to Currier: you both describe features that are more common in one place than elsewhere.
All these observations together allow two conclusions. First, the frequencies of nameable symbol groups vary throughout the manuscript. Secondly, the frequencies of these symbol groups do not change randomly: it is possible to describe features common to smaller or larger groups of pages. It seems that what happens is a shift from feature set A via feature set B to feature set C.
Torsten: the idea that all the different Voynich pages work within a broader shared ‘system’, orthographic system, or “network” (as you put it) is accepted – this is what allows us to call it Voynichese. Currier identified the major two A / B clusters, while more recent researchers (such as Rene Zandbergen, Glen Claston, and many others) have attempted to subcluster these further: some (e.g. Glen Claston) have tried to use this to identify an evolutionary / developmental sequence between subclusters. There is still much work to be done here.
Nick: As you know, as far as the text goes my interest at this time is really only in single-word labelese, so any statistical results you can produce on labelese which could supplement a crib or block-paradigm crib would be of great value.
I eagerly anticipate your future posts.
Nick: Statistical results on short-word labelese would also be very interesting. I think the less scope there is for null text or null characters the better, and I think short-word labelese is most likely to provide that.
Nick’s statement that there is still much to be done here is completely on the mark. Only a fraction of this topic has been properly explored, and it is also barely understood by a surprisingly large part of Voynich text ‘researchers’.
The differences: labelese vs. normal text on the one hand, and Currier A vs. B on the other hand, are not the same.
The comparison is made difficult by the small corpus of labels, and the fact that in any subset of Voynich text, about half the words are unique.
Each ‘oddity’ in the text statistics is a potential point of attack. A place to dig.
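Rene’s point that “in any subset of Voynich text, about half the words are unique” can be quantified as a hapax ratio: the fraction of word types that occur exactly once in a sample. A minimal sketch, using a made-up handful of EVA-like words rather than a real transliteration:

```python
# Sketch: hapax ratio = fraction of word *types* occurring exactly once.
# The sample here is an invented stand-in; a real run would read a full
# transliteration file and compute the ratio per section.
from collections import Counter

def hapax_ratio(words):
    counts = Counter(words)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)

sample = "daiin chol chor daiin otaly chedy qokedy chol daiin shedy".split()
print(round(hapax_ratio(sample), 2))
```

Comparing this ratio between labelese and running text (and between Currier A and B pages) would be one concrete way to measure how severe the small-corpus problem Rene mentions actually is.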
Rene: you mark zodiac labelese up as “C” in your cluster analysis page, do you still think this is correct?
Personally, I believe it is a cluster all of its own, one I’d place generally more towards A than B. But I don’t think the last word has yet been had on clustering, not by a long way.
Rene: I think the statement “there is still much to be done” could naturally be applied widely. I find it very hard to do everything I would like to with the time I have available for the Voynich and as you know my emphasis has been directed elsewhere.
I contend that “labelese” is most likely to be Voynichese at its simplest and therefore this subset of the language is the best point of attack. I would be curious as to the number of single word labels in the manuscript. I would expect single word labels to be largely unique within the label text.
I am somewhat wary of what I perceive to be an overemphasis on the application of statistics as largely sufficient for decipherment. Having said that the more reliable statistical information we have can only add value and augment information derived from a crib of some kind.
Rene: To compare means to examine both the differences and the similarities. Moreover, a rule which applies throughout the manuscript describes something characteristic.
The frequencies of each group of symbols vary throughout the manuscript. At some point it becomes a rule that each section prefers its own set of common glyph groups. Therefore I wouldn’t call the observation that glyph groups like [ot-], [ok-] and [yk-] are more frequent in the Zodiac section an ‘oddity’.
Torsten: Rene placed the word ‘oddity’ in scare-quotes, and so clearly meant it not to be interpreted as literally as you seem to have taken it.
Any significant localised deviation from a default pattern that is established over several hundred pages probably counts as an ‘oddity’ in some respect. The obvious difficulties in pursuing this idea are (a) finding a sensible level of what “local” means that is above the page level yet below the “network” level; and (b) establishing what kind of deviations could rightly be deemed “significant”.
I would contend that Currier A and Currier B easily satisfy these criteria: and, in the context of this post, I believe that zodiac labels also do. If we look at the zodiac labels in f70v2 (Pisces), taken (mostly) from Takahashi’s EVA transcription:
Here, 15 of the 30 labels begin with “ot-” (50%), while 9 begin with “ok-” (30%). My stats professor used to say that “stats begins at 30 [samples]”, and so perhaps even here we might have enough samples to begin to assess statistical significance. 🙂
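For what it’s worth, the significance of 15 “ot-” labels out of 30 can be checked with a one-sided exact binomial test. The 50%/30% label figures come from the post; the 10% baseline rate of “ot-” words in the main text is an assumed placeholder, not a measured figure, so treat the result as a sketch of the method rather than a finding.

```python
# Sketch: one-sided exact binomial test for the "ot-" prefix on f70v2.
# ASSUMPTION: a 10% baseline rate of "ot-" words in the main text; a real
# test would measure this rate from a full transliteration.
from math import comb

def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_value = binom_tail(15, 30, 0.10)   # chance of >= 15 "ot-" labels out of 30
print(f"{p_value:.2e}")              # vanishingly small under the 10% baseline
```

Even with only 30 samples, a 50% observed rate against any plausibly low baseline gives a p-value far below conventional thresholds, which supports treating the zodiac-label prefix skew as a real effect rather than noise.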
With ‘oddity’ I meant: anything that ‘sticks out’.
W.r.t. Nick’s question about cluster ‘C’, this is based on the combined labels and the text in circles in the zodiac pages. Now whether the labels and the circular texts are similar is, as far as I know, one of many unanswered questions.
It is clearly important in order to judge ‘zodiac labelese’.
What’s also of great interest is that the zodiac section is the place where the transition from Currier A to Currier B is taking place. The most prominent feature is the frequency of the character pair ‘ed’ (in Eva), which Currier himself either did not notice or failed to report.
Since we have no doubt at all about the page order in this part of the MS, it is an excellent opportunity to understand this transition.
I did the statistical analyses reported in three of my web pages more than 15 years ago, and it is necessary for several reasons to redo them, and I also want to extend them. (Yes, there is really a lot to be looked at in more detail).
So first I decided to consolidate the input data.
Nick: Something can have statistical significance and can be expected. There is no contradiction.
The default pattern is that each section prefers its own set of common glyph groups. The Zodiac section also fulfills this pattern.
Torsten: so… what you seem to be claiming is that because the statistics of each page of the Voynich are inherently variable (relative to the average statistics of the entire Voynich corpus), it makes no sense to you to look for any clusters between the page level and the corpus level.
Is that a fair summary?
Nick: The opposite is the case. Such observations are highly interesting. For instance, it is possible to generalize these observations to get an idea of how the system behind the VMS works. Such observations also tell us something about the original order of the pages: sections using similar sets of glyph groups probably follow each other.
My point is that the Zodiac section should be seen in the context of the whole manuscript. In my eyes it is not an isolated section with features different from the rest of the manuscript.
Not only is the sample of labels small relative to the rest of the text, but (I’d argue) the best shot at understanding how labels relate to the main text will come from finding Pharma section labels associated with plant part drawings that are a match to part of a plant drawing in the herbal section (which is an even smaller sample :-() and seeing if/how they occur in the corresponding Herbal page text. If folks are aware that someone has already done that specific analysis, a pointer to it would be appreciated. — Karl
Karl: the thing I’ve been saying for several years now is that because of the “triple jump” nature of Voynichese (ie there are multiple different types of problems to solve all at the same time), we will need a block of labels (or more) plus a really inspired equivalence hypothesis about the plaintext to stand any chance. :-/
Torsten: then we are in agreement that there is more to be learned about the manuscript from the ways that ‘language’ changes between pages.
Nick: The ‘language’ isn’t changing between pages. There is only a change in the set of words used.
The first step in deciphering an unknown script is to describe the script in detail. In this respect your observations for the Zodiac section are excellent. It is important that a disproportionately large number of zodiac labels start or end with the same glyph groups. It is also important that a good number of zodiac labels occur multiple times. The observation of particularly strong ‘paired’ labels is also interesting.
Beside the fact that Currier describes observations for different words, his observations for Currier A and B are similar. He also describes observations like “initial ‘cth’ very high in ‘A’; very low in ‘B'” or “final ‘dy’ is very high in Language ‘B’; almost non-existent in Language ‘A'”. Currier also describes symbol groups that occur multiple times: “The symbol ‘chol’ and ‘chor’ are very high in ‘A’ and often occur repeated; low in ‘B'”. But even though he named ‘chol’ and ‘chor’ together, he did not say anything explicit about ‘paired’ symbol groups.
Understanding how an unknown script works has always been the first step in deciphering it. Such observations are therefore interesting no matter what the system in the end turns out to be: they give a first clue about the system behind the text, and in some way they describe the grammar of the ‘language’ of the VMS.
Thank you all for a really productive exchange. I’d like to see more, even if I cannot contribute. Don’t like being a lurker, but have to for now. Thanks again
Don
BTW, I still have 3 small data files on my web page,
Starnames, Arabic Star Names. Adapted from “Star Names” by R. H. Allen,
Zodiac Stars, some search results from star names, and
Manzils, Manazil al-Qamar (plural), Manzil (singular), 8.
The page containing the links is:
http://www.sixmilesystems.com/voynich.htm
The idea was to turn Allen’s book into a computer searchable form. I kept the page numbers as you see.
I would be happy to send copies to anyone who wants them.
Don
Hinkley Allen is not considered entirely sound; may I suggest that those interested supplement his ‘Star Names…’ with Emilie Savage-Smith’s (very conservative) ‘Islamicate Celestial Globes…’ (the book, not the article), which has a substantial section on the lunar mansions. I also have a photocopy of an excellent dictionary of Arabic astronomical terms – partly in English and partly in Arabic – but regret not having it with me at the moment, so will have to return with details. Otherwise Kunitzsch is a standard reference, in which many of H-A’s errors are specified.
For a good overview of the history of use in the west,
http://archive.aramcoworld.com/issue/201005/arabic.in.the.sky.htm
PS – in the aramcoworld article, the diagram labelled ZUBDET UT TEVARIH (BY LOKMAN, 1583) is near-identical to a diagram in the pictorial almanac made by Abraham Cresques for the court of France, completed in 1375. That almanac is usually called the ‘Catalan Atlas’, though ‘Majorcan…’ would be more appropriate.
Details of the Dictionary..
Mansur Hanna JURDAK, Astronomical Dictionary. The Zodiac and the Constellations (1950).
To correct the common assumption of standardized orthography, the variant forms found on extant astrolabes are helpful, and Robert Gunther’s 2-volume ‘Astrolabes of the World’ (1932) remains invaluable.
And I might also suggest – again – for fifteenth-century usage among eastern mariners, Ibn Majid’s treatise of which an English translation was made by G.R. Tibbetts. The title begins ‘Arab Navigation in the Indian Ocean Before the Coming of the Portuguese…’
Nick – I’ve already posted this last material, but it seems not to have registered, so hope this isn’t a duplication.
I have been a little preoccupied by rare characters. I get the sense that for many people these are more of an aberration than anything of great interest. From my perspective I am not so sure, as long as one can be sure that they are clearly defined and not merely smudges. I think it unlikely the author was just doodling; he/she was clearly purposeful in everything else.
If we were talking about “diplomatic ciphers”, these would either correspond to specific words the author wanted to keep secret, or possibly to a rare letter pair or an alternative letter character rarely used.
Still, if one can be confident of the appearance of the characters, then they can serve a purpose as additional identifiers indicating the origin of the character set, which of course is something I am focused on. If they are clearly and unambiguously identifiable, I see no reason to exclude them due to their rarity.
Nick: Just perusing this interesting post of yours and picking out the following things that I concur with:
“Voynich labelese, a topic that hasn’t (to my knowledge) yet been covered satisfactorily on the web or in print.” (My impression too, and probably quite a bit more work for me to do on it.)
“This would seem to support the long-proposed observation that Voynich text seems to expand or contract to fit the available space.” (I have argued that the neatness of fit of circular text is indicative of the use of filler null words, which is a similar point.)
“the more I now look at the Voynich zodiac pages, the more I wonder whether ok- and ot- have any extrinsic meaning at all. In information terms, the more frequently they occur, the more predictable they are, and so the less information they carry: and they certainly do occur very frequently indeed here.
And beyond a certain point, they contain so little information that they could contribute almost nothing to the semantic content, XXnot XXunlike XXadding XXtwo XXcrosses XXto XXthe XXstart XXof XXeach XXword.”(This is on vaguely similar lines to my own thinking, though my rejecting whole words as null is more radical.)
Focusing on the labels written beside the nymphs in the zodiac section limits one a lot when talking about labelese, as there are very many labels in the recipes pages, numerous labels in the cosmological pages and quite a number against the nymph drawings. And the prevalence of the “ot” and “ok” prefixes is also evident there, which would go against the idea that these prefixes are specific to the context, as the context is so different.
When I started to look at single-word labels I did not expect to come to the conclusion that there is a significant presence of null words amongst them, and that it is quite probable that null words are common in the Voynich. This idea was to a large extent unexpected by me, though perhaps it shouldn’t have been.
What I haven’t really studied are the features of the labels that I think are non-null, though this null-word revelation, if true, shakes the general analysis of Voynichese and makes one wonder whether Voynichese minus the null words can then be treated as one whole.
I thought I would continue my thoughts here on labelese rather than on Voynich Ninja. If there is very roughly a 60%/40% split of null labels versus non-null labels, then any statistical tests applied to labelese as a whole will inevitably yield results of limited usefulness, as one would be measuring two distinct types of label and aggregating the data. In that case, ultimately the only valuable statistics would be those applied just to null labels, or just to non-null labels, so as to determine the statistical properties of each separately. In fact, if Voynichese as a whole is similarly split between non-null words and null words, then statistical tests applied to Voynichese as a whole would have similar problems. The difficulty lies in trying to identify with a high level of certainty which are the null words and which the non-null words; I have managed to identify amongst the labels a not insignificant number of null labels, but arriving at a state where I am very confident that I have identified 95% or more of them seems that much harder at this time. Obviously the statistical properties of non-null words would generally be much more interesting, as this would be the real text.
Nick: I have been very busy of late, but I hope to return to my labelese research soon, though there are previous lines of my research that I would also like to follow up. I want to carry out an even more rigorous and detailed analysis that I will write up. For my research purposes I have found it very useful to produce a version of the Voynich containing only labels (by removing label-free pages and any text that does not constitute labels); I have found this really useful, as it makes it easier to focus on labels and remove the clutter irrelevant to this line of research. I can see why others would see this as a pointless exercise, but I have found it very useful; I would be happy to share it with others, though I doubt there would be much interest. (There may be some debate in a few instances as to whether some “labels” are actually “labels” and not part of “sentence text”.)
Working on the basis that my null word hypothesis is broadly speaking correct, do you have thoughts as to how I could go about proving or disproving it? And, on that basis, what would be a good procedure for identifying these null words? I have my own ideas, though those of others are worth hearing, I think.
Nick: Reading through the Voynich Ninja discussion of labelese, I noted again that your preferred explanation for the same word labels appearing in many different contexts is abbreviation, i.e. that they could be abbreviations of different things, e.g.
App -> Apple
App -> Appendix
App -> Application
I have two objections to this idea. The most important one is that these repeated words have very similar spellings to one another.
So for example we could have:
Opp -> Opportunity
Opp -> Opposition
Opp -> Oppression
Upp -> Upper
Upp -> Uppity
All these abbreviations differ by just one letter, which is fine for abbreviations of *pp words, but one would expect a broader range of abbreviations; yet we don’t find this breadth of abbreviations in labelese.
Also, I would expect that we are generally dealing with specific words amongst labels, and so it would be significantly harder to guess the word if they were abbreviations.
So I think the abbreviations theory with respect to labelese sounds nice, but it doesn’t stack up when one looks at the evidence.
I am therefore drawn back to the explanation that we have a mixture of null and non-null words amongst the labels, as I have yet to see anyone present a hypothesis that better fits the actual observations. (I think I have already dealt with all, or at least most, of the alternative hypotheses.)
Mark: *sigh* what I proposed in Curse in 2006 was a combination of abbreviation (to make words smaller) together with verbose cipher (to make words longer again). Hence “otolal” (the centre word of Pisces) would be parsed as ot-ol-al, i.e. six glyphs but only actually three tokens. So your description of what I proposed is a simplification that caused you to misunderstand what I wrote: you rejected your own version of what I proposed, not what I proposed.
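Nick’s ot-ol-al reading amounts to tokenizing a label against an inventory of verbose-cipher glyph pairs. A minimal greedy sketch follows; the token inventory here is an illustrative assumption for the example, not the actual inventory proposed in Curse.

```python
# Sketch: greedy left-to-right parse of EVA words into verbose-cipher
# tokens, in the spirit of the ot-ol-al reading of "otolal".
# ASSUMPTION: the token list below is illustrative, not taken from Curse.
TOKENS = ["ot", "ok", "ol", "al", "ar", "am", "or", "o", "y"]

def parse(word: str, tokens=TOKENS):
    """Return the token split, or None if the word doesn't parse."""
    out, i = [], 0
    while i < len(word):
        for t in sorted(tokens, key=len, reverse=True):  # prefer longer tokens
            if word.startswith(t, i):
                out.append(t)
                i += len(t)
                break
        else:
            return None  # no token matches at position i
    return out

print(parse("otolal"))   # ['ot', 'ol', 'al']
```

Under such a scheme, six visible glyphs carry only three tokens of content, which is one way the labels could look longer and more varied than the information they encode.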
Nick: With all due respect, that doesn’t address the specific point regarding identical labels. In that context, whether you argue there is a verbose cipher is neither here nor there; the key question is how we explain the numerous identical labels against a number of specific and quite different drawings. Now, from what you said on the Voynich Ninja thread, I was under the impression that your explanation was that we were dealing with abbreviations of different words that were consequently spelled the same way once in abbreviated form. I was fully aware of your general theory, but I don’t believe it resolves the identical label problem. This is an important issue as I, of course, believe it has profound and important implications for Voynichese as a whole. If someone has an explanation that better fits what I have observed then that’s great, but I haven’t seen it yet. I have tried to systematically address all alternative explanations, but had yet to address the abbreviations idea, so that is what I wanted to do with you.
Nick wrote: “…what I proposed in Curse in 2006 …. Hence “otolal” (the centre word of Pisces) would be parsed as ot-ol-al, i.e. six glyphs but only actually three tokens….”
I haven’t seen Curse (my apologies for not reading it, Nick, it simply wasn’t on my radar until years after I had been studying the VMS), but based on my observations of Voynichese, I’m in complete agreement with what Nick says here and I cannot comprehend why people can’t see it.
Mark: you are making so many implicit assumptions about what (you think) labels are here (as well as about how the language underlying those labels works) that it should be no surprise that what I’m proposing doesn’t fit what you think you’re seeing. Taketh the pill that chilleth, crack open a cold beer, and enjoy the nice weekend.
Nick: Sincere thanks for the best wishes for the weekend.
JKP: How do you explain the observations that I have described in the context of that theory? It is perfectly possible that that theory explains other parts of the text, but it just can’t explain the label word facts in and of itself. You would at the very least need to add to that theory to explain those facts. To me this has significant implications for labelese and very likely for Voynichese as a whole.
Mark, you are asking a question that is too complex to answer in a comment. I have been working on a paper describing this for quite some time. It’s already more than 40 pages long, and has been mostly finished for almost two years; I’m having trouble finding a block of “sustained concentration time” to finish it.
I find some things can be done if you doggedly put in the minutes one brick at a time—it adds up—and other things need a chunk of total immersion. As I’ve learned from my experiences in developing software… the last 10% is always the hardest 90% (I’m sure this applies to many other fields, as well). What I really need is a two-week vacation away from everything to get the hard bits finished but that’s not an easy thing to do when you are running a business.
Sorry, I’m not trying to be dismissive. Some things are just harder than others. If it were easy to explain, the VMS would have been described and solved a long time ago and there would be no mystery.
The problem is I’m not satisfied with simply describing the behavior of the labels or a portion of the text, which I feel I have a pretty good handle on. I am trying to explain how the text functions as a whole, and I think I have a pretty good handle on that too, but there are a couple of details, a couple of nagging questions that are particularly puzzling, that I want to get straight (or at least be able to describe cogently) before I post the paper (which I’ll probably upload to my blog as a PDF so anyone can download it, but we’ll see).
I’m also keenly aware that, from the point of view of the rest of the world, it doesn’t matter how much effort I’ve put into this (and it’s considerable): it’s all vaporware until it is available to the public. So forgive me if I back off on this subject. I simply can’t answer your question in a few words, but I will answer it when I get this project out the door.
JKP: Well, as you say, until you have presented your explanation I stand by my analysis. To me the explanation seems not to be complicated, but simple: namely the presence of null words in the manuscript. A long time ago, when I first heard about the repeated words in the Voynich, I speculated about the presence of null words in the manuscript. Unexpectedly, having looked at many labels in the Voynich and at the perfectly fitting circular text, I am again pushed back towards the notion that there are null words in the Voynich. If so, they appear to represent a not insignificant portion of the labels. However, I strongly push back at the notion that they are all null, unlike Rugg, for various reasons. This does indeed raise the important question of why specific nulls appear in specific places: how did the author decide which null word to use where? I think in part they conform to a small number of templates, so within a template less imagination is required.
Mark: until you can work out (even hypothetically) a statistical test for null words that doesn’t rely on a ton of additional implicit assumptions, you’re unlikely to gain much traction with other people. Merely appealing to “null words” or “templates” isn’t enough.
Nick: I agree it is not proven, but likewise nobody has provided statistical tests that support another explanation of this phenomenon. All I have done is look at a variety of explanations of these observations and conclude that the other explanations really don’t provide an adequate justification for these features. I have suggested that this could be the explanation. As far as gaining much traction with other people goes, I am not greatly concerned by that; rather, I have pointed out what I have observed and sought out the various explanations that others have suggested, to see if I think they can provide a better explanation.
The statistical test that occurs to me at this time relates to spelling clusters: to see if one can make a clear subdivision between the “monotonous” words and the “distinctive” words. One could analyse data relating to the spelling patterns of words that repeat across labels in different contexts, as opposed to those that don’t, to see if there is a notable distinction. This wouldn’t directly prove that the words are nulls (as always, only a complete decipherment would); however, establishing that there are indeed these two classes amongst the labels, “monotonous repeating word labels” and “distinctive word labels”, would I think bolster the case that the “monotonous” words are null words and the “distinctive” words are non-null words. However my current line of research with regard to labelese requires me to study and document everything in more detail before moving onward; I have looked at what I can find that others have written on the subject, but I need to make sure I have really gone significantly further.
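[A minimal sketch of what that spelling-cluster subdivision could look like, assuming you have a plain list of EVA label transliterations; the function names and the distance threshold are illustrative choices, not anyone’s established method:]

```python
from itertools import combinations

def edit_distance(a, b):
    # standard dynamic-programming Levenshtein distance between two words
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def split_labels(labels, max_dist=1):
    # "monotonous" = within max_dist edits of some other label word;
    # "distinctive" = isolated in spelling space
    words = sorted(set(labels))
    close = set()
    for a, b in combinations(words, 2):
        if edit_distance(a, b) <= max_dist:
            close.update((a, b))
    return ([w for w in words if w in close],
            [w for w in words if w not in close])
```

[So `split_labels(["okol", "otol", "okor", "otor", "cheody"])` would put the four ok-/ot- words in the “monotonous” cluster and leave “cheody” as “distinctive”. Whether distance-1 neighbourhoods are the right notion of “similar spelling” is exactly the kind of thing the documentation work would have to settle first.]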
I had the conviction that labels would provide the best window onto Voynichese, but I didn’t anticipate, at all, where a study of them would take me, though there is probably much more to be found within them. The only problem I see with a study of labels is the relative shortage of them when compared with the number of words in sentence text. Nevertheless there are quite a lot of labels.
Things like the TTR figures for labels have some use, though from my perspective very limited use.
Mark: for the monotonous occurrences of daiin.daiin in Currier A, I suggested in Curse that these might well be enciphering multi-digit Arabic numbers, and suggested a specific codicological test that could be carried out to test it. (This hasn’t been tried.) Though I haven’t yet mentioned it on the blog, my follow-up suggestion would be that “qo[k/t]edy.qo[k/t]edy” might somehow be performing the same function in Currier B pages.
Currier A pages:
– daiin.daiin 21
– daiin.dain 8
– dain.daiin 5
– dain.dain 2
Currier B pages:
– qokedy.qokedy 11
– daiin.daiin 6
– qotedy.qotedy 3
– dain.daiin 2
– dain.dain 2
– qokedy.qotedy 2
– daiin.dain 1
– qotedy.qokedy 1
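[Counts of this kind are easy to reproduce from any transliteration; a hedged sketch, where `tokens` stands for a word list you would extract yourself from an EVA transcription (the helper name is mine, not an existing tool):]

```python
from collections import Counter

def adjacent_pair_counts(tokens, targets):
    # Count ordered adjacent pairs (w1, w2) where both words belong
    # to the set of words under study, e.g. {"daiin", "dain"}
    counts = Counter()
    for w1, w2 in zip(tokens, tokens[1:]):
        if w1 in targets and w2 in targets:
            counts[w1, w2] += 1
    return counts
```

[For example, `adjacent_pair_counts("daiin.daiin.dain".split("."), {"daiin", "dain"})` tallies one daiin.daiin pair and one daiin.dain pair; run over the Currier A and B pages separately, this gives tables like the ones above.]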
As I recall, this is one of several reasons why it seems that Currier B developed out of Currier A (and not vice versa). I recall Glen Claston being very sure about the order in which sections were written, but I don’t remember the precise context. All the same, it’s probably findable on Cipher Mysteries if anyone is interested.
Nick: I am aware of your multi-digit Arabic numbers idea.
The key thing that one finds with the repeated labels spread across very different drawings is that they are of the form:
okol
otol
okor
otor
and many other similar spellings: words no more than 5 characters long, starting with an “o” followed by a “k” or a “t”
I need to make a systematic list of all these repeated label word variants and every instance amongst the labels where each is to be found. And, importantly, study the many words with distinctive spellings (not conforming to the “ok” and “ot” patterns) that are not repeated.
If one were to argue that these words were another numbering system in addition to the ones you argue for in Currier A & B, then what is the purpose of this labelese number system? One would assume that an isolated number could either have a significance in and of itself or instead somehow map to a word. I have already discussed the figure number hypothesis, i.e. “Figure 2”, and how such references only make sense if you can find what they refer to elsewhere.
So I think, as with all other null-free explanations, it seems to raise many more questions than it answers. With enough metaphorical gears I am sure we could cobble together some very complex hypothesis that would seemingly explain this problem away. I am sure that, with enough effort, I could come up with many very complex ideas that would seem to explain this phenomenon; however I think the null explanation remains the most plausible.
Of course, if these repeated labels are nulls, then one has to ask how much effort would have been required to generate these null words, whether from imagination or by a mechanism. At some stage I may attempt such an experiment, though I doubt a cardan grille will come into play.
I suppose we could view the word “null” here as meaning “containing no information” and likewise obviously “non-null” as meaning “containing information”. On the face of it monotonous words would seem to contain less information than distinctive words.
Nick: I think saying definitively that any word is a null is a very hard thing, certainly without a complete decipherment. The question is how we would expect a null word to behave in text, and whether the cluster of label words with very similar spellings fits that behaviour. I suppose we could ask, as a useful analogy, how we would expect a null character to behave in text? It seems the arbitrary placement of the null would be an important indication. I think demonstrating a contrast between the behaviour of the null and that of the other characters thought not to be nulls would be important.
So whilst we can’t say that a character or a word is null without the context of a full decipherment of the text we can say whether its behaviour better fits the behaviour of a null than the behaviour of any other clearly defined character or word. This brings me back to the idea that these label words are most likely to be nulls.
If there is at least one null in the Voynich then I think it fits the definition of a cipher, not a language in an unknown script, so this is an important implication. If one can identify at least one null word in the Voynich it raises the question of how widespread they may be.
Nick: I think there is a strong case for preferring, initially, careful inspection of the labels together with the actual drawings, rather than relying wholly on statistical tests. The application of statistics to the analysis of labelese can easily miss things like the fact that the drawings to which, in some cases, the very same label is attached are so different. It is also difficult to aggregate into one useful statistic phenomena like the very similar spellings of the repeated labels, the fact that these are more likely to appear as repeated words in sentence text, and factors like the proximity of labels with a one-letter difference in their spellings. I think for the time being the strongest case is for looking at all the labels carefully before designing tests, especially as any tests designed for labelese might easily be extended to Voynichese as a whole.
Mark: you must remember that, to date, the Voynich Manuscript’s labels have been the rich soil out of which most of the nuttiest theories have grown. Whether or not you think my suggestion about abbreviations and verbose cipher has any merit, the fact remains that Voynichese has layers of subtleties that can cause any faulty assumptions or presumptions to throw you many, many miles off course.
Mark, are you aware that you base your conclusion that there are null words on studies that you still need to do?
For the last N months you have been saying things like: “I need to make a systematic list of all these repeated label word variants and every instance amongst the labels where it is to be found. ” (This is from today’s mail).
Maybe go ahead and do that.
And then draw the conclusion.
Mark: all characters are important. You must not miss any character. All characters are numbers. There is a deceptive character in the text.
It’s one character !!
( not the whole word, but only one character ).
Otherwise look at the large parchment. ( Rosettes ).
Several words are written at the top left.
If you can read it, you’ll find out who the author of the manuscript is. 🙂
Nick: That’s interesting. Which nutty theories are you thinking of as having originated out of labels? (I would be interested in hearing of other theories, nutty or not, unless you mean the “this label corresponds to this word” crib kind.) As you know, I have been interested in verbose ciphers and have proposed my own; it is only as a result of looking at labels that I have come to question whether this level of complexity is actually necessary. I ask myself questions like: is it possible that we could have something as simple as a very basic substitution cipher with a large proportion of null words? If so, then the large proportion of nulls would completely skew all statistics carried out on the text as a whole, rendering frequency analysis, for example, completely useless. In fact, if we had 60% or more null text (from null words), then I imagine that much of the analysis of the text carried out so far may have limited value. Most statistical analysis would have to be applied either to all the null text or to all the non-null text; tests on the standard blend of both found in the complete text would provide useful information on neither one nor the other. I am not saying the underlying cipher is just a simple substitution cipher with a large proportion of null words, but as far as I can see that is not an impossible explanation. Arguing that there is a large proportion of null words of course says nothing about the behaviour of non-null words; the only way to study those would be to disaggregate the two types of words to a large extent. Yes, this is speculation, but it is the direction in which my thinking is pointing at the moment.
Mark: you shouldn’t have to check out many Voynich theories to find ones based on labels. Brumbaugh’s pepper springs readily to mind, and even Leonell Strong’s insistence that the first (normally gallows-initial) word of each Herbal page was an enciphered plant name probably falls into the same category. Bax’s juniper falls into this latter category, as does any of the (now many) theories that rely on EVA doary being Taurus.
And so on. 🙁
Rene: I have had a limited amount of time recently and doing the thorough study I want to do will take quite a lot of time. I have been exploring this idea. If I do the study that I intend to regardless of what I find then I think you could still say that I could not draw that conclusion. In fact even if I were able to decipher all the text that I consider non-null then you could still ask how I can be sure the other text is really null? Proving definitively that any filler text is meaningless in any cipher is impossible, it seems to me, yet it is still a worthy hypothesis and even a practical conclusion. There is little that one can say that one can draw a clear conclusion on at this time when it comes to the Voynich.
Nick: But that was the point I made. From my recollection of the use of labels in the theories that you cite, they are of the form
voynichese label word -> real discernable picture -> real word
Such as “taurus” or “pepper”
These it seems are completely different kinds of theories from what I have proposed. They are essentially crib theories.
Cheshire’s “proto-Romance” theory is based almost entirely on labels on the “rosettes” folio. He only gave a few cursory mentions of translations in other parts of the text and those were questionable because he would do things like take a block from part of the main text and match it with a drawing elsewhere.
Mark: yours is essentially a crib theory for nulls, along the lines of “that label can’t possibly be a real word in the kind of system I envisage Voynichese writing having, so it must therefore be a null”.
JKP: That is not correct. Cheshire formulated his theory before he even looked at the 9 rosette page; I know as he emailed me on the subject. His latest paper justifying his method makes no reference to the rosettes page. So your characterisation of the basis for his theory is way off base.
Nick: These crib theories are based on the theory that one can identify a drawing, so for example, this drawing looks like the pepper plant or taurus or whatever it may be. In so far as I make a visual identification, I say:
This plant doesn’t look like that star, that nymph, that pipe and that 9 rosette page feature all together etc.
Mark, I can’t recommend trying to predict what other people will say or think about something that hasn’t been produced yet. A similar discussion is going on at the Ninja forum.
The problem is that now we have nothing to go by. I can’t see the logic from:
1) Some label words near different drawings are the same
2) Therefore they are meaningless.
For me this is a non-sequitur.
If there are null words, it would have been entirely possible to make them all different.
If the labels are numbers, then plant number 75 and star number 75 refer to two different things.
So you see, all possible combinations of null / meaningful and “some duplications” / “no duplications” are sensible.
Now you made it clear that you prefer one particular explanation, but this is then an opinion which is based on a hunch, not on any logic.
Mark: the only reason you can draw the inferences you draw is that you also make a ton of assumptions about the way the language works.
Nick: What do you think those assumptions are?
Mark wrote: “Cheshire formulated his theory before he even looked at the 9 rosette page; I know as he emailed me on the subject. His latest paper justifying his method makes no reference to the rosettes page. So your characterisation of the basis for his theory is way off base.”
Cheshire’s theory that it’s an extinct “proto-Romance” language may have been the result of a “thought experiment” as he claims, but I am doubtful that he formulated his transliteration system (which is the predominant part of his theory) without looking at the rosettes folio. It was the focal point of his pre-print and peer-reviewed papers. He hasn’t done any substantive translations of the main text.
“Within the manuscript there is a foldout pictorial map that provides the necessary information to date and locate the origin of the manuscripts. It tells the adventurous, and rather inspiring, story of a rescue mission, by ship, to save the victims of a volcanic eruption in the Tyrrhenian Sea that began…”, then he goes on to give his translation of the labels.
Whether or not he mentions labels in the June paper is irrelevant. The follow-up paper is thin—mainly a series of charts illustrating his substitution system. In the commentary he still doesn’t seem to realize there is a gap of centuries between the development of Romance languages and the medieval period or that “proto-Romance” isn’t “a thing”. He calls it the logical link between spoken Latin and the Romance languages without acknowledging that this “thing” was occurring 1,000 years before the VMS was created and his contention that it lasted into the medieval period has not been accepted by linguists and historians.
He claims that the text of the VMS is written in a “language and writing system that were in normal and everyday use for their time”. If the writing system was in normal and everyday use in the 15th century, why is there no trace of it and why is it ungrammatical?
He claims that the writing system (of the VMS) “can be apprehended” [sic] once the grammatical rules are understood, but his “solution” offers no grammar to speak of, just a method for subjectively breaking up the tokens to try to make them into words.
Mark: if you’re so confident that you can identify non-words, you must surely have a nuanced and well-developed idea of what words are. Or else your whole idea would simply be hot air.
Statistically, it is entirely possible that non-words might be identifiable if inserted in a normal text by looking at their immediate context (both forward and backward). Indeed, one statistical test for null words might be to see if the preceding word is a better predictor of the following word than the word in the middle.
However, this would also flag up common words such as ‘very’ which can be inserted before most adjectives. So this may not be (very) useful. 🙁
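[A minimal sketch of the predictor test suggested above, under the assumption that Voynichese words behave statistically like tokens in a stream; `predictor_scores` is a hypothetical helper name, and the probability estimates are crude relative-frequency counts, not a proper smoothed model:]

```python
from collections import Counter

def predictor_scores(tokens, candidate):
    # For each interior occurrence of `candidate`, compare how well the
    # PRECEDING word predicts the FOLLOWING word (skip-bigram) with how
    # well the candidate itself predicts it (plain bigram). If skipping
    # the candidate usually predicts better, it behaves like an insertion.
    bigram = Counter(zip(tokens, tokens[1:]))   # (w_i, w_{i+1}) counts
    skip = Counter(zip(tokens, tokens[2:]))     # (w_i, w_{i+2}) counts
    unigram = Counter(tokens)
    scores = []
    for i in range(1, len(tokens) - 1):
        if tokens[i] != candidate:
            continue
        prev, nxt = tokens[i - 1], tokens[i + 1]
        p_mid = bigram[(candidate, nxt)] / unigram[candidate]
        p_skip = skip[(prev, nxt)] / unigram[prev]
        scores.append((p_skip, p_mid))
    return scores
```

[Comparing the two numbers in each pair over many occurrences would be the test; as noted, a word like ‘very’ would also score as highly skippable, so a positive result would indicate “insertable word”, not necessarily “null”.]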
JKP: You say->
“Cheshire’s theory that it’s an extinct ‘proto-Romance’ language may have been the result of a ‘thought experiment’ as he claims, but I am doubtful that he formulated his transliteration system (which is the predominant part of his theory) without looking at the rosettes folio. It was the focal point of his pre-print and peer-reviewed papers. He hasn’t done any substantive translations of the main text.”
You may doubt it, but it is a fact that he didn’t. I can provide proof if you need it. He only started looking at the rosettes folio, which was unfamiliar to him, subsequent to a discussion with me and a test on that very folio which I gave him. His 2nd paper was on the rosettes folio, but his first paper didn’t mention it. I am obviously not going to defend his theory, but his last paper focused on the translation of a specific page of the herbal section with no reference to the rosettes folio.
I have not made and am not making a case for his theory, but rather concretely disputing that his theory was developed on the basis of a study of labels.
Nick: Firstly, I would consider the thought experiment that, as a result of some calamity, say a fire, all that had survived of the Voynich when it was discovered by Wilfrid was the pages with single-“word” labels on them, and that any text on those pages other than the single-“word” labels was stained or burnt so as to be completely unreadable. By that I mean, of course, that we would only have had labels to work with to decipher Voynichese. So then we could ask what we can deduce from this greatly reduced Voynich manuscript. You may say, correctly, that by doing so we are restricting ourselves from making use of lots of other useful information, but as I say, this is a thought experiment.
So in the context of labelese, a word can be defined as a series of characters not punctuated by a space. Given that these labels are isolated and only in close proximity to one specific drawing, it seems not unreasonable to suspect that these “words” are related to that specific image, if they have any meaning at all. It has been suggested that, despite these “words” being separated by a significant amount of space, they could potentially have been strung together to form sentences. However, having looked at the drawings in a large number of cases, especially the rosettes folio, how one could do that seems virtually impossible.
So these “words” either have a meaning, or they do not, i.e. they are meaningless and therefore null words. We express meaning with words from languages. So even if somehow a single “word” actually had hidden spaces and therefore mapped to multiple real words, the issue of strings of characters being repeated in different parts of the label sections of the manuscript remains. So if you don’t like the term “null word” one could use the term “meaningless character string”, and instead of the term “non-null word” one could use the term “meaningful character string”. But this is largely semantics, as the situation and conclusions largely remain the same.
If you view the label word as really a reference to text elsewhere in the manuscript, such as “figure 2” (without a reference to elsewhere in the manuscript it would be superfluous and therefore in practice meaningless), then one has to develop a theory as to where it maps to, i.e. sentence text on the same page, or a large body of text serving as a glossary at the end of the manuscript.
I have made some observations, looked at all the alternative theories which could explain them, either as suggested by others or thought up by myself, and looked at how each stacks up.
Mark: when you say things like “how one could do that seems virtually impossible”, you are clearly making use of a whole raft of assumptions about how you think the Voynichese language works, and then combining those with a further raft of assumptions about how you think the labels must work.
Thought experiments like that really aren’t going to help you get past the many, many assumptions you are making.
Mark: as far as Cheshire goes, I am rather more inclined to believe JKP’s reading of the papers Cheshire put out rather than what he said to you long after the event when you asked him. His description to you seems quite at odds with his earlier writings.
Nick: Well if you can come up with a plausible scheme to string all the disparate words in the rosettes folio into one contiguous block of text and then explain how that pertains to the images on the page then I would be absolutely flabbergasted.
Mark: now you’re starting to bring your assumptions to the surface, e.g. (a) it has to be a continuous block of text, and (b) it has to pertain to the images on the page. Though I suspect you still have 5 or 6 others you’re treating as constraints (and which also aren’t necessarily true or helpful).
Mark: “You may doubt it, but it is a fact that he didn’t. I can provide proof if you need it. “
Mark, you don’t have to give me proof, but a date would be helpful. It’s not that I doubt you, I doubt him. He changes his story every few weeks. So if you have a date as to when you had this conversation or correspondence with him, I would appreciate it.
Nick: Fundamentally, the basic problem remains of how on earth you interconnect the different isolated rosette words, and, if they are not contiguous and do not relate to the images, how any meaningful sense can be made of them.
Yes, I make assumptions. For example, I assume that the text is not merely the product of numerous ant trails. Anyone is welcome to question an assumption they believe I have made and think unjustified. The key remains which theory best fits the observations, on this ground other explanations that I have heard so far are sadly lacking. I am not claiming and have not claimed that I have proved this assertion, but rather that this hypothesis fits much better the observed facts still than any proposed so far. So one can certainly present the hypothesis that the words on the rosettes folio are strung together by some as yet unexplained scheme, but that theory for many reasons I think is vastly less plausible than the explanation based on the presence of null words. If I thought that theory was the best explanation then I would adopt it.
Nick & JKP: I have checked my email correspondence with Gerard. His first paper dates from November 2017, and probably earlier. He became aware of the rosettes folio in December of 2017; that is very apparent from our correspondence. Now, it is possible that this correspondence with me was all an elaborate ruse (to what purpose it is hard to imagine) to disguise the fact that he had already seen the rosettes folio, but this seems very unlikely. Of course, here again I am making all sorts of assumptions. I am assuming that Gerard is a real person, though I have never met him. I am assuming that he is not a foreign agent intent on disrupting the western media by hawking his theory. I am assuming that he did not see the rosettes folio as a child, with it influencing his theory subliminally even though he cannot consciously remember seeing it.
Nevertheless despite these alternatives I am certain that the rosettes folio had no influence on his first paper in which he expounded his translation theory.
Nick: The observations for daiin/dain and qokedy/qotedy are typical for the VMS. It is possible to describe similar observations for other frequently used word pairs like chol/chor, chol/shol, qokeedy/qokedy, qokeey/qokey, and chedy/shedy:
Currier A pages:
– chol.chol 17
– chor.chol 9
– chol.chor 7
– chol.shol 6
– chor.chor 6
– shol.chol 4
– shol.shol 4
There is even a sequence shol.chol.chol.shol on page f42r.
Currier B pages:
– qokeedy.qokeedy 17
– qokeedy.qokedy 11
– qokedy.qokeedy 12
– qokeey.qokeey 12
– chedy.chedy 11
– shedy.chedy 7
– shedy.shedy 8
– qokey.qokeey 4
– chedy.shedy 3
– qokeey.qokey 3
– qokey.qokey 1
There is even a sequence qokeedy.qokeedy.qokedy.qokedy.qokeedy on page f75r.
The general rule for the VMS is: the more similar two words are, the more likely it is that they can also be found written in close vicinity (see Timm & Schinner 2019). When we look at the three most frequent words on each page, for more than half of the pages two of the three will differ in only one detail.
Rene: Yes, I have seen the Voynich Ninja discussion. Whilst, TTR definitely has some value I think its value is quite limited in the context of labelese.
In some ways you are in a difficult position as I have not documented all my observations yet, but rather made a mental note and got an overall sense.
When you say “Some label words near different drawings are the same”. It is the extent of it that is important and also that those label words all have very similar spellings whereas other non-repeating label words have quite different spellings.
Most importantly, any explanation of labelese and of Voynichese must account for this phenomenon. I have asked myself what the most plausible explanation I can think of is, and I have also investigated alternative explanations suggested by others, and they all present significant problems. The only explanation I can find that holds up and is consistent with the evidence is that these constitute a large cluster of similarly spelled null words. Now, it is possible that there is a better explanation, but I haven’t seen it.
I think making all null words different presents a problem, as one would have to have a long list of distinct null words in the cipher key. Making null words conform to a formula, or a small set of formulae, makes them easy to identify and easy to implement. So it would have been possible to make them all different, as you say, but very inconvenient unless one only had a short list of null words, which would present other problems.
You say: “If the labels are numbers, then plant number 75 and star number 75 refer to two different things.”
I agree. However one then has to ask what those numbers might refer to. Now, as you suggest, they could be something like “figure 75”, but then one has to present an argument as to where in the text this refers to. If one argues that these similar words are part of a number scheme, then one has to ask why some pictures are numbered while other neighbouring pictures are not. And if they are not references but quantities, the question becomes how those quantities are to be interpreted: if one star or one plant has 75 next to it and others don’t have numbers next to them, then what does that mean? It seems to me that once we go down the “numbers” interpretation rabbit hole we are presented with more problems to resolve than we started with.
I have said that I believe we have a combination of meaningful and null words. I think it is perfectly possible to have some duplication, but I doubt that we would have it to the extent that we see with labels. I, personally, doubt that there are no duplications.
I think one hypothesis appears significantly more plausible than the alternative hypothesis. Whether one calls it a “theory”, “hypothesis”, “idea” or a “hunch” feels like semantics.
I have based it on the logic that we have observable fact from looking at labelese and then there must be an explanation for those observable facts. Then I have considered all the hypotheses that I or anyone else, as far as I know, has suggested to explain this and evaluated the strength of each hypothesis. There is one hypothesis that I think is by far the stronger. This is certainly not a proof, but I think it is a reasonable basis from which to pursue a research programme. If I become aware of a better hypothesis then I will change my line of thinking accordingly.
Mark: sorry to labour a really basic point, but your starting position is “that we have observable fact from looking at labelese and then there must be an explanation for those observable facts”, which you then combine with a whole load of assumptions about what labels are, what Voynichese is, how Voynichese works, how Voynichese is connected to the labels, etc etc.
It is only through combining your assumptions and presumptions with your observations that you are able to make some kind of logical inference about what must or must not be true. Until such time as you gain sufficient clarity of mind to tell all the components of your logical argument apart, you’re unlikely to get a very warm reception from other Voynich researchers, because this is exactly what they have to do.
Torsten: That is consistent with my observations of labels, but interesting nonetheless.
If we do have null words, then I doubt that they were generated using some kind of cardan grille or other random mechanism (rolling dice, tossing coins, spinning roulette wheels etc.). This occurred to me from looking at neighbouring labels with similar spellings, in the way that Torsten describes elsewhere. It fits more closely with an ad hoc assigning of null words: there was probably an understandable, natural laziness on the part of the author, which led him on occasion to be less imaginative when coming up with the next null word, and so to opt for a word with a very similar spelling to the last one. Nevertheless, it does feel like a rather arduous process overall for the author, having to decide on the next null word to use within the framework of his null word formulae without a mechanism to aid him; yet that seems the most likely explanation to me. But of course this is speculation.
Torsten: thanks for that! 🙂
Personally, I’m always bemused by the degree to which Voynichese resembles a kind of random stream, yet one with in-page regularities (which you attribute to autocopying, and which Glen Claston attributed to some kind of ‘shift’ in the word and letter patterns that happens every few paragraphs, but which he was never able to satisfactorily quantify).
And yet I think Voynichese shows signs that its word to word adjacency isn’t random. For example, in the counts you quote:
* in Currier A: n(chol) = 531 and n(chor) = 400, so if n(chol.chol) = 17, we might predict n(chol.chor) to be about 17 × 400 / 531 ≈ 12.8, but in fact n(chol.chor) = 6.
* in Currier B: n(qokeedy) = 303 and n(qokedy) = 268, so if n(qokeedy.qokeedy) = 17, we might predict n(qokeedy.qokedy) to be about 17 × 268 / 303 ≈ 15, but n(qokeedy.qokedy) = 11.
Are these kinds of differences significant, or just statistical noise?
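[The independence prediction above, plus a very rough significance check, can be sketched in a few lines. This is only a back-of-envelope Poisson tail, assuming pair counts behave like independent events, which Voynichese text almost certainly violates; a proper test would model the whole contact table:]

```python
import math

def expected_pair(n_aa, n_a, n_b):
    # Under word-to-word independence, n(a.b) should scale with n(a.a)
    # by the ratio of the unigram counts: n(a.b) ~ n(a.a) * n_b / n_a
    return n_aa * n_b / n_a

def poisson_tail(k_observed, mean):
    # P(X <= k) for a Poisson(mean) count: how surprising is it that an
    # observed pair count falls this far below its expectation?
    return sum(math.exp(-mean) * mean ** i / math.factorial(i)
               for i in range(k_observed + 1))
```

[On the Currier A numbers, `expected_pair(17, 531, 400)` gives roughly 12.8 against an observed 6, and the corresponding Poisson tail comes out around 0.03, so under these simplifying assumptions the deficit looks more like signal than noise.]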
Nick: There is nothing random in the VMS. Everybody describes repeated and similar words but assumes that the observed curious text patterns are just some exceptions (see for instance http://ciphermysteries.com/2009/10/14/voynich-f42r-chol-shol-mystery). But they are not exceptions. The VMS is just a manuscript full of repeated and similar words. It’s just that simple.
Torsten: What you have said sounds eminently sensible. The question is do all words fall into the group of repeated and similar words?
The idea of the author employing autocopying fits neatly with the notion of filler text.
Torsten: to a large degree, all manuscripts are full of repeated and similar words, it’s just that the Voynich Manuscript’s words repeat and are similar in ways we don’t really expect. 🙂
As far as the autocopying hypothesis goes, my opinion is that while it is successful at explaining away a lot of the repeats and similarities we see in the surface behaviour, it doesn’t (as far as I know) even begin to explain why all these pages of Voynichese exhibit such a strongly polarized contact table in the first place, e.g.
ai 6475
ar 3151
al 3002
am 774
an 118
ak 36
ae 12
ao 11
at 11
ap 6
ach 6
af 3
ash 3
Here, everything from ‘ak’ onwards looks to me like scribal miscopying noise (even the ‘an’ set looks a lot like a miscopied ‘ain’ group to me), as if the only ‘aX’ pairs allowed are ai, ar, al, and am. But (unless I’ve misunderstood) anything like this aX adjacency property is something that the autocopying hypothesis shunts out to a lemma, explaining such behaviour away as a property of the original (but unspecified) pages where autocopies of Currier A and Currier B were ultimately seeded from. The same would seem to be true for EVA qo, etc.
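For anyone who wants to reproduce this kind of contact table, here is a minimal sketch. The sample string is a toy, not real VMS text, and a serious tally would first tokenise EVA’s multi-character glyphs (such as ‘ch’ and ‘sh’) rather than stepping through single characters as this does:

```python
from collections import Counter

def contact_table(text, first="a"):
    """Count which single character follows `first` in a transliteration string."""
    counts = Counter()
    for i, c in enumerate(text[:-1]):
        if c == first:
            counts[first + text[i + 1]] += 1
    return counts

# Toy sample only -- not real Voynichese:
sample = "daiin.chol.otar.otal.okam.aiin.ar.al"
print(contact_table(sample).most_common())
```

Running the same tally over a full transliteration file is what produces polarized tables like the one above.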
My preferred explanation for these kinds of structures is that some kind of rigid verbose cipher is at play, so that only ai[i][i]n, ai[i]r, ar, al, and am are valid letter groupings. But I’m reasonably sure that that kind of contact rigidity would be incompatible with autocopying, so perhaps we’ll have to agree to differ there. 🙂
As far as the chol-shol-mystery post from 2009 goes, I must admit to having forgotten that completely. But looking at the page again, I now wonder whether the unexpected abundance of chol (relative to random ch / random ol counts) might simply be a linguistic signal that it is enciphering a very common word (such as ‘and’ or ‘the’, things code-breakers typically look for, but which still remain resolutely invisible to our gaze in the Voynich). For example, for Latin: if ‘ch’ is E and ‘ol’ is T, then ‘chol’ might well be enciphering ET, which might skew the overall ch/ol stats in very close to the right way. It’s just a thought. 🙂
Nick: Even if we didn’t expect words to be repeated in such an unusual way, the words are repeated this way. For instance, page f15v not only starts with “poror”, it has “oror or” and “or or oro r” immediately above each other on the first two lines (see http://ciphermysteries.com/2013/07/29/sandra-and-woo-do-the-voynich-the-book-of-woo).
Words recursively copied from each other can only result in a polarized contact table, since a more frequently used word is copied more often (see Timm & Schinner 2019).
Even if ’an’ looks like a miscopied ‘ain’ to you, both words are similar to each other (see for instance sequences like ‘ain.okan.sheainy.qokan.chan.aman‘ on page f111v or ‘aiin.olan.otan.otain.otain‘ on page f103v).
The rigid letter groupings are the result of some simple design rules (see http://voynich.tamagothi.de/2008/07/05/die-harmonie-der-glyphenfolgen/). Coarsely speaking, the shape of a glyph must be compatible with the shape of the previous one (see Timm & Schinner 2019, p. 10).
There is a gradual evolution of a single system from Currier A to Currier B (see Timm & Schinner 2019, p. 6).
Reading Cheshire’s papers again is a strange experience.
In the early-2017 papers, he devotes a paragraph to explaining how the VMS author confused two different species of Helianthus (sunflower, a plant unknown in Europe in the early 15th century). His transliteration tables give no consideration whatsoever to letter positions in Romance languages and many of his historical statements are highly questionable. Plus, he originally assumed the VMS had a “16th century origin” even though there wasn’t much evidence to support that idea in 2017.
In November 2017, he contacted numerous Voynich researchers (at least 5). I’m guessing this communication with other researchers might have been the impetus to revise his date to the 15th century.
So Mark, regarding our conversation…
As far as I can tell, his first published description of the rosettes folio was in January 2018. From that point on, it becomes the main point of subsequent papers that describe the Ischia volcano scenario. He calls it the “Tabula regio novem”. In his words:
“Tabula regio novem has proven to be the perfect tool in demonstrating that the manuscript uses the proto-Romance language and proto-Italic alphabet as proffered. It includes a wealth of visual clues and so many definitive marker words that the positive correlations are beyond reasonable doubt in scientific terms.”
He then goes on to transliterate and translate the labels around several of the rosettes in a variety of languages.
So, Mark, you are right. I could not find any mention of the rosettes labels until January 2018.
Now that I know that, I can’t help wondering if it was his conversation with you that focused his attention on the rosettes folio and if that was what caused the significant shift in his timeline and volcanic eruption theory.
Torsten: personally, I think the or’s on f15v give stronger evidence of verbose cipher than of autocopying. And your suggestion that “recursively copied” words “can only result in a polarized contact table” would make sense only if there were no errors (or ‘deviations from some invisible norm’), so I think this is a weak argument: you can’t summon the ghosts of both an invisible norm and evolution of an invisible norm.
The design rules you talk about are more interesting: but those are surely independent of the whole autocopying thing and not really evidence for autocopying, right?
Nick: We are bound to the facts even if we don’t like the outcome.
We can only describe the VMS as it is. On page f15v there is not only ’poror.or’ and ’chor.or.oro’, there is also ’otchor.chor.chor.ytchor’, ’ytchor.chor.ol’, ’otcholocthol.chol.chol’ and ’cthor.chol.chor’.
Your idea about deviations from some invisible norm doesn’t make any sense. It can’t be an error to modify a word if word modifications are the basic principle.
Torsten: I think f15v supports both of our interpretations, though obviously opinions will differ as to which of the two it supports more. 😉
The “invisible norm” I mentioned (i.e. the silent palaeographic force that rigidly shapes letter contacts across the manuscript’s 200+ pages) is not an intrinsic feature of the autocopying part of your reading, but is a second half that somehow resists the mutating and evolving qualities implicit (I think) in that autocopying mechanism. If autocopying is based around local mutations, the question is: why do some parts mutate but other parts stay so rigid?
Your “simple design rules” try to provide an explanation for this inherent rigidity, but even though they are visually appealing, they aren’t always completely successful in practice. For example, the group “an/ar/al/am” initially seems to work well, but a closer look reveals that the instance statistics aren’t at all symmetric:
an 118 / ain 1675 / aiin 3742
ar 3151 / air 564 / aiir 112
al 3002 / ail 29 / aiil 12
am 774 / aim 50 / aiim 15
Why is ii so encouraged in aiin but so discouraged in aiir / aiil / aiim? What holds that usage pattern in place?
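One way to make the asymmetry concrete is to compare, for each final glyph, how often the doubled-i form occurs per single-i form (the counts are the ones listed above; the ratio framing is just mine):

```python
# (aX, aiX, aiiX) count triples for each final glyph X, as quoted above.
counts = {
    "n": (118, 1675, 3742),
    "r": (3151, 564, 112),
    "l": (3002, 29, 12),
    "m": (774, 50, 15),
}
for final, (a_x, ai_x, aii_x) in counts.items():
    # aiiX instances per aiX instance: >1 means doubling the i is favoured.
    print(final, round(aii_x / ai_x, 2))
    # 'n' comes out above 2; 'r', 'l', and 'm' all come out well below 0.5.
```

So the doubled-i form is strongly preferred only before ‘n’, which is precisely the rigidity in question.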
As I understand it, what you are claiming is that your “simple design rules” explain letter adjacency rigidity purely in terms of shape adjacency harmony, while your auto-copying hypothesis explains word similarities in terms of local / in-page word mutations. What I’m pointing out here is that I don’t think your proposed combination of simple design rules and autocopying is able to explain the usage statistics that fall between the two.
Mark, I’ll start by saying that cryptography and linguistics are not my field. My remarks come just from considering how drawings are variously annotated – a term I prefer to ‘labeled’, because there’s no necessity that a scrap of writing names any particular thing in a drawing.
Take the example of theories about the planets’ influence.
When you say, “This plant doesn’t look like that star, that nymph, that pipe and that 9 rosette page feature all together etc.” it wouldn’t be out of keeping with medieval thought to annotate all of them with (say) “Saturn”, or “Venus”.
After all, you can see that one shows a star, another includes (among other things) an anthropoform figure, and a third a plant. The whole manuscript – its dimensions, the size of its script, the incorporation of both text and plant-pictures on one page, and not least the plant-pictures themselves – I consider beautifully elegant in that sense of condensing information with minimum waste.
So why waste space duplicating the pictorial information in written information? I expect the ‘labels’ won’t be redundant, but will add information (but see my first sentence above).
Nick: No symmetric outcome is expected if recursion is in place. On the contrary, the more often a word occurs, the more often it gets copied. The more often a word is copied, the more often similar words are generated. And the more often similar words are generated, the more often they in turn get copied, which can also result in further instances of the original word.
This is exactly what we see in the VMS. When we look at the three most frequent words on each page, for more than half of the pages two of the three will differ in only one detail. There are pages where ’daiin’ is paired with ’dain’, but also pages where it is frequently used together with ’aiin’, or ’saiin’ (see Timm & Schinner 2019, p. 3). Your idea to generate also ’aiir’, ’aiil’, and ’aiim’ for each time ’aiin’ is generated is not only unpractical it is also impossible to write every word variant to a page with limited space.
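To make this ‘rich get richer’ dynamic concrete, here is a deliberately minimal caricature of self-citation. To be clear, this is my own toy sketch, not Timm & Schinner’s actual algorithm: the copy probability, the glyph set, and the single-glyph mutation rule are all invented for illustration:

```python
import random

def self_cite(seed_words, n_words, p_copy=0.7, rng=None):
    """Each new word is either an exact copy of an earlier word or a
    one-glyph mutation of one; frequent words get copied more often
    simply because they are drawn more often."""
    rng = rng or random.Random(0)
    text = list(seed_words)
    glyphs = "oaldkrychteins"  # arbitrary toy alphabet
    for _ in range(n_words):
        src = rng.choice(text)           # implicitly frequency-weighted draw
        if rng.random() < p_copy:
            text.append(src)             # exact repeat
        else:
            i = rng.randrange(len(src))  # mutate one position
            text.append(src[:i] + rng.choice(glyphs) + src[i + 1:])
    return text

words = self_cite(["daiin", "chol", "chor"], 200)
```

Run on a few seed words, the output quickly becomes dominated by repeats and near-repeats, which is the qualitative behaviour under discussion; it says nothing, of course, about whether the VMS was actually produced this way.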
I heartily agree with this quote of Torsten’s
“We are bound to the facts even if we don’t like the outcome.”
JKP: You say:
“I can’t help wondering if it was his conversation with you that focused his attention on the rosettes folio and if that was what caused the significant shift in his timeline and volcanic eruption theory.”
I think that very likely. I could say more, but since neither of us agrees with Gerard’s theory, it would seem a bit of a waste of time for us to discuss the genesis of a flawed theory.
What I wonder is what a human being might naturally be inclined to produce as filler text in this context. This could be studied as an experiment. Would one subconsciously develop biases or preferences towards certain filler words (“null words”)? To assume that filler words would automatically all appear at the same frequency (because, after all, they are equally meaningless) could well be a misunderstanding of human psychology. One might even develop biases towards using certain sequences of filler words. To assume that a human being would naturally select filler words at random, as a computer might, is questionable. In general, we need to be careful about interpreting word frequencies or other statistics as being inconsistent with a human being’s output of null words, since a human trying to produce meaningless filler text may do it quite differently from a standard computer program.
Torsten: are you arguing that the prototypical ‘seed’ page contained only al/ar/am/aiin, BUT all the other variants I listed are nothing more than autocopied local mutations, whose ‘family’ form is palaeographically constrained by shape adjacency harmony ‘rules’? In this scenario, the seed page’s al/am/ar/aiin usage is the starting point, while the autocopying would create the ‘tails’ down from the instance count ‘peaks’, like a kind of textual convolution ‘halo’.
OR: are you arguing that the high instance counts of al/am/ar/aiin arose purely as a result of repeated self-copying? I fail to even begin to see the statistical ‘strange attractor’ that you believe causes this extreme statistical asymmetry. In this scenario, the al/am/ar/aiin asymmetry is an effect of the auto-copying.
My point here is that you seem to have argued this causal arrow in both directions at different times: is the al/am/ar/aiin asymmetry a cause or an effect?
Finally, you say: “Your idea to generate also ’aiir’, ’aiil’, and ’aiim’ for each time ’aiin’ is generated is not only unpractical it is also impossible to write every word variant to a page with limited space“. I think this is a hollow argument. “Limited space” makes no sense at all in a 200+ page document: if there is evolution in the patterns (which there most certainly is), what was stopping aiil/aiim/aiir/an from not only evolving (they certainly appear) but taking hold somewhere? The shape harmony rules you rely on so centrally mean that aiir/aiil/aiim are all exactly as visually probable as aiin, and yet (as I understand it) there are no places where the (claimed) autocopying caused these to cluster.
To my eyes, autocopying seems like a mechanism which you describe as working in the ways you would like it to work, and yet mysteriously not in the ways that would be inconvenient for you. How could something as dumb as autocopying be so cleverly selective?
Nick: Please reread the paper. There is nothing said about a seed page, and you have obviously overlooked the part where we describe local repetition (see Timm & Schinner 2019).
A word generated by self-citation is at the same time the result of the text generation method and a possible source for generating new words. Therefore the text is in the first place repetitive (see https://github.com/TorstenTimm/SelfCitationTextgenerator#page-graphs). That you interpret this repetitive behaviour as cleverly selective is just an overinterpretation on your part.
There was simply no plan to write ’chor’ six times on page f15v (http://www.voynichese.com/#/exa:chor:forest-green/exa:chol:chartreuse/exa:cthor:teal/exa:shol:olive/f15v/0). It just happened that each generated word similar to ’chor’ increased the chance for an additional instance of ’chor’. It’s as easy as that.
It is indeed not easy to understand how powerful recursion is (https://en.wikipedia.org/wiki/Recursion). The best way to check an algorithm involving recursion is a simulation. Therefore we wrote a program to simulate the self-citation method and to analyze the statistical properties of a text generated this way (see Timm & Schinner 2019).
If you think that it is necessary to count every word you write and to use a draft for every page you write, I would suggest that you test your method. Until now I have assumed that it is far easier to copy previously written words from the same page, especially since this way it is possible to reproduce all the intriguing statistical key properties of the original VMS.
Torsten: so, should I disregard all you previously wrote about seed pages (it was your description, not mine) and take Timm & Schinner 2019 as a complete account of your ideas in itself?
What I called overly selective was the way that you seem to rely on recursion when it suits your argument and on stroke adjacency harmony or evolution when it does not. That is, your claim now seems to be that all of Voynichese’s stats emerge naturally from recursion: and yet even though many of its stats evolve considerably over its 200+ pages, the lower-level letter adjacency ‘rules’ seem to stay constant. So it seems that you want to cherry-pick those specific parts of recursion, palaeographic stroke harmony, and evolution that suit your argument. Hence “overly selective”.
Whether or not there was a ‘plan to write chor six times on page f15v’ is your inference, it is not an observation. Please stick to facts, not conclusions dressed up as facts.
I understand the power of recursion more than well enough (I have a maths degree & am a professional software engineer). But the point about recursion is that it starts from a seed, however small. And it is hard not to notice that Currier A and Currier B, despite their many evolutionary differences, do seem to have been cut from the same underlying cloth – the same structural seed. How was it that self-copying was able to change so many things and yet the underlying cloth remained unchanged?
As to your last paragraph (“If you think that it is necessary to count every word you write and to use a draft for every page you write…”), I have no idea what you are talking about. Whether it is easier to use Approach A to encipher a message or Approach B to create a simulacrum of a message seems a futile comparison, one that helps nobody: “ease” is not a useful metric here.
Nick: The seeding part is your idea. Back in 2017 you wrote “And that these near-universal adjacency rules that are in the ‘seeding’ part of the text” and my answer was “No, adjacency rules are not preserved from the seeding part” (see https://www.voynich.ninja/archive/index.php/thread-1385-11.html).
I already answered your point about Currier A and B: “There is a gradual evolution of a single system from Currier A to Currier B (see Timm & Schinner 2019, p. 6).” Have you read the reference? Currier A and Currier B are not independent from each other. Currier B is a developed form of Currier A!
Your further comments only misunderstand what I say. There is simply no contradiction between word frequencies and letter adjacency rules, since these are two different things. Stroke adjacency harmony limits the available word pool, but it doesn’t explain word frequencies. f15v was just an example of repetition, since we were talking about repetition and you asked whether “the high instance counts arose purely as a result of repeated self-copying”. It was also your point that “the statistics aren’t at all symmetric”, and the only way to get symmetric statistics is to count the words in some way. So it is on you to demonstrate that it is possible to generate a text with symmetric word frequencies with some simple tools.
Torsten: thank you for the clarifications. I had forgotten that exchange on voynich.ninja.
Re-reading your 2019 article and looking at your source code, you term certain high-frequency pairs of glyphs (such as ‘ol’ and ‘dy’) as ‘ligatures’, and yet treat them in the same way as ‘o’ and ‘l’ (i.e. individual glyphs).
* Why was it necessary for you to treat these ‘ligatures’ in a special way in your paper?
* Why would the original person doing the auto-copying have treated them in a special way? (They are not consistently physically joined on the page, so are not actual ‘ligatures’.)
* Why are there so many manually added substitution cases in your code on github?
* Do you not suspect that the presence of so many enforced manual adjustments might be a result of overfitting on your part?
* When your analysis code is so intensely manually manipulated, how is it that you can describe your explanation as simple?
Nick: It’s great that you ask questions about implementation details of the text generator. But I don’t think that a thread on your blog is the right place to discuss these questions. Nobody expects such a discussion in a thread about Voynich labelese, and the comment section was not made for giving detailed answers. Since it is also a new topic, I would suggest discussing it at voynich.ninja. There is already a thread about the paper (https://voynich.ninja/thread-2790.html).
Here’s an example:
Northern Italy, author known, origin known. One of ten such books; this is the one whose whereabouts are known and which is accessible online. The others are either not yet digitized or not online.
Even without being able to read it, it has a similar look, and a similar way of running words together. Incidentally, the word “vallde” occurs, though here written with two Ls (f116).
It is neither real Latin nor Italian, and a trace of German is also recognizable.
This is how I imagine the VM being translated: just read and remember the words.
https://www.facebook.com/photo.php?fbid=2331902917032349&set=gm.2225135070929615&type=3&theater&ifg=1
Torsten, Nick,
One reason Nick’s comments are so widely read is that one doesn’t have to join something to follow a discussion, and one doesn’t need permission to look back a few years and see what was said, and by whom about a topic.
At present, comments to Nick’s blog about things said in 2010-11 are still fascinating, but contemporary comments made to (e.g.) the second mailing list are effectively buried and gone.
Forums have their place, especially in the initial flourishing stage, but in the end stagnate as the members become “as one” and fewer stimulating discussions occur. Just the nature of the medium.
It would be different if there were such a thing as an open-forum website, but there isn’t. And I speak as one of the first two involved in setting up voynich.ninja. 🙂 The other person and I met, and talked this over, in comments to Nick’s blog. Matter of public record.
I think Nick’s comments are widely read because he posts interesting topics, has something to say, and says it well.
There are plenty of open blogs and forums that aren’t widely read. Openness doesn’t guarantee quality or relevance.
J.K. Petersen: I don’t necessarily agree that the topics I post on are all interesting :-p , but I do always try to leave readers genuinely knowing more at the end of a post than they did before. This is a benchmark few bloggers seem to worry about any more. 🙁
Here is an example in Latin, for those who believe that the short words and word repeats in the VM are unique.
https://www.facebook.com/photo.php?fbid=2344743892414918&set=pcb.2257397211036734&type=3&theater&ifg=1
https://www.facebook.com/photo.php?fbid=2344744019081572&set=pcb.2257397211036734&type=3&theater&ifg=1
In his 2019 paper Torsten says:
“It, however, does not totally dismiss the steganography hypothesis, although compelling arguments against it do exist:”
“Why, e.g., would someone hide a genuine secret within an obviously “suspicious” book?”
Ask that of Johannes Trithemius. Why is filler text ever used?
“While ‘psychological’ arguments per se appear questionable, also the high regularities of the VMS text significantly limit the maximal amount of information possibly hidden within the ‘container’, virtually rendering it useless.”
I don’t know if Torsten has a percentage figure for that maximal amount. However, how can we say that it virtually renders it useless without knowing what that message is?
I wonder what percentage of Trithemius books on magic was filler.
I wonder if anyone knows much about the history of the use of filler text from the early 15th century and before, i.e. pre-Trithemius. I don’t mean just the use of null characters.