11 Jan 2009

Ukrainian Voynich theory…

In the good old days, we seemed to be in a “long boom”, blessed by an apparently unlimited supply of fringe Voynich theories, like so many babies’ socks effortlessly churned out by a deranged knitter. Oh yes, we’ve definitely seen plenty of knitters over the years. 🙂

But of late, it’s hard to avoid noticing that a Voynich theory drought has apparently taken hold. It’s not that all the good theories have already been nabbed: the nature of most Voynich theories is that they are intrinsically bad but non-trivial to disprove, while simultaneously playing out a subtle wish-fulfilment role in the theorist’s personal psychodrama. A bit like an Action Man toy for intellectual introverts. 🙂

But why should this ‘silence of the flim-flams’ be happening now? My suspicion is that the VMs cultural meme has subtly drifted over the last few years into a kind of no man’s land. Whereas it used to be something for everybody, I think that the ‘analysis paralysis’ of the Wikipedia Voynich page has spread, virus-like, through mainstream culture: and that the VMs’ status as a wacky para-historical mystery has been displaced by a kind of diffused epistemological ennui, as if the very need to understand it is somehow misjudged – that it’s not that kind of girl.

However, here’s a tolerably recent Voynich theory I’d missed, courtesy of “Michael the friend of D.” (who appears to be from the Ukraine), first posted to sci.lang in 2007. By taking characters from a rotating sequence of three lines, Michael is able to pluck out a single non-word (“gracieg”) from the VMs. Where fewer than three lines are available, he suggests that stuff is hidden (Trithemius-style) in every other word. Of course, he’s not actually using the VMs for this, but a cleaned-up page of VMs text from omniglot.com: which isn’t so very different from Gordon Rugg relying on the statistical properties of the transcription. On the bright side, Michael is at least self-aware enough to notice that he’s probably falling into a trap. 🙂

Posted in: Voynich Theories

23 thoughts on “Ukrainian Voynich theory…”

Jarlve on January 15, 2017 at 10:44 am said:

Hey Nick,

I’ve tested a short piece of Voynich against a variety of languages (5-gram statistics) with my (homophonic) substitution solver AZdecrypt. Note that my solver is powerful enough to solve the second Beale cipher in a fraction of a second so there is no issue with the substitution process.

Ukrainian happens to be the top correlation by quite a bit. John Stojko’s transcription is nonsensical but perhaps there are structural properties in the VM that correlate with Ukrainian and people pick up on it.

“fachysykalarataiinSholShorycThresykorSholdysorycTharorykairchtaiinShararecTharcThardansyaiirShekyorykaiinShodcThoarycThesdaraiinsao’oiinoteeyoteorrolotycTh*ardaiinotaiinorokansairychearcThaiincPharcFhaiinydaraiShy”

AZdecrypt identify language for: vm1.txt
————————————————–
ukrainian(ukraine): 179.42%
greek(greece): 168.66%
tamil(mixed): 166.40%
belarusian(belarus): 163.02%
armenian(armenia): 162.87%
icelandic(iceland): 158.14%
hungarian(wikipedia): 156.68%
swedish(sweden): 156.41%
indonesian(indonesia): 155.64%
tatar(mixed): 154.52%
russian(russia): 154.40%
english(austria): 154.13%
albanian(albania): 154.03%
english(newzealand): 153.89%
english(canada): 153.79%
english(fiji): 153.37%
czech(czechrepublic): 153.20%
english(southafrica): 152.90%
english(unitedkingdom): 152.80%
indonesian(mixed): 151.91%
czech(europe): 151.69%
spanish(uruguay): 151.07%
russian(kazakhstan): 150.84%
english(europe): 150.43%
english(england): 149.68%
russian(azerbaijan): 149.54%
russian(moldavia): 149.22%
spanish(columbia): 149.16%
spanish(costarica): 149.12%
slovak(slovakia): 149.09%
spanish(honduras): 148.60%
danish(denmark): 148.09%
german(germania): 148.09%
spanish(guatemala): 148.00%
spanish(ecuador): 147.70%
slovenian(slovenia): 147.45%
austriangerman(austria): 147.44%
catalan(catalonia): 147.38%
spanish(wikipedia): 146.23%
german(switzerland): 146.16%
bulgarian(wikipedia): 146.14%
polish(wikipedia): 145.80%
italian(mixed): 145.21%
norwegianbokmal(norwegia): 144.63%
turkish(turkey): 143.83%
esperanto(mixed): 143.05%
croatian(wikipedia): 142.20%
portuguese(brazil): 141.60%
cebuano(cebuano): 141.58%
norwegian(norwegia): 139.12%
bosnian(bosniaherzegovina): 139.03%
dutch(netherlands): 138.64%
french(france): 138.21%
african(southafrica): 137.34%
arabic(wikipedia): 136.59%
portuguese(portugal): 136.26%
vietnamese(vietnam): 135.92%
azerbaijani(azerbaijan): 135.34%
persian(wikipedia): 135.07%
portuguese(wikipedia): 134.96%
lithuanian(lithuania): 133.32%
estonian(estonia): 133.11%
moldavian(moldavia): 132.87%
chinese(china): 132.74%
romanian(romania): 132.69%
finnish(finland): 132.56%
latvian(latvia): 132.49%
persian(iran): 132.22%
hebrew(wikipedia): 124.23%
Thomas F. Spande on January 15, 2017 at 4:06 pm said:

Dear Jarlve. Interesting. Have you tried “Old” Latin and Classical Latin?

Cheers, Tom
Jarlve on January 15, 2017 at 5:41 pm said:

Hey Tom,

latin(wikipedia): 151.28%
nickpelling on January 15, 2017 at 5:46 pm said:

Jarlve: how about vowel-less Ukrainian? OK, I know full well that Voynichese simply isn’t that, but it would be interesting to see what the stats say. 🙂
Jarlve on January 15, 2017 at 6:55 pm said:

Hey Nick,

It would not be fair to run through vowelless Ukrainian without devowelling all the other languages as well, not a small task to say the least.

I will leave it as it is. Perhaps someone will want to pick up on it one day. I developed the language identificator mainly for the Zodiac ciphers and thought it would be interesting to see what it would look like for the VM.
Ruby Novacna on January 17, 2017 at 8:51 pm said:

And what do these percentages mean, please?
It confuses me completely, poor cryptologist that I am.
Ruby
Az on January 18, 2017 at 7:19 am said:

I want to submit my theory. How do I attach a file or video? Nick, do you have an email address?
Jarlve on January 18, 2017 at 5:13 pm said:

Hey Ruby,

They express the best score of the non-randomized string (over 50 restarts) as a percentage of the average score of 50 randomized strings.

This normalizes the languages against each other.
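
To make that normalization concrete, here is a rough Python sketch of the idea: score the text, score 50 shuffled copies of it, and express the first as a percentage of the average of the second. AZdecrypt’s real scoring and its solver restarts are not shown in this thread, so `ngram_counts`, `ngram_score` and `normalized_percentage` are hypothetical stand-ins, and the toy log-count scorer is an assumption rather than the solver’s actual method.

```python
import math
import random
from collections import Counter
from statistics import mean


def ngram_counts(reference, n=5):
    """Count every n-letter window in a reference text (toy language model)."""
    return Counter(reference[i:i + n] for i in range(len(reference) - n + 1))


def ngram_score(text, counts, n=5):
    """Toy score: sum of log(count + 1) over the text's n-letter windows.

    ASSUMPTION: a simple stand-in for a real solver's scoring function.
    """
    return sum(math.log(counts[text[i:i + n]] + 1)
               for i in range(len(text) - n + 1))


def normalized_percentage(text, score_fn, shuffles=50, seed=0):
    """Score of `text` as a percentage of the mean score of shuffled copies."""
    rng = random.Random(seed)
    shuffled_scores = []
    for _ in range(shuffles):
        letters = list(text)
        rng.shuffle(letters)  # same letters, random positions
        shuffled_scores.append(score_fn("".join(letters)))
    baseline = mean(shuffled_scores)
    if baseline == 0:
        raise ValueError("shuffled baseline is zero; cannot normalize")
    return 100.0 * score_fn(text) / baseline
```

A value of 100% would mean the text scores no better than its own shuffled letters; anything above that suggests the letter order carries structure that the chosen language model rewards.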
Thomas F. Spande on January 18, 2017 at 11:12 pm said:

Jarlve, would you be up to taking just one glyph (the inverted gamma) out of just one page of the VM botanical section, then running it through your program and checking it against old Latin?

Ever hopeful, Tom
Jarlve on January 19, 2017 at 4:14 pm said:

Hey Tom,

You are asking how well the relative positions of the “inverted gamma” symbol correlate with any given language? If yes, then it requires a different test than my language identificator.
Ruby Novacna on January 19, 2017 at 8:01 pm said:

Jarlve !
I don’t have a mathematician’s mind, and percentages with values above 100 disorient me instead of informing me. Can you please interpret these results for us without using overly complicated terms?
Best regards
Ruby
Thomas F. Spande on January 20, 2017 at 4:58 am said:

Jarlve, the inverted gamma is an earlier medieval representation of the currently used number “4”. I have observed that there are two dozen or so per folio in the botanical section, that nearly all follow an “o” or “a”, and that they seem to end a “word”. Also, a lot of Latin words end in “o” or “a”. I was just curious what your magic software would come up with if you stripped these out (way easier than stripping out all vowels) and compared the modified VM text with “Old Latin”.

Cheers, Tom
Jarlve on January 20, 2017 at 4:37 pm said:

Ruby,

Say that we have this piece of information which is referred to as the non-randomized string:

“ninajoelenecontibornjuneisanenglishactresscomedianandventriloquist
amongmanycharactersherprimaryonstagepuppetsidekicksareawhiteha
iredscottishgrandmothernamedgrannyandadeadpanandsomewhatsini
stermonkeynamedmonk”

Note that it is a piece of English text without spaces (nina joelene conti born june is an english actress comedian…).

This string is then randomized and referred to as the randomized string (the letters have changed place randomly):

“hkoaecedmtsysneltberdagtoiuddndorktiniedjeeeriaaaydhoopnrqdser
nntpyiaannpeagglasnmiicnermsmruaidimcosnrsnsenteroaeriiniagann
aseaneloeaurotdekacosahiejsncvamgarmhoccakhdwmnnonennsptwn
tpiaerrehimmhaatthysntets”

Note that it no longer represents a piece of English text because of the positional randomization.

Now, both strings are scored with my substitution solver AZdecrypt using English 5-gram letter statistics (to explain these details would require another lengthy post).

AZdecrypt scores:
—————————————-
Non-randomized string: 23005
Randomized string: 16391

Now, divide the score of the non-randomized string by that of the randomized string and multiply by 100 to get the percentage value. In this case it is 140.35%. You cannot compare the percentage of this string to that of the VM string that I used because the two strings have different multiplicities.

My thoughts on the matter are that the VM string I used correlates well with most languages because of its repetitive nature and strangely correlates better with Ukrainian which may be a random coincidence of structural properties between the VM string and the Ukrainian language. Or, *somehow* the VM string has Ukrainian elements in it.
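
The arithmetic in the worked example above can be checked in a few lines of Python. This is just a sketch: the two scores are the ones quoted in the comment, while the helper names `shuffle_letters` and `percentage` are made up for illustration and are not part of AZdecrypt.

```python
import random
from collections import Counter


def shuffle_letters(text, seed=None):
    """Return the same letters in a random order (positional randomization)."""
    rng = random.Random(seed)
    letters = list(text)
    rng.shuffle(letters)
    return "".join(letters)


def percentage(non_randomized_score, randomized_score):
    """Non-randomized score expressed as a percentage of the randomized one."""
    return 100.0 * non_randomized_score / randomized_score


if __name__ == "__main__":
    sample = "ninajoelenecontibornjune"
    shuffled = shuffle_letters(sample, seed=1)
    # Shuffling changes positions only, never the letter counts.
    assert Counter(sample) == Counter(shuffled)
    # Scores quoted in the comment: 23005 and 16391.
    print(round(percentage(23005, 16391), 2))  # 140.35
```

Because shuffling keeps the letter frequencies intact, any gap between the two scores comes from letter order alone, which is the point of the comparison.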
Jarlve on January 20, 2017 at 4:44 pm said:

Tom,

Can you supply me with such a VM plaintext (with your symbol removed) of about 200 characters in length and a large source of “Old Latin” at least 10 megabytes in size?
nickpelling on January 20, 2017 at 4:53 pm said:

Jarlve: I would add that Nina Conti is really funny, but that would be putting words in your mouth. 😉
Thomas F. Spande on January 20, 2017 at 6:10 pm said:

Jarlve, I’ll have to get help from my son, who has Photoshop on his Mac. The plan would be to photograph a page and delete those “inverted gammas”. Also, the “co” combinations will be replaced by “qu” (as in “Old Latin”). Thanks in advance for your willingness to try this little experiment. It will take some time to deliver a modified script, as my son is only infrequently available with his Mac and software.

Cheers, Tom
Ruby Novacna on January 20, 2017 at 7:31 pm said:

Thank you Jarlve! I will make more effort to understand.
Ruby
Jarlve on January 20, 2017 at 7:34 pm said:

Hey Nick,

Hehehe, that’s what she does. 🙂
Jarlve on January 22, 2017 at 10:47 am said:

Tom,

The test requires your modified VM plaintext in a computer text format (not an image) of at least 200 characters AND a source of “Old Latin” in a computer text format of at least 10 megabytes in size.

If supplying one of these is a problem then I wouldn’t bother.
Thomas F. Spande on January 22, 2017 at 4:42 pm said:

Jarlve, 10 MB is a problem. In my opinion, the VM is a mix of old and new (called classical) Latin. “Old” Latin continued in use, often mixed with Classical Latin, until the mid-19th century, and the pure Old Latin was used until about 300 BC.

Too many problems at the moment. Thanks, anyway, for the outline of the requirements for your computer attack.

Cheers, Tom
Az on February 11, 2017 at 4:03 pm said:

https://youtu.be/YDnZJYnH93Q
M R Knowles on March 27, 2020 at 8:55 pm said:

Interesting. I should probably get to know Trithemius’s work better.
J.K. Petersen on March 28, 2020 at 12:52 am said:

Jarlve, when you compared Voynichese to Hebrew, was that Romanized Hebrew with the vowels added? If so, have you tried it with abjadic Hebrew (no Romanized vowels added)?