15 Feb 2014

Mark Perakh and the Voynich Manuscript

I was deeply saddened this week to find out that Mark Perakh died last year, on 7th May 2013 in Escondido, California. He wrote with such vitality I never even stopped to consider his age: but he was in fact 88.

Perakh’s was a life of three professorial acts: first in Russia, then in Israel, and then finally in America. It seems that Perakh was goaded most frequently into action by a drive to resist that which he considered false knowledge – for him, dissenting sincerely meant fighting.

In recent decades, the things that goaded him to greatest action were the grand pseudoscience and pseudohistory constructions of fundamentalist Christian literalism: specifically, the Bible Codes (don’t get me started on that, or I’ll be typing all night) and literal Creationism. His book “Unintelligent Design” surely forms as good a sustained counterargument as needs to be written to the pro-creationist arguments of William Dembski et al.

Back in the world of cipher mysteries, for a short while Perakh brought his mathematical and statistical heavy guns to bear on the Voynich Manuscript’s confounding ‘Voynichese’ text: and his exemplary 1999 paper “APPLICATION OF THE LETTER SERIAL CORRELATION TEST TO THE VOYNICH MANUSCRIPT” is something I often suggest that researchers take a look at.

Unfortunately, since 2011 all the copies of it outside the Wayback Machine seem to have withered on the virtual vine: so I thought I’d take this opportunity to praise the man and resurrect his paper here on Cipher Mysteries, for anyone with an interest in statistical studies of the Voynich Manuscript.

So, here’s part 1 (his experimental tests and raw data) and part 2 (his conclusions): highly recommended stuff!

Incidentally, until just now I’d forgotten that Mark Perakh also ran his LSC (Letter Serial Correlation) tests on Gordon Rugg’s generated Voynichese-like text: and that it produced results that were close to those returned by the artificial gibberish text mentioned in Perakh’s paper, and quite unlike those yielded by Voynich A or B texts (which are very close to those characteristic of proper languages). In an online comment from 2004, Perakh expressed disappointment that Rugg had felt the need to gild his experimental lily for publication in Scientific American.


29 thoughts on “Mark Perakh and the Voynich Manuscript”

bdid1dr on February 15, 2014 at 9:03 pm said:

Nick & Stephen, I am sorry to hear of Mr. Perakh’s passing. It is so long ago that I read something of his, I can no longer reference the material.
I think he may have been very interested in the material I just downloaded from Wikipedia. You may have to X-reference this latest item (from me) to your Nahuatl discussion: The pictorial element is identified as ‘Codex Osuna Triple Alliance.JPG’ (single quote marks are mine)
Four lines of script comment on the pictorial elements portraying symbols for Texcoco, Tenochtitlan (Mexico), and Tlacopan. The script is preceded by that “bird-wing” glyph and a large ‘initial’ letter.
What is most exciting to me is the “bird-wings” and elaborate initial letter of the four lines of discussion. Whew!

ps: Do we have any means of contacting Mr. Perakh’s surviving family members with our condolences/tributes?
bdid1dr
Job on February 15, 2014 at 9:27 pm said:

Stephen, as someone who is on the sidelines of that debate – yet inclined to agree that the VM is meaningful – one weakness in the overall study of the “gibberish” hypothesis is that there is often a comparison with machine-randomized texts. E.g. in Mark’s paper:

Furthermore, these graphs clearly differ, in a substantial way, from those graphs we obtained for texts randomized by permuting letters of original meaningful texts

From which the following conclusion follows and which I would not challenge:
VMS is not a truly random collection of symbols.

On the other hand, human gibberish will almost certainly have higher entropy and exhibit some properties of natural language.

An overview of different techniques, beyond Rugg’s, along with the respective statistical properties and a comparison with those of the VM, would help summarize the whole debate.

I find Rugg’s Cardan Grille proposal highly unlikely, for various reasons. I’m more inclined to believe that the author of a non-sensical text would have approached the task unscientifically, with little planning (what’s to plan?).

The simpler, more readily available, wing-it approach, would certainly yield some patterns. I find it quite difficult and exhausting to manually generate low-entropy text.

If i were faced with that task i would almost certainly reuse and adapt previous words – e.g. take this gibberish and change a character on it. Some properties of the text fit this scenario:
1. There are many similar words that vary by a single character.
2. The more a word occurs, the more similar variants it has.
3. There are few repeated word sequences, despite all of the similarity in the text.
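
The first two properties can be checked mechanically. What follows is only a minimal sketch, assuming "vary by a single character" means a one-character substitution (insertions and deletions are not counted) and using a made-up list of EVA-style words rather than a real transcription. The function name `one_char_variants` and the sample are hypothetical.

```python
from collections import Counter

def one_char_variants(words):
    """Map each distinct word to the other distinct words that differ
    from it by exactly one substituted character."""
    vocab = set(words)
    buckets = {}
    # Mask one position at a time: words that share a masked pattern
    # have the same length and differ only at the masked position.
    for w in vocab:
        for i in range(len(w)):
            buckets.setdefault((i, w[:i], w[i + 1:]), set()).add(w)
    neighbours = {w: set() for w in vocab}
    for group in buckets.values():
        for w in group:
            neighbours[w] |= group - {w}
    return neighbours

# Toy sample only (assumption): not taken from any folio.
sample = "daiin dain daiin chedy shedy chedy qokedy chedy okedy".split()
freq = Counter(sample)
variants = one_char_variants(sample)

# List each word with its frequency and its one-substitution variants,
# most frequent first, to eyeball whether frequent words have more variants.
for word, count in freq.most_common():
    print(word, count, sorted(variants[word]))
```

On a full transcription, comparing `freq[word]` against `len(variants[word])` would be one way to test property 2; whether substitution-only matching is the right notion of similarity is a judgment call.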

That said, there are other properties of the text that are not well aligned with this theory, for example:
1. The high occurrence of some suffixes. For example, “edy” in B folios – that would have been easy to avoid.
2. The ratio between unique words (vocabulary) and total words matches natural language texts. I would have expected to see a higher ratio.
3. The apparent word structure and affinity of some characters to occur at word/line start or end.

In any case, this is more or less why i would not yet dismiss the gibberish hypothesis – and not at all because of the Cardan Grille stuff.

BTW, the following page contains Mark Perakh’s own description of the LSC method, which i found helpful:
http://pandasthumb.org/archives/2005/03/letter-serial-c.html
Job on February 15, 2014 at 10:55 pm said:

In my previous post i meant to say that human gibberish will have lower entropy.
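
For anyone wanting to put a number on this, here is a rough sketch of one possible measure. It assumes first-order (single-character) Shannon entropy, which may not be the measure intended above; other measures, such as conditional entropy over character pairs, would need a different function. The name `char_entropy` is hypothetical.

```python
import math
from collections import Counter

def char_entropy(text):
    """First-order Shannon entropy of a string, in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    # Sum of -p * log2(p) over every distinct character.
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# A string with one repeated symbol carries no information per character;
# four equally likely symbols carry two bits each.
print(char_entropy("aaaa"))
print(char_entropy("abcd"))
```

Comparing this value for hand-made gibberish, a machine-shuffled text and a transcription would be one way to test the claim, provided word separators are treated the same way in all three.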
Job on February 16, 2014 at 11:37 am said:

Stephen, i enjoyed the video you posted on your website outlining a technique for breaking into the Voynich.

While any form of trial and error will lead to many dead ends, i believe that’s the type of careful work that eventually yields results for puzzles such as these.
Menno Knul on February 16, 2014 at 12:48 pm said:

Stephen,

Doesn’t the same go for encryption? Why would anyone spend so much time, money, energy, etc. to encrypt a book when its contents (herbarium, astrologium) are available in plain natural language elsewhere as well? To hide the small portion of recipes alone, it would not be necessary to encrypt the whole book, and besides, who could read it?
SirHubert on February 16, 2014 at 2:04 pm said:

Menno:

Stephen Bax addresses this in the article on his website (see above).

Your wider question, why encrypt a book (assuming that the VMs is indeed encrypted) remains a very good one. It has the very obvious shortcoming that even the person who enciphered it would find it very difficult to use.

One possible answer would be that the contents of the VMs actually come from particularly rare sources and so were thought, rightly or wrongly, to be exceptional – and possibly therefore valuable. I might, in those circumstances, encipher the whole book for safety. If you then wanted to buy a version of part or all of the text, I could decipher my copy and produce one for you – but if you broke in and stole the manuscript it would be useless to you. Pure speculation, but a possibility.

Equally, you might encipher it if you wanted to dress up an ordinary miscellany of medical and other (pseudo-)scientific texts as something terribly exciting and valuable. Of course, producing a hoax text could achieve the same end…

It’s an excellent question which is often ignored here, and I’m sorry not to have better or more conclusive answers.
Menno Knul on February 16, 2014 at 4:49 pm said:

Sir Hubert,

Another possibility is that we are actually dealing with a nomenclatura, and that Stephen Bax hit upon part of the list of names from this nomenclatura in comparing some plants with initial words. A nomenclatura is not an encryption, but a (scientific) systematic list of names, e.g. of plants or chemical substances (usually in Latin). The nomenclatura of plants is the predecessor of taxonomy, introduced by Linnaeus.

The actual list of names in plain text has not been included in the VMS, but can be reconstructed along the lines of Stephen Bax.

Starting from the special signs F, K, P, T and the ligatures cFh, cKh, cPh and cTh, the nomenclature consists of eight categories, or four main categories and four subcategories. The categories are subdivided by a limited number of prefixes preceding the special signs: o-, qo-, etc. I am still working on it, but you may consult my website under wordanalysis (woordanalyse) on the Voynich pages. I started from the observation that almost all special signs are used without a prefix in the first words of the descriptions and paragraphs, as can be seen in the video by Stephen Bax as well.
bdid1dr on February 17, 2014 at 4:47 pm said:

Nick & Stephen, I hope to see sometime (soon) a “Compelling Compilation” of proven translations of the “Voynich” manuscript. Maybe even some discussion of the initial character which begins the species identification of every botanical item. I’m also hoping to see translations (latin, maybe) of the pharmaceutical recipes.
Do you know if Mr. Perakh ever discussed possible South American origins/text for B-408?
Nick, would the Beinecke rare books archivists object to a title for your next Compelling Press offering: “B-408–Mr. Voynich’s Discovery” (?)

😉
Job on February 17, 2014 at 9:11 pm said:

I agree that the work is more extensive and homogeneous than one would expect from a hoax.

For example, the particularly verbose, and densely packed, recipes section could have been omitted without impacting the manuscript’s credibility.

Similarly, a more diverse alphabet or unique writing system might have made the manuscript more appealing, without crossing the line into obvious hoax territory. Yet the author is constrained by a short alphabet which is used consistently throughout.

One property of the text that stands out is that words starting with EVA “p” or ending with EVA “m” occur primarily at the left and right edges of the text, such that words starting with “p” are line-starters and words ending with “m” are line-enders.

It’s particularly interesting in the case of “m” since it often terminates both wrapped and unwrapped lines of text. This apparent association between “line morphology” and “line contents” is something to think about.

If Nick’s spam filter didn’t block posts containing links so rigorously i would post an image that highlights the occurrence of words starting with “p” and ending with “m”, overlayed on top of the actual folios. You would see two columns of different colors at the right and left edges of the folios, with only a few scattered occurrences in the center.
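
In place of the image, the same observation can be expressed as a tally. This is a minimal sketch, assuming a transcription with one manuscript line per string and words separated by spaces; the function `edge_positions` and the three toy lines are hypothetical, not real folio text.

```python
def edge_positions(lines, starts_with="p", ends_with="m"):
    """Count how often matching words sit at a line edge versus elsewhere."""
    tally = {"start_first": 0, "start_other": 0, "end_last": 0, "end_other": 0}
    for line in lines:
        words = line.split()
        for idx, word in enumerate(words):
            # Words beginning with the given letter: first in line or not?
            if word.startswith(starts_with):
                tally["start_first" if idx == 0 else "start_other"] += 1
            # Words ending with the given letter: last in line or not?
            if word.endswith(ends_with):
                tally["end_last" if idx == len(words) - 1 else "end_other"] += 1
    return tally

# Toy lines only (assumption): invented EVA-style words.
toy_lines = [
    "pchedy qokain shedy otam",
    "okaiin pol chedy daiin",
    "polaiin chor dam",
]
print(edge_positions(toy_lines))
```

If the pattern described above holds, `start_first` and `end_last` should dwarf `start_other` and `end_other` on a real transcription.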
bdid1dr on February 17, 2014 at 11:31 pm said:

Most often, Job, that elaborate, sometimes curlicued, “P” begins any discussion because the various loops & curlicues & extensions of the ‘tail-end’ of the continuous line to behind the upright beginning post will indicate a word such as “Pre-pos-ter-ous” or “Per-pen-di-cu-lar” or “Per-ple-x-ing”.
Nick, could you maybe demonstrate “com-Pe-ll-ing” for ex-am-ple?
🙂
bdid1dr on February 17, 2014 at 11:53 pm said:

Or, per haps M-ar-k Per-akh’s name?
beady-eyed wonder: bd id 1 dr 😉
Job on February 18, 2014 at 9:36 am said:

Stephen, how do you interpret the fact that word-final/line-final EVA “m”s are typically preceded by an EVA “a”?

In a sense, it’s not “m” that’s a terminal marker, but “am”, unless there is a reason that the last words within m-terminated blocks should end with an “a” so often.

Also, what are the chances that sense blocks, as you put it, would align so well with the width of the folio?

There doesn’t seem to be too much effort by the author – and i’m focusing particularly on the recipes section btw – to force a block of text to fit within a line. So why would it be that, in some cases, every other line should end with an “m”?

If you take a look at folio 115v you’ll find that “m” is always the last character in a line, with no exceptions. It terminates 15 lines, and in 12 of these it is preceded by an “a”. Of the 15 lines 13 seem to wrap, while the other two end before reaching the right margin.
Job on February 18, 2014 at 9:55 am said:

To add some numbers, there are, in the manuscript, 5693 words terminating in an EVA “r”. Of these, 2646 terminate with “ar”, whereas 2256 terminate with “or”.

Similarly, there are 1061 words that terminate in “m”. Of these, 756 terminate in “am”, whereas 195 terminate in “om”.

If “m” were a terminal version of “r” then i would expect to see a higher occurrence of “om”, though it’s possible that “om” words are less likely to occur at the end of blocks due to language constraints.
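
Putting the counts above into proportions makes the gap easier to see. The snippet below only does arithmetic on the four figures already quoted; the dictionary layout and the name `shares` are just for illustration.

```python
# Counts quoted above: words ending in "r" and "m", split by preceding vowel.
counts = {
    "r": {"total": 5693, "a": 2646, "o": 2256},
    "m": {"total": 1061, "a": 756, "o": 195},
}

def shares(entry):
    """Percentage of words preceded by 'a' and by 'o', plus the a/o ratio."""
    a_pct = 100 * entry["a"] / entry["total"]
    o_pct = 100 * entry["o"] / entry["total"]
    return round(a_pct, 1), round(o_pct, 1), round(entry["a"] / entry["o"], 2)

for final, entry in counts.items():
    print(final, shares(entry))
```

That works out to roughly 46.5% “ar” against 39.6% “or” (a ratio of about 1.17), but 71.3% “am” against 18.4% “om” (a ratio of about 3.88), which is the imbalance being pointed at.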

I think your interpretation of “m” as a terminal “r” is plausible, though the apparently coincidental yet consistent occurrence of “m” at the very end of lines is difficult to understand.
bdid1dr on February 18, 2014 at 5:25 pm said:

Sentence endings: ‘9’ equals ‘geus’ or ‘ceas’.
Smaller ‘9’ equals ‘X’
Ampersand with an added downstroke equals ‘itus’ or ‘tius’ or ‘deus’ or ‘dios’ — and usually indicates the final syllable of a discussion.
I do wish Mr. Perakh could have lived to review our latest discussions! I wonder if he ever read historical fiction by Leon Uris.
bdid1dr on February 18, 2014 at 7:37 pm said:

Gentlepeople, I refer you to a wikipedia item, which clarifies a very important aspect of B-408:
File:Codex florentino 51.9.jpg
Summary: Page 51, of book IX from the Florentine codex (1575-1577) by fr Bernardino de Sahagun. Paper,
31.8 cm x 21 cm. Library Medicea Laurentiana, Florence.
Text in romanized Nahuatl (NAHUATL is NOT KNOWN to have been a WRITTEN language PRIOR to its ROMANIZATION).
Emphasis is mine, bdid1dr 🙂
bdid1dr on February 18, 2014 at 8:47 pm said:

First two words of commentary for the illustration of a friar preparing to place a plant in a newly dug hole in the garden:
insico cauitlatl: ‘dig a cavity’

the combinative “ui” would make the sound “v”: ca ui tl atl i — cavity.

I have not yet found in the “Voynich” manuscript a free-standing “T” alpha-character. I’m pretty sure that at least two (or three/four) regular contributors to Nick’s pages can read and translate Latin with much more facility than me-myself-and I.
A tout a l’heure!
Knox on February 18, 2014 at 9:18 pm said:

Re. Stephen Bax February 15, 2014 6:09 pm.
“So what I can’t understand is, if Perakh said this so long ago and so authoritatively, why does anyone still think the VM script could NOT be written in a meaningful language?”
I believe Perakh used “meaningful language” in the sense of “potentially meaningful language”. If not, he should have. Secondly, only the characteristics he included in his analysis were considered. Whatever EVA-i is, there are few instances in which EVA-m does not fit as a decorative substitute for EVA-in.
bdid1dr on February 19, 2014 at 5:10 pm said:

Gentlemen, I do understand that several of you, working with the EVA for quite a while now, can’t seem to get out of the EVA argumentation mode. Do I understand that Nick developed the EVA? Or was it Mary D’Imperio? How about Currier? Tiltman? How effective have your EVA ponderings been in understanding what B-408 is saying/teaching?
Ralph, thanks for your “what’s up” in re my dialogues/contributions to Nick’s blog. Thanks to you too, Job, for your meticulous and courteous overview of my posts.
Also thank you for showing us where we could find enlarged prints of B-408’s pages! It has been a difficult process for me when trying to get a printout of an enlarged section of any of B-408’s folios in order to read and transcribe the script.
Yesterday, I began a word-for-word translation of the Romanized Nahuatl script which appears in a wiki file:
File:Codex florentino 51 9.jpg
I am now able to compare the words of that codex with the words in B-408. It is appearing that the entirety of B-408 is written in “Romanized Nahuatl”
🙂
Job on February 20, 2014 at 6:10 am said:

Stephen, i would agree that lines are significant units, but more due to the encoding process, or even language construction, rather than the underlying content.

BTW, what’s your opinion regarding “y” terminated words? There are roughly 15,400 words terminating in “y” – that’s about 40% of the estimated total number of words.

In folio 103r, in particular, over 57% of the words end in EVA “y” – it also contains a sequence of 12 consecutive words all terminating with “y”. This seems unnatural, even if we consider that abbreviations may have been used.
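
Taken at face value, 15,400 words at about 40% implies an estimated total of roughly 38,500 words. Both per-folio figures (the share of “y”-final words and the longest unbroken run of them) are easy to recompute; here is a minimal sketch, assuming a folio is supplied as a flat list of words in reading order. The function `suffix_stats` and the toy word list are hypothetical.

```python
def suffix_stats(words, suffix="y"):
    """Return (share of words ending in suffix, longest consecutive run)."""
    hits = [w.endswith(suffix) for w in words]
    longest = current = 0
    for hit in hits:
        # Extend the current run on a match, reset it otherwise.
        current = current + 1 if hit else 0
        longest = max(longest, current)
    share = sum(hits) / len(words) if words else 0.0
    return share, longest

# Toy words only (assumption): not a real folio.
toy_words = "chedy qokedy shedy daiin okedy chey dar".split()
share, longest = suffix_stats(toy_words)
print(f"{share:.0%} end in y; longest run = {longest}")
```

Running the same function over shuffled copies of a folio's words would show how long a run one should expect by chance at a given share, which bears on whether a run of 12 is really unnatural.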
bdid1dr on February 20, 2014 at 5:34 pm said:

Nick, could you post the EVA chart one more time, here, so that I/we can compare its word formations/terminology with the script of B-408 and other very similar but more easily read manuscripts?
I noticed, and have mentioned, the lack of a free-standing ‘T’ (or ‘D’). Would you, Job, also be able to run frequency tests on the use of the ligatured ‘c-e’ or ‘e-c’ combinations? Would we then be able to recognize the ‘tl’ or ‘dl’ as being the tapped or trilled ‘dr’ ‘tr’ syllable and/or the word ending ‘tly’ (as in in-dig-nan-tly)? Do we have EVA’s for various combinative syllables such as ‘var-ious’? Not too long ago, Job and I were discussing that pe-cu-li-ar script which looks like an ampersand with an upwardly extended ‘downstroke’. I am trying to be politically/religiously correct by giving my ‘take’ on that character as being either ‘tius’ or ‘dios’.
Job on February 21, 2014 at 1:14 am said:

bdid1dr, EVA was developed by René Zandbergen and Gabriel Landini and is suitable for many types of analysis, as well as a reference alphabet.

I’ve found that EVA transcriptions have a low error rate – i would estimate it at less than 1%, though certainly non-zero – and are fairly consistent.

While the decision to represent blocks such as “iiiv” using “i” and “v” symbols is questionable, i find it acceptable because it does not discard information.

IMO a more problematic feature of EVA is the use of “sh” to represent “ch” blocks that have what looks like some form of diacritic – this does potentially introduce ambiguity because “s” is possibly an unrelated character, so it’s an implicit assumption that may produce biased results.

Additionally, some word boundaries in EVA transcriptions are questionable, though this is probably on par with other transcriptions.

That said, a transcription alphabet is only adequate or inadequate in the context of a particular application, so IMO it’s not worth discussing the merits of each transcription alphabet independently of how they may be used.
bdid1dr on February 22, 2014 at 11:46 pm said:

The word ‘abecedary’: Can any of you write that word using either the EVA or the Vms-A? I can — when using B-408’s
ab-e-c-e-d-R-e. Hence my translations of some twenty-odd folios of Beinecke-manuscript 408 (not all of the translations deal with only p-o-tl-an-cl esp-e-c-es).
The “Nine Rosettes” folio was my first. Then the “Coprinicae Mushroom” versus the “Alcohol Inky”. Next was f-116-Monumentum Ancyranum discussion (which may have been Busbecq’s note, written some 100 years after the manuscript’s drafting). Since my translation of B-408 f86r, I have attempted to contact Rene Z on several occasions concerning folio 86r3. So far, no feedback or acknowledgement. I have progressed through some of the more confused botanical studies. My favorite is B-408, f-16r:
“Psyllium seed” plant (plantago ovata). 😉
bdid1dr on February 23, 2014 at 12:12 am said:

Another favorite of mine: B-408, f-33v, is “Scabiosa caucasica” Several significant translations: line 1: cor-oll-as-aes-am (a curialim), line 2: telecaeseus aesa….”tellus” “of the soil”….. Further lines of script are discussing ‘summer heat and rashes’, and ‘to wash out’ – ‘as often as necessary’ – with ‘caliducaum’ — warm water.
So: The botanical specimen is “Scabiosa Caucasica” – the nickname for this plant is “The Pincushion Plant” (you will not find this nickname anywhere in B-408, f-33v). You can read other more modern instructions for the treatment of human ‘scabies’ mite, or for treating mange (dogs, cats, sheep,….).
bdid1dr on February 24, 2014 at 10:47 pm said:

Have any of you visited the museum which has (online) a printed copy of the manuscript/book written by Ububchasym Baldach: the “Tacuinum Sanitatis”? Lots of discussion accompanies each illustration: qualities: hot/cold, wet/dry, and uses/applications of each specimen.
If you haven’t, you may find a small, fully illustrated section, published by Rizzoli, with commentary by Adalberto Pazzini and Emma Pirani. The captioning for each specimen is a reproduction of Baldach’s writing.
bdid1dr on February 24, 2014 at 10:54 pm said:

Most interesting to me was Baldach’s spelling of the word ‘salvia’: Sal ui a (The ‘u’ and ‘i’ are spoken as ‘v’)
🙂
bdid1dr on February 28, 2014 at 8:54 pm said:

8 = aes
P = Sp-cies, PR-es-crip-tion, B-tlan (Eben Botlan?)
or Pl-an-tl-a-tl-a-tl-an : plantation (there is no stand-alone “T” or “D”)

I emphasize the importance of these two most-often used characters because they represent the closest the scribes could come to creating a Latin-based script for the spoken dialects. My earlier reference to the Florentine manuscript was made because I was able to determine that the Romanized word ‘ca-ui-tl’ translates to ‘ca-v-ty’: ‘hole’ (dig a hole).

So, round ‘n round y’all go, rather than ‘dig’ what I am saying. “I dig you, man!” — is ‘hippy’ talk for “I understand” or “I agree”.
🙂
bdid1dr on March 1, 2014 at 5:34 pm said:

Correction: remove one of the repetitions of ‘tl -a’.
Dennis on March 6, 2014 at 1:40 am said:

I’m sorry to hear that Mark Perakh has died. He was a remarkably penetrating intellect.

In some book I read about Alexander Solzhenitsyn, he has some critical remarks about Solzhenitsyn that were interesting. I’d always wanted to ask him.
Diane on April 2, 2015 at 7:40 pm said:

Seems to me that historical conundrums usually come down to a dichotomy between modern mind-set and that of the people or time concerned.

If I try to fit myself into an earlier (non-Renaissance) mind-set, and imagine how I’d go about enciphering something, the answer comes back either (a) a completely personal set of associations or (b) one of the mnemonic systems that I’d learned by heart. These at least avoid the silly image of a person having to sit down with pen and paper to read their own (or their inherited) practical little handbook.
So then you go the route of say, Hugh of St. Victor, or a simpler one something like the musical ‘hands’ – I rather like the idea of that one, and wish someone would work on it with me, tho’ I’ve sworn not to mess with the language side of things.
While the script is clearly written by a trained and long-practiced couple(?) of scribes, I really don’t see the need for anything so complex that it would need ‘deciphering’ of the sort imagined. How about something based on the abacus? Or indeed the “woven words” which in some cases are believed to apply literally rather than figuratively.

More history, fewer computers sounds to me like the way to go.