After my last post proposing a possible link between the Silk Dress Cipher and Orphan Trains, I widened my search a little to take in 19th Century Baltimore orphanages. What kind of archival sources might still exist, in a town where 1,500 buildings were destroyed by fire in 1904?

But rather than look directly, I decided to instead first try to find any books or studies on 19th century Baltimore orphanages. And it turns out that (unless you know otherwise) there are only really two of those to consider…

Baltimore orphanages #1 – Marcy Kay Wilson

The first is “Dear Little Living Arguments”: Orphans and Other Poor Children, Their Families and Orphanages, Baltimore and Liverpool, 1840-1910, a freely downloadable 2009 dissertation by Marcy Kay Wilson at the University of Maryland:

The two Baltimore orphanages that I examine are the Home of the Friendless of Baltimore City (HOF), which was established in 1854, and the Baltimore Orphan Asylum (BOA). The latter was known as the Female Humane Association Charity School (FHACS) at the time of its incorporation in 1801. Six years later (1807), it was reincorporated as the Orphaline Charity School (OCS). It was renamed the Baltimore Female Orphan Asylum (BFOA) in 1826, and finally became known as the BOA in 1849. [pp.10-11]

Her primary sources for the Baltimore Orphan Asylum (in the Woodbourne Collection of the Maryland State Archives) include:
* Board Minutes (1881-1897, 1905-1921)
* Monthly Reports (1893-1917)
* Annual Reports (1860-1930)

For the Home of the Friendless of Baltimore City, the same Woodbourne Collection holds:
* Annual Reports (1854-1914)
* Constitution and By-Laws, 1859.
* Charter and By-Laws, revised 1904.
* Board Minutes (1901-1913)

Also (even though I’m not initially looking at Catholic orphanages):

The female-religious order known as the Oblate Sisters of Providence (OSP) granted me access to its records, which are housed at Our Lady of Mount Providence Convent in Baltimore. The OSP has the distinction of being the oldest Catholic religious order for African-American women in the United States, and was created in 1829. [p.13]

I’ve started to patiently work my way through its 402 pages, but it’ll take me a little while. It turns out that orphanages sprang up all over America during the 19th century, initially triggered by the family-destroying ravages of cholera epidemics… so best not hold your breath, then. 🙂

Baltimore orphanages #2 – Nurith Zmora

Marcy Kay Wilson refers early on to Nurith Zmora’s Orphanages Reconsidered: Child Care Institutions in Progressive Era Baltimore (Philadelphia: Temple University Press, 1994).

Zmora used the records of the Samuel Ready School for Orphan Girls (which opened in 1887, and whose archives are in the special collections of the Langsdale Library at the University of Baltimore), the Hebrew Orphan Asylum (whose records are now held by the Maryland Jewish Historical Society), and the Dolan Children’s Aid Society (whose records are in the Children’s Bureau archive of the Associated Catholic Charities of Baltimore).

Though Wilson and (the far more revisionist, it has to be said) Zmora both offer fascinating insights into the social and political dynamics underpinning Baltimore’s orphanages, it’s hard not to conclude that their efforts sit somewhat at right-angles to our present line of enquiry: and there is, so far, not a hint of the whole Orphan Trains narrative emerging from the various archives. But… perhaps this is all just the tip of an evidential iceberg. 😉

Other sources

A number of other books kept coming up during my literature trawl that I thought I ought to mention:

Hacsi, Timothy A. “Second Home: Orphan Asylums and Poor Families in America”. Harvard University Press.

Clement, Priscilla Ferguson. “Growing Pains: Children in the Industrial Age, 1850-1890”. New York: Twayne Publishers, 1997. [Wilson points to p.200]

Holt, Marilyn. “The Orphan Trains: Placing Out in America”. Lincoln: University of Nebraska Press, 1992. [Wilson points to pp.80-117]

O’Connor, Stephen. “Orphan Trains: The Story of Charles Loring Brace and the Children He Saved and Failed”. New York: Houghton Mifflin Company, 2001.

Crooks, James B. “Politics and Progress: The Rise of Progressivism in Baltimore, 1895 to 1911”. Baton Rouge: Louisiana State University Press, 1968.

Since my recent post on the silk dress cipher, Jim Shilliday left an extremely helpful comment, in which he suggested specific readings for many of its various codewords.

So here’s a link to a Microsoft Word document containing a tabbed transcription of the Silk Dress Cipher.

The Locations

The first two columns contain a large number of codewords that seem almost certain to be American / Canadian place-names:

-----Sheet 1-----

Smith nostrum
Antonio rubric == San Antonio, Texas
Make Indpls == Indianapolis, Indiana
Spring wilderness
Vicksbg rough-rack == Vicksburg, Mississippi
Saints west
Leavwth merry == Leavenworth, Kansas
Cairo rural == Cairo, Illinois (or perhaps Cairo, Georgia?)
Missouri windy == Missouri / Chicago?
Elliott memorise == Elliot, Maine [though this is not hugely convincing]
Concordia mammon == Concordia, Kansas
Concordia merraccous == Concordia, Kansas / Americus?

-----Sheet 2-----

Bismark Omit == Bismarck, North Dakota
Paul Ramify == ?
Helena Onus == Helena, Montana
Green Bay == Green Bay, Wisconsin
Assin Onaga == Onaga, Kansas
Custer Down == Custer, South Dakota
Garry [Noun] Lentil == Gary, Indiana?
Minnedos [Noun] Jammy == Minnedosa, Manitoba
Calgarry Cuba == Calgary, Alberta / Cuba
Grit wrongful
Calgarry [Noun] Signor == Calgary, Alberta
Landing [Noun] Regina == Regina, Saskatchewan

I put all these locations onto Google Maps to see if any patterns emerged:

So… What Links These Places?

In a comment here, bdid1dr suggested that these towns might possibly be connected with the “Underground Railroad”, the network of routes by which large numbers of runaway slaves escaped from the South to Canada (where slavery was illegal). All the same, even though this is an interesting slice of American history, it is almost certainly not the explanation for the Silk Dress Cipher, because (a) the dates are wrong (slavery had been abolished in the US two decades before the mid-1880s, so the Underground Railroad was no longer in operation), and (b) the locations are wrong (the Underground Railroad largely ran up the Eastern side of the US, quite different from the pattern we see here).

In a further comment, however, Jim Shilliday points instead to a quite different American history: the Orphan Trains. These ran from 1854 until as late as 1929, shifting East Coast orphans (though in fact a large number of them had one or even two parents) out to farms, many in the mid-West. What particularly triggered Jim’s memory was that (as he noted in his comment) “Concordia, Kansas (mentioned twice in the text) is the site of the National Orphan Train Complex, housed in a restored Union Pacific Railroad Depot“.

It is certainly striking that, for a piece of paper found in Maryland, everywhere (apparently) listed seems to be a long way away: and that there appear to be three locations in a line in Kansas – Leavenworth, Onaga, and Concordia. (When I checked, all three had railroad stations: from Leavenworth Junction, trains ran to Onaga [rails laid in 1877 by the Leavenworth, Kansas & Western Railway] and separately to Concordia [via Miltonvale on the Atchison, Topeka & Santa Fe Line].)

The New York Historical Society holds MS 111 (The Victor Remer Historical Archives of The Children’s Aid Society), which is so large that it’s hard to know where to begin. Portions have been digitized and placed on flickr, but these seem to be mainly photographs: individual case files can only be examined at the archives themselves.

If there is some kind of guide to the Orphan Trains’ destinations (whether as a book or online), I haven’t yet found it. However, given that somewhere between 120,000 and 270,000 children (depending on which source you believe) were placed, it would perhaps be unsurprising if almost all destinations were covered at one time or another: and it would also be unsurprising if the placement or travel records that remain are far from complete.

Incidentally, the National Orphan Train Complex in Concordia is holding its 2017 Annual Orphan Train Riders Celebration from 1st to 4th June 2017, if anyone not too far away is interested in finding out more.

Orphan Trains and Maryland

Probably the most usefully skeptical resource is Orphan Train Myths and Legal Reality: the author (R. S. Trammell) argues that, though well-intentioned, the Orphan Trains in practice offered only a quick fix for a much deeper problem, and helped delay the kinds of deeper reforms and changes in attitude that were needed at the time.

Trammell also notes: “Orphan train trips were also sponsored and financed by charitable contributions and wealthy philanthropists such as Mrs. John Jacob Astor III who, by 1884, had sent 1,113 children west on the trains.” And also that New York wasn’t the only starting point: “[s]imilar institutions were created in Baltimore, Maryland and Boston, Massachusetts”.

Trammell’s source for this last point was the 1902 book by Homer Folks: “The care of destitute, neglected, and delinquent children“. This talks (p.49) about the 1807 foundation of the Baltimore orphan asylum, which had originally been the “female orphaline charity school”, and then the Baltimore female orphan asylum managed by “nine discreet female characters”, and where “[t]he directors were also given power to bind out children placed in the school”. Folks also mentions “St. Mary’s female orphan asylum”, a Catholic asylum in Baltimore founded in 1817.

But can we find any records of these orphan asylums? Hmmm…

OK, so there’s like another Zodiac film coming out this summer (2017), and it’s like called Awakening The Zodiac. And if that’s not just like totally thrilling enough for you kerrrazy cipher people already, there’s also a trailer on YouTube long enough to eat a couple of mouthfuls of popcorn (maybe three tops):

I know, I know, some haters are gonna say that it’s disrespectful to the memory of the dead, given that the Zodiac claimed to have killed 37 people, and that the film makers are just building cruddy entertainment on top of their families’ suffering. But it’s just Hollllllllllywood, people, or rather about as Hollywood as you can get when you film it on the cheap in Canada. Though if the pitch was much more elaborate than “Storage Hunters meets serial killer”, you can like paint my face orange and call me Veronica.

Seriously, though, I’d be a little surprised if anyone who knows even 1% more than squat about ciphers was involved: if my eyes don’t deceive me, there certainly ain’t no “Oranchak” in the credits. Maybe there’ll turn out to be hidden depths here: but – like the Z340 – if there are, they’re very well hidden indeed.

This, you may be a little surprised to read, is a story about a “two-piece bustle dress of bronze silk with striped rust velvet accents and lace cuffs“, with original Ophelia-motif buttons. Maryland-based curator and antique dress collector Sara Rivers-Cofield bought it for a Benjamin from an antique mall around Christmas 2013: but it turned out – marvel of marvels – to have an odd-looking ciphertext concealed in a secret inside pocket.

In early 2014, German crypto-blogger Klaus Schmeh threw this puzzle to his readers to solve, not unlike a juicy bone to a pack of wolves. However, their voracious code-breaking teeth – normally reliable enough for enciphered postcards and the like – seemed not to gain any grip on this silk dress cipher, even when he revisited it a few days ago.

So… what is going on here? Why can’t we just shatter its cryptographic shell, like the brittle antique walnut it ought by all rights to be? And what might be the cipher’s secret history?

First, The Dress

It’s made of nice quality silk (and has been looked after well over its 130-odd year lifetime), so would have been a pricey item. The buttonholes are hand-stitched (and nicely finished), yet much of the other stitching was done by machine.

This alone would date the item to after 1850 or so (when Isaac Singer’s sewing machines began to be sold in any quantity). However, Sara Rivers-Cofield dates it (on purely stylistic grounds) to “the mid-1880s”, which I find particularly interesting, for reasons I’ll explain later.

All we know about its original owner, apart from a penchant for hidden ciphers, is her surname (“Bennett”) and her dress size. We might reasonably speculate (from the cost and quality of her silk two-piece) that she was somewhere between well-to-do and very well off; and perhaps from a larger city in Maryland (such as Baltimore) where silk would be more de rigueur; and possibly she wasn’t much beyond her mid-20s (because life expectancy wasn’t that good back then).

Who Might She Be?

It doesn’t take much web searching to come up with a plausible-sounding candidate: Margaret J. Bennett, “a dowager grand dame of Baltimore society” (according to the Baltimore Sun) who died childless in 1900, leaving $150,000 to endow a local trust to provide accommodation for homeless women.

Among Baltimore architectural historians, she is also remembered for the Bennett House at 17 West Mulberry Street: there, the land was purchased by F.W. Bennett (who was the head of his own Auction House in town), while the house was erected in 1880.

Anyway, if anyone here has access to American newspaper archives or Ancestry.com (though I have had access in the past, I don’t at the moment), I’d be very interested to know if they have anything on Margaret J. Bennett. I didn’t manage to find any family archives or photographs online, but hopefully you cunning people can do much better.

Of course, there may well be many other Mrs Bennetts who also match the same basic profile: but I think Margaret J. is too good a catch not to have at least a quick look. 🙂

Now, The Silk Dress Cipher Itself

What Sara Rivers-Cofield (and her mother) found hidden inside the silk dress’s secret inner pocket were two balled-up sheets of paper (she called them “The Bustle Code”):

Within a few seconds of looking at these, it was clear to me that what we have here is a genuine cipher mystery: that is, something where the cryptography and the history are so tangled that each obscures the other.

Curiously, the writing on the sheets is very structured: each line consists of between two and seven words, and all bar three of the lines have their word count written in just below the first word. So even when text wraps round, it appears that we can treat that whole (wrapped) line as a single unit.

Also oddly, the writing is constrained well within the margins of the paper, to the point that there almost seems to be an invisible right-hand margin beyond which the writer did not (or could not) go. It therefore seems as though these sheets might be a copy of a document that was originally written on much narrower pieces of paper, but where the original formatting was retained.

Another point that’s worth making is that the idea of using word lists for telegraphy (and indeed cryptography) is to keep the words dissimilar to each other, to prevent messages getting scrambled. Yet here we appear to have words very similar to each other (such as “leafage” and “leakage”), along with words that seem to have been misheard or misspelt (“Rugina” for “Regina”, “Calgarry” for “Calgary”, etc).

To me, this suggests that part of the process involved somebody reading words out loud to someone writing them down. Hence I’ve attempted to correct parts of my transcription to try to bring some semblance of uniformity to it. (But feel free to disagree, I don’t mind).

Interestingly, if you lay out all the words in columns (having unwrapped the word wrapping), a number of striking patterns emerge…

The Column Patterns

Where the codetext’s words repeat, they do so in one of three groups: within the first column (e.g. “Calgarry”), within the second column (e.g. “Noun”), or within the remainder (e.g. “event”). In the following image, I’ve highlighted in different colours where words starting with the same letter repeat from column three onwards:

Moreover, the words in the first column are dominated by American and Canadian place names: although (just to be difficult) “egypt” and “malay” both appear elsewhere in the lines.

The third column is overwhelmingly dominated by l-words (legacy, loamy, etc): generally, words in the third to seventh columns start with a very limited range of letters, one quite unlike normal language initial letter distributions.

Indeed, this strongly suggests to me that the four instances of “Noun” in the second column are all nulls, because if you shift the remainder of those words across by one column, “laubul” / “leakage” / “loamy” / “legacy” all slide from column #4 back into the l-initial-heavy column #3.
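
To make this concrete, here’s a minimal Python sketch of the column-counting idea, using a handful of invented rows that merely mimic the layout described above (they are not the real transcription): it tallies initial letters per column, first with the “Noun” words left in place, and then with them treated as nulls and dropped.

from collections import Counter

# Invented rows mimicking the sheet layout: place-word, second word, then code words.
rows = [
    ["Concordia", "mammon", "leakage", "event",  "herd"],
    ["Vicksbg",   "rough",  "loamy",   "earn",   "herd"],
    ["Garry",     "Noun",   "signor",  "laubul", "event"],
    ["Landing",   "Noun",   "regina",  "legacy", "earn"],
]

def initials_by_column(rows, drop_nulls=False):
    """Tally the initial letters seen in each column position."""
    counts = {}
    for row in rows:
        words = [w for w in row if not (drop_nulls and w == "Noun")]
        for col, word in enumerate(words, start=1):
            counts.setdefault(col, Counter())[word[0].lower()] += 1
    return counts

print("Nulls kept:   ", initials_by_column(rows))
print("Nulls dropped:", initials_by_column(rows, drop_nulls=True))

With the (made-up) data above, dropping “Noun” pulls all the l-words back into column #3, which is exactly the kind of re-alignment described above.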

It seems almost impossible at this point not to draw the conclusion that these words are drawn from lists of arbitrary words, arranged by first letter: and that without access to those same lists, we stand no real chance of making progress.

All the same, a commenter on Sara Rivers-Cofield’s blog (John McVey, who collects historical telegraph codes, and who famously – “famously” around here anyway – helped decode a 1948 Israeli telegram recently) proposed that what was in play might be not so much a telegraphic code as a telegraphic cipher.

These (though rare) included long lists of words mapped to numerical equivalents, which could then be used to index into different lists (or sometimes into the same list, but three words onwards). Here’s a link to an 1870 telegraphic cypher from McVey’s blog.
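
To illustrate the general mechanism (and only the mechanism: this is not a model of the 1870 cypher itself), here’s a minimal sketch of the “same list, but three words onwards” variant:

# A toy code list; real telegraphic cyphers ran to thousands of entries.
CODE_LIST = ["legacy", "leakage", "lental", "loamy", "ledger",
             "mammon", "memorise", "merry", "nostrum", "onus"]

SHIFT = 3  # "the same list, but three words onwards"

def encipher(word):
    """Replace a word with the one three places further along the list."""
    i = CODE_LIST.index(word)
    return CODE_LIST[(i + SHIFT) % len(CODE_LIST)]

def decipher(word):
    i = CODE_LIST.index(word)
    return CODE_LIST[(i - SHIFT) % len(CODE_LIST)]

print(encipher("legacy"))                        # -> 'loamy'
assert decipher(encipher("loamy")) == "loamy"    # round-trips correctly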

However, from the highly-structured nature of the word usage and repetitions here, I think we can rule out any kind of formal telegraphic code, i.e. this is not in any way a “flat” words-in-words-out code substitution.

Rather, I think that we are looking at something similar to the semi-improvised (yet complex) rum-runner codes that Elizebeth Friedman won acclaim for breaking in the 1920s and 1930s: strongly reliant on code lists, yet also highly specialized around the precise nature of the contents of the communication, and using amateur code-making cunning.

That is, the first two columns seem to be encoding a quite different type of content to the other columns: the l-list words seem to be signalling the start of the second half’s contents.

Were Other People Involved?

I’ve already suggested that the words on the two sheets were copied from smaller (or at least narrower) pieces of paper, and that as part of this someone may well have read words out for someone else to copy down (because spelling mistakes and/or mishearing mistakes seem to have crept in).

However, someone (very possibly a third person) has also apparently checked these, ticking each numbered line off with a rough green pencil. Some words (such as “Lental”) have also been underlined, not unlike a schoolteacher marking corrections in an exercise book.

Yet once you start to get secret writing with as many as three people involved, the chances of this being an individual’s private code would seem to be sharply reduced – that is, I think we can rule out the possibility that this was the delusional product of a “lone gunman”. Moreover, there must surely have been a good-sized pie involved to warrant the effort of buying (or, perhaps more likely given the idiosyncratic nature of the words, assembling) code books: by which I mean there was enough benefit to be divided into at least three slices and still be worth everyone’s while.

What I’m trying to get at here is that, from the number of people involved, the tangledness of the code books, and the curious rigid codetext structure, this seems to have been an amateur code system constructed to enable some kind of organized behaviour.

Betting springs obviously to mind here: and possibly horse-racing, given that “dobbin” and “onager” appear in the codewords. But there’s another possibility…

Numbers and policies?

With its Puritan historical backdrop, America has long had an ambivalent attitude towards both gambling and alcohol: the history of casinos, inter-state gambling, and even Prohibition all attest strongly to this.

By the 1880s, the kind of state or local lotteries that had flourished at the start of that same century had almost all been shut down, victims of corruption and scandals. The one that remained (the Louisiana Lottery) was arguably even more corrupt than the others, but remained afloat thanks to the number of politicians benefiting from it: in modern political argot, it was (for a while, at least) “too big to fail”.

What stepped into the place of the state lotteries were illegal local lotteries, better known as the “numbers game”, or the numbers racket. Initially, these were unofficial lotteries run from private residences: but later (after the 1920s, I believe), they began to instead use numbers printed in newspapers that were believed to be random (such as the last three digits of various economic indicators, such as the total amount of money taken at a given racetrack), because of – surprise, surprise – the same kinds of corruption and rigging that had plagued the early official state lotteries.

Though the numbers racket became known as the scourge of Harlem in the first half of the twentieth century (there’s a very good book on this, “Playing the Numbers: Gambling in Harlem between the Wars”), modern state lotteries and interstate sports betting all but killed it off, though a few numbers joints do still exist (“You’re too late to play!“).

Back in the second half of the 19th century, ‘policy shops’ (where the question “do you want to buy a policy?” drew a parallel between insurance and gambling) started to flourish, eventually becoming a central feature of the American urban landscape. With more and more state lotteries being shut down as the century progressed, numbers were arguably the face of small-stake betting: in terms of accessibility, they were the equivalent of scratch cards, available nearly everywhere.

For a long time, though, information was king: if you were organized enough to get access to the numbers before the policy shop did, you could (theoretically) beat the odds. Winning numbers were even smuggled out by carrier pigeon: yet policy shops (who liked to take bets right up until the last possible moment) were suspicious of “pigeon numbers”, and would often not pay out if they caught so much as a sniff of subterfuge. It’s not as if you could complain to the police, right?

At the same time, a whole hoodoo culture grew up around numbers, where superstitious players were sold incense sticks, bath crystals, and books linking elements in your dreams to numbers. First published in 1889, one well-known one was “Aunt Sally’s Policy Player’s Dream Book”:

This contained lists linking dream-items to suggestions of matching number sequences to back, with two numbers being a “saddle”, three numbers a “gig”, and four numbers a “horse”: on the book’s cover, Aunt Sally is shown holding up “the washerwoman’s gig” (i.e. 4.11.44). There’s much more about this on Cat Yronwode’s excellent Aunt Sally page.

Might it be that these two Silk Dress Cipher sheets are somehow numbers betting slips that have been encoded? Could it be that each line somehow encodes a name (say, the first two columns), the size of the bet, and a set of numbers to bet on? There were certainly illegal lotteries and policy shops in Baltimore, so this is far from impossible.

Right now, I don’t know: but I’d be very interested to know of any books that cover the history of “policy shops” in the 19th century. Perhaps the clues will turn out to be somewhere under The Baltimore Sun…

As I see it, there are four foundational tasks that need to be done to wrangle Voynichese into a properly usable form:

* Task #1: Transcribing Voynichese text into a reliable computer-readable raw transcription e.g. EVA qokeedy
* Task #2: Parsing the raw transcription to determine Voynichese’s fundamental units (its tokens) e.g. [qo][k][ee][dy] (see the sketch just below this list)
* Task #3: Clustering the pages / folios into groups where the text shares distinct features e.g. Currier A vs Currier B
* Task #4: Normalizing the clusters e.g. how A tokens / patterns map to B tokens / patterns, etc
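
As a concrete (if deliberately naive) illustration of Task #2, here’s a short Python sketch of a greedy tokenizer. The token inventory below is purely hypothetical – the whole point of Task #2 is that we don’t yet know what the right inventory is:

import re

# A *hypothetical* token inventory, matched longest-first so the split is greedy.
TOKENS = ["aiin", "ain", "ckh", "cth", "qo", "ch", "sh", "ee", "dy", "ol", "or",
          "a", "d", "e", "i", "k", "l", "n", "o", "r", "s", "t", "y"]
TOKEN_RE = re.compile("|".join(sorted(TOKENS, key=len, reverse=True)))

def parse(eva_word):
    """Greedily split a raw EVA word into tokens; '?x' marks unmatched strokes."""
    out, pos = [], 0
    while pos < len(eva_word):
        m = TOKEN_RE.match(eva_word, pos)
        if m:
            out.append(m.group())
            pos = m.end()
        else:
            out.append("?" + eva_word[pos])
            pos += 1
    return out

print(parse("qokeedy"))   # ['qo', 'k', 'ee', 'dy']
print(parse("qochedy"))   # ['qo', 'ch', 'e', 'dy']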

I plan to tackle these four areas in separate posts, to try to build up a substantive conversation on each topic in turn.

Takahashi’s EVA transcription

Rene Zandbergen points out that, of all the different “EVA” transcriptions that appear interleaved in the EVA interlinear file, “the only one that was really done in EVA was the one from Takeshi. He did not use the fully extended EVA, which was probably not yet available at that time. All other transcriptions have been translated from Currier, FSG etc to EVA.

This is very true, and is the main reason why Takeshi Takahashi’s transcription is the one most researchers tend to use. Yet aside from not using extended EVA, there are a fair few idiosyncratic things Takeshi did that reduce its reliability, e.g. as Torsten Timm points out, “Takahashi reads sometimes ikh where other transcriptions read ckh”.

So the first thing to note is that the EVA interlinear transcription file’s interlinearity arguably doesn’t actually help us much at all. In fact, until such time as multiple genuinely EVA transcriptions get put in there, its interlinearity is more of an archaeological historical burden than something that gives researchers any kind of noticeable statistical gain.

What this suggests to me is that, given the high quality of the scans we now have, we really should be able to collectively determine a single ‘omega’ stroke transcription: and even where any ambiguity remains (see below), we really ought to be able to capture that ambiguity within the EVA 2.0 transcription itself.

EVA, Voyn-101, and NEVA

The Voyn-101 transcription used a glyph-based Voynichese transcription alphabet derived by the late Glen Claston, who invested an enormous amount of his time to produce a far more all-encompassing transcription style than EVA did. GC was convinced that many (apparently incidental) differences in the ways letter shapes were put on the page might encipher different meanings or tokens in the plaintext, and so ought to be captured in a transcription.

So in many ways we already have a better transcription, even if it is one very much tied to the glyph-based frame of reference that GC was convinced Voynichese used (he firmly believed in Leonell Strong’s attempted decryption).

Yet some aspects of Voynichese writing slipped through the holes in GC’s otherwise finely-meshed net, e.g. the scribal flourishes on word-final EVA n shapes, a feature that I flagged in Curse back in 2006. And I would be unsurprised if the same were to hold true for word-final -ir shapes.

All the same, GC’s work on v101 could very well be a better starting point for EVA 2.0 than Takeshi’s EVA. Philip Neal writes: “if people are interested in collaborating on a next generation transcription scheme, I think v101/NEVA could fairly easily be transformed into a fully stroke-based transcription which could serve as the starting point.

EVA, spaces, and spatiality

For Philip Neal, one key aspect of Voynichese that EVA neglects is measurements of “the space above and below the characters – text above, blank space above etc.

To which Rene adds that “for every character (or stroke) its coordinates need to be recorded separately”, for the reason that “we have a lot of data to do ‘language’ statistics, but no numerical data to do ‘hand’ statistics. This would, however, be solved by […having] the locations of all symbols recorded plus, of course their sizes. Where possible also slant angles.

The issue of what constitutes a space (EVA .) or a half-space (EVA ,) has also not been properly defined. To get around this, Rene suggests that we should physically measure all spaces in our transcription and then use a software filter to transform that (perhaps relative to the size of the glyphs around it) into a space (or indeed half-space) as we think fit.
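
As a rough sketch of the kind of filter Rene describes (with entirely made-up thresholds, since nobody has yet measured the corpus), one might map each measured gap onto a space or half-space relative to the glyphs around it:

def classify_gap(gap_width, mean_glyph_width,
                 space_ratio=0.6, half_space_ratio=0.3):
    """Map a physically measured gap onto EVA '.', ',' or '' (no break).

    The ratio thresholds are pure guesses: the point is that they live in a
    tunable filter, not baked into the transcription itself.
    """
    ratio = gap_width / mean_glyph_width
    if ratio >= space_ratio:
        return "."   # confident word break
    if ratio >= half_space_ratio:
        return ","   # uncertain half-space
    return ""        # no break at all

# e.g. a 5-pixel gap between glyphs averaging 12 pixels wide:
print(classify_gap(5, 12))   # ',' under these (arbitrary) thresholds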

To which I’d point out that there are also many places where spaces and/or half-spaces seem suspect for other reasons. For example, it would not surprise me if spaces around many free-standing ‘or’ groups (such as the famous “space transposition” sequence “or or oro r”) are not actually spaces at all. So it could well be that there would be context-dependent space-recognition algorithms / filters that we might very well want to use.

Though this at first sounds like a great deal of work to be contemplating, Rene is undaunted. To make it work, he thinks that “[a] number of basics should be agreed, including the use of a consistent ‘coordinate system’. Again, there is a solution by Jason Davies [i.e. voynichese.com], but I think that it should be based on the latest series of scans at the Beinecke (they are much flatter). My proposal would be to base it on the pixel coordinates.

For me, even though a lot of this would be nice to have (and I will be very interested to see Philip’s analysis of tall gallows, long-tailed characters and space between lines), the #1 frustration about EVA is still the inconsistencies and problems of the raw transcription itself.

Though it would be good to redesign EVA 2.0 to take these into account, perhaps it would be better to stage delivery of these features (hopefully via OCR!), just so we don’t end up designing something so complicated that it never actually gets done. 🙁

EVA and Neal Keys

One interesting (if arguably somewhat disconcerting) feature of Voynichese was pointed out by Philip Neal some years ago. He noted that where Voynichese words end in a gallows character, they almost always appear on the top line of a page (sometimes the top line of a paragraph). Moreover, these had a strong preference for being single-leg gallows (EVA p and EVA f); and also for appearing in nearby pairs with a short, often anomalous-looking stretch of text between them. And they also tend to occur about 2/3rds of the way across the line in which they fall.

Rather than call these “top-line-preferring-single-leg-gallows-preferring-2/3rd-along-the-top-line-preferring-anomalous-text-fragments“, I called these “Neal Keys”. This is a term that other researchers (particularly linguists) have ever since taken objection to, because it superficially sounds as though it is presupposing that this is a cryptographic mechanism. From my point of view, those same researchers didn’t object too loudly when cryptologist Prescott Currier called his Voynichese text clusters “languages”: so perhaps on balance we’re even, OK?

I only mention this because I think that EVA 2.0 ought to include a way of flagging likely Neal Keys, so that researchers can filter them in or out when they carry out their analyses.
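
By way of illustration, a first-pass Neal Key flagger might look something like the following sketch (the word-position cut-offs are my own rough guesses, not Philip’s measurements), applied to a transcription where each line is tagged as being a top line or not:

def flag_neal_keys(lines):
    """Flag single-leg-gallows-final words on top lines, roughly 2/3 along the line."""
    flags = []
    for is_top_line, words in lines:
        for i, word in enumerate(words):
            position = (i + 1) / len(words)          # fraction of the way along the line
            if (is_top_line
                    and word.endswith(("p", "f"))    # single-leg gallows final
                    and 0.5 <= position <= 0.85):    # rough "2/3 along" window
                flags.append(word)
    return flags

# Toy data only - the words here are invented, purely to exercise the filter.
page = [(True,  ["tchedy", "okedy", "cholp", "shedy", "qotchy", "okalf", "dal", "dam"]),
        (False, ["sory", "ckhar", "or", "kair", "chtaiin", "shar", "cthar", "dan"])]
print(flag_neal_keys(page))   # ['okalf']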

EVA and ambiguity

As I discussed previously, one problem with EVA is that it doesn’t admit to any uncertainty: by which I mean that once a Voynichese word has been transcribed into EVA, it is (almost always) then assumed to be 100% correct by all the people and programmes that subsequently read it. Yet we now have good enough scans to be able to tell that this is simply not true, insofar as there are a good number of words that do not conform to EVA’s model for Voynichese text, and for which just about any transcription attempt will probably be unsatisfactory.

For example, the word at the start of the fourth line on f2r:

Here, the first part could possibly be “sh” or “sho”, while the second part could possibly be “aiidy” or “aiily”: in both cases, however, any transcriber attempting to reduce it to EVA would be far from certain.

Currently, the most honest way to transcribe this in EVA would be “sh*,aii*y” (where ‘*’ indicates “don’t know / illegible”). But this is an option that isn’t taken as often as it should be.

I suspect that in cases like this, EVA should be extended to try to capture the uncertainty. One possible way would be to attach a percentage value expressing how likely an alternative reading is. In this example, the EVA transcription could be “sh!{40%=o},aiid{40%=*}y”, where “!{40%=o}” would mean “the most likely reading is that there is no character there (i.e. ‘!’), but there is a 40% chance that the character should be ‘o'”.

For those cases where two or more EVA characters are involved (e.g. where there is ambiguity between EVA ch and EVA ee), the EVA string would instead look like “ee{30%=ch}”. And on those occasions where there is a choice between a single letter and a letter pair, this could be transcribed as “!e{30%=ch}”.

For me, the point about transcribing with ambiguity is that it allows people doing modelling experiments to filter out words that are ambiguous (i.e. by including a [discard words containing any ambiguous glyphs] check box). Whatever’s going on in those words, it would almost always be better to ignore them rather than to include them.
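
To show that the proposed notation would be easy for programmes to consume, here’s a minimal sketch that parses the {p%=x} annotations and implements exactly that kind of [discard words containing any ambiguous glyphs] check box:

import re

# Matches the proposed "{40%=o}" style annotation: a percentage plus an alternate reading.
AMBIG_RE = re.compile(r"\{(\d+)%=([^}]*)\}")

def is_ambiguous(word):
    """True if the transcribed word carries any {p%=x} annotation."""
    return bool(AMBIG_RE.search(word))

def primary_reading(word):
    """Strip the annotations, keeping the most likely reading ('!' = no character)."""
    return AMBIG_RE.sub("", word).replace("!", "")

words = ["sh!{40%=o},aiid{40%=*}y", "qokeedy", "ee{30%=ch}dy"]
print([primary_reading(w) for w in words if not is_ambiguous(w)])   # ['qokeedy']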

EVA and Metadata

Rene points out that the metadata “were added to the interlinear file, but this is indeed independent from EVA. It is part of the file format, and could equally be used in files using Currier, v101 etc.” So we shouldn’t confuse the usefulness of EVA with its metadata.

In many ways, though, what we would really like to have in the EVA metadata is some really definitive clustering information: though the pages are currently labelled A and B, there are (without any real doubt) numerous more finely-grained clusters that have yet to be determined in a completely rigorous and transparent (open-sourced) way. However, that is Task #3, which I hope to return to shortly.

In some ways, the kind of useful clustering I’m describing here is a kind of high-level “final transcription” feature, i.e. of how the transcription might well look much further down the line. So perhaps any talk of transcription

How to deliver EVA 2.0?

Rene Zandbergen is in no doubt that EVA 2.0 should not be in an interlinear file, but in a shared online database. There is indeed a lot to be said for having a cloud database containing a definitive transcription that we all share, extend, mutually review, and write programmes to access (say, via RESTful commands).

It would be particularly good if the accessors to it included a large number of basic filtering options: by page, folio, quire, recto/verso, Currier language, [not] first words, [not] last words, [not] first lines, [not] labels, [not] key-like texts, [not] Neal Keys, regexps, and so forth – a bit like voynichese.com on steroids. 🙂

It would also be sensible if this included open-source (and peer-reviewed) code for calculating statistics – raw instance counts, post-parse statistics, per-section percentages, 1st and 2nd order entropy calculations, etc.
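
The entropy calculations in particular are only a few lines of (easily peer-reviewable) Python – a minimal sketch, with 2nd order entropy computed in the usual conditional form H(pairs) - H(singles):

import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits per symbol, from a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def first_order_entropy(text):
    return entropy(Counter(text))

def second_order_entropy(text):
    """Conditional entropy of each symbol given the one before it."""
    pairs = Counter(zip(text, text[1:]))
    return entropy(pairs) - first_order_entropy(text[:-1])

sample = "qokeedyqokedyqokeedyshedy"   # toy string, not a meaningful statistic
print(round(first_order_entropy(sample), 3))
print(round(second_order_entropy(sample), 3))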

Many of these I built into my JavaScript Voynichese state machine from 2003: there, I wrote a simple script to convert the interlinear file into JavaScript (developers now would typically use JSON or I-JSON).

However, this brings into play the questions of boundaries (how far should this database go?), collaboration (who should make this database), methodology (what language or platform should it use?), and also of resources (who should pay for it?).

One of the strongest reasons for EVA’s success was its simplicity: and given the long (and complex) shopping list we appear to have, it’s very hard to see how EVA 2.0 will be able to compete with that. But perhaps we collectively have no choice now.

In the Voynich research world, several transcriptions of the Voynich Manuscript’s baffling text have been made. Arguably the most influential of these is EVA: this originally stood for “European Voynich Alphabet”, but was later de-Europeanized into “Extensible Voynich Alphabet”.

The Good Things About EVA

EVA has two key aspects that make it particularly well-adapted to Voynich research. Firstly, the vast majority of Voynichese words transcribed into EVA are pronounceable (e.g. daiin, qochedy, chodain, etc): this makes them easy to remember and to work with. Secondly, it is a stroke-based transcription: even though there are countless ways in which the individual strokes could possibly be joined together into glyphs (e.g. ch, ee, ii, iin) or parsed into possible tokens (e.g. qo, ol, dy), EVA does not try to make that distinction – it is “parse-neutral”.

Thanks to these two aspects, EVA has become the central means by which Voynich researchers trying to understand its textual mysteries converse. In those terms, it is a hugely successful design.

The Not-So-Good Things About EVA

In retrospect, some features of EVA’s design are quite clunky:
* Using ‘s’ to code both for the freestanding ‘s’-shaped glyph and for the left-hand half of ‘sh’
* Having two ways of coding ligatures (either with round brackets or with upper-case letters)
* Having so many extended characters, many of which are for shapes that appear exactly once

There are other EVA design limitations that prevent various types of stroke from being captured:
* Having only limited ways of encoding the various ‘sh’ “plumes” (this particularly annoyed Glen Claston)
* Having no way of encoding the various ‘s’ flourishes (this also annoyed Glen)
* Having no way of encoding various different ‘-v’ flourishes (this continues to annoy me)

You also run into various annoying inconsistences when you try to use the interlinear transcription:
* Some transcribers use extended characters for weirdoes, while others use no extended characters at all
* Directional tags such as R (radial) and C (circular) aren’t always used consistently
* Currier language (A / B) isn’t recorded for all pages
* Not all transcribers use the ‘,’ (half-space) character
* What one transcriber considers a space or half-space, another leaves out completely

These issues have led some researchers to either make their own transcriptions (such as Glen Claston’s v101 transcription), or to propose modifications to EVA (such as Philip Neal’s little-known ‘NEVA’, which is a kind of hybrid, diacriticalised EVA, mapped backwards from Glen Claston’s transcription).

However, there are arguably even bigger problems to contend with.

The Problem With EVA

The first big problem with EVA is that in lots of cases, Voynichese just doesn’t want to play ball with EVA’s nice neat transcription model. If we look at the following word (it’s right at the start of the fourth line on f2r), you should immediately see the problem:

The various EVA transcribers tried gamely to encode this (they tried “chaindy”, “*aiidy”, and “shaiidy”), but the only thing you can be certain of is that they’re probably all wrong. Because of the number of difficult cases such as this, EVA should perhaps have included a mechanism to let you flag an entire word as unreliable, so that people trying to draw inferences from EVA could filter it out before it messes up their stats.

(There’s a good chance that this particular word was miscopied or emended: you’d need to do a proper codicological analysis to figure out what was going on here, which is a complex and difficult activity that’s not high up on anyone’s list of things to do.)

The second big problem with EVA is that of low quality. This is (I believe) because almost all of the EVA transcriptions were done from the Beinecke’s ancient (read: horrible, nasty, monochrome) CopyFlo printouts, i.e. long before the Beinecke released even the first digital image scan of the Voynich Manuscript’s pages. Though many CopyFlo pages are nice and clean, there are still plenty of places where you can’t easily tell ‘o’ from ‘a’, ‘o’ from ‘y’, ‘ee’ from ‘ch’, ‘r’ from ‘s’, ‘q’ from ‘l’, or even ‘ch’ from ‘sh’.

And so there are often wide discrepancies between the various transcriptions. For example, looking at the second line of page f24r:

…this was transcribed as:


qotaiin.char.odai!n.okaiikhal.oky-{plant} --[Takahashi]
qotaiin.eear.odaiin.okai*!!al.oky-{plant} --[Currier, updated by Voynich mailing list members]
qotaiin.char.odai!n.okaickhal.oky-{plant} --[First Study Group]

In this specific instance, the Currier transcription is clearly the least accurate of the three: and even though the First Study Group transcription seems closer than Takeshi Takahashi’s transcription here, the latter is frequently more reliable elsewhere.
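
Just to put a number on that kind of discrepancy, here’s a quick sketch that compares the three (equal-length) transcribed lines position by position – crude, certainly, but the ‘!’ placeholders keep the lines aligned well enough for a rough percentage:

takahashi = "qotaiin.char.odai!n.okaiikhal.oky"
currier   = "qotaiin.eear.odaiin.okai*!!al.oky"
fsg       = "qotaiin.char.odai!n.okaickhal.oky"

def agreement(a, b):
    """Fraction of positions at which two equal-length transcriptions agree."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

print(f"Takahashi vs Currier: {agreement(takahashi, currier):.0%}")
print(f"Takahashi vs FSG:     {agreement(takahashi, fsg):.0%}")
print(f"Currier vs FSG:       {agreement(currier, fsg):.0%}")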

The third big problem with EVA is that Voynich researchers (typically newer ones) often treat it as if it is final (it isn’t); or as if it is a perfect representation of Voynichese (it isn’t).

The EVA transcription is often unable to reflect what is on the page, and even though the transcribers have done their best to map between the two as best they can, in many instances there is no answer that is definitively correct.

The fourth big problem with EVA is that it is in need of an overhaul, because there is a huge appetite for running statistical experiments on a transcription, and the way it has ended up is often not a good fit for that.

It might be better now to produce not an interlinear EVA transcription (i.e. with different people’s transcriptions interleaved), but a single collective transcription BUT where words or letters that don’t quite fit the EVA paradigm are also tagged as ambiguous (e.g. places where the glyph has ended up in limbo halfway between ‘a’ and ‘o’).

What Is The Point Of EVA?

It seems to me that the biggest problem of all is this: that almost everyone has forgotten that the whole point of EVA wasn’t to close down discussion about transcription, but rather to enable people to work collaboratively even though just about every Voynich researcher has a different idea about how the individual shapes should be grouped and interpreted.

Somewhere along the line, people have stopped caring about the unresolved issue of how to parse Voynichese (e.g. to determine whether ‘ee’ is one letter or two), and just got on with doing experiments using EVA but without understanding its limitations and/or scope.

EVA was socially constructive, in that it allowed people with wildly different opinions about how Voynichese works to discuss things with each other in a shared language. However, it also inadvertently helped promote an inclusive accommodation whereby people stopped thinking about trying to resolve difficult issues (such as working out the correct way to parse the transcription).

But until we can find a way to resolve such utterly foundational issues, experiments on the EVA transcription will continue to give misleading and confounded results. The big paradox is therefore that while the EVA transcription has helped people discuss Voynichese, it hasn’t yet managed to help people advance knowledge about how Voynichese actually works beyond a very superficial level. *sigh*

For far too long, Voynich researchers have (in my opinion) tried to use statistical analysis as a thousand-ton wrecking ball, i.e. to knock down the whole Voynich edifice in a single giant swing. Find the perfect statistical experiment, runs the train of thought, and all Voynichese’s skittles will clatter down. Strrrrike!

But… even a tiny amount of reflection should be enough to show that this isn’t going to work: the intricacies and contingencies of Voynichese shout out loud that there will be no single key to unlock this door. Right now, the tests that get run give results that are – at best – like peering through multiple layers of net curtains. We do see vague silhouettes, but nothing genuinely useful appears.

Whether you think Voynichese is a language, a cipher system, or even a generated text doesn’t really matter. We all face the same initial problem: how to make Voynichese tractable, by which I mean how to flatten it (i.e. regularize it) to the point where the kind of tests people run do stand a good chance of returning results that are genuinely revealing.

A staging point model

How instead, then, should we approach Voynichese?

The answer is perhaps embarrassingly obvious and straightforward: we should collectively design and implement statistical experiments that help us move towards a series of staging posts.

Each of the models on the right (parsing model, clustering model, and inter-cluster maps) should be driven by clear-headed statistical analysis, and would help us iterate towards the staging points on the left (parsed transcription, clustered parsed transcription, final transcription).

What I’m specifically asserting here is that researchers who perform statistical experiments on the raw stroke transcription in the mistaken belief that this is as good as a final transcription are simply wasting their time: there are too many confounding curtains in the way to ever see clearly.

The Curse, statistically

A decade ago, I first talked about “The Curse of the Voynich”: my book’s title was a way of expressing the idea that there was something about the way the Voynich Manuscript was constructed that makes fools of people who try to solve it.

Interestingly, it might well be that the diagram above explains what the Curse actually is: that all the while people treat the raw (unparsed, unclustered, unnormalized) transcription as if it were the final (parsed, clustered, normalized) transcription, their statistical experiments will continue to be confounded in multiple ways, and will show them nothing useful.

I’ve just had a particularly interesting email exchange with Paul Relkin concerning the Feynman Challenge Ciphers, which he has generously allowed me to share here. The context is that the first Feynman Challenge cipher’s plaintext was from the very start of Geoffrey Chaucer’s Canterbury Tales, i.e. the first twelve lines of the General Prologue:

WHAN THAT APRILLE WITH HIS SHOURES SOOTE
THE DROGHTE OF MARCH HATH PERCED TO THE ROOTE
AND BATHED EVERY VEYNE IN SWICH LICOUR
OF WHICH VERTU ENGENDRED IS THE FLOUR
WHAN ZEPHIRUS EEK WITH HIS SWEETE BREFTH
INSPIRED HATH IN EVERY HOLT AND HEETH
THE TENDRE CROPPES AND THE YONGE SONNE
HATH IN THE RAM HIS HALVE COURS Y-RONNE
AND SMALE FOWELES MAKEN MELODYE
THAT SLEPEN AL THE NYGHT WITH OPEN YE
SO PRIKETH HEM NATURE IN HIR CORAGES
THANNE LONGEN FOLK TO GO ON ON PILGRIM[AGES]

Paul writes:


The Prologue

I’d like to share with you a possible clue I’ve discovered to the sources of the 2nd and 3rd Feynman Ciphers. My findings relate to the identification of a specific published transcription of the Canterbury Tales that is the probable source of the 1st Feynman Cipher.

As you are probably aware, the Canterbury Tales have been transcribed and reprinted innumerable times. Among the many different published editions of the Canterbury Tales, there are several idiosyncratic spellings associated with particular transcriptions. Although individual lines are spelled the same way in many different editions, I found that the 12 lines of the Feynman Cipher taken together are unique enough to match only one published transcription, like a “word fingerprint”.

To find the edition that the Feynman Cipher is based on, I extensively searched for editions of the General Prologue that were published before or during World War II and compared the word spellings to the Feynman Cipher.

First, I discovered what may be a typo in the 1st Feynman Cipher. The word “brefth” does not appear in any published edition of the General Prologue I have been able to identify. The most likely correct spelling is “breeth”.

Second, I found that the only version of the General Prologue that matches the Feynman Cipher is Fred Norris Robinson’s 1st edition of Chaucer’s Complete Works. In the introduction to his book, Robinson actually discusses several of the uniquely spelled words that later found their way into the 1st Feynman Cipher and explains why he rejected the popular spellings and chose less common ones.

Possible Sources

Having identified Robinson’s transcription as the probable source of the 1st Feynman Cipher, I discovered that there are only a few different editions of this transcription that were published between 1933 and 1938 that could have been used by the author of the Feynman Ciphers:

In 1933, Houghton Mifflin published this book in at least three editions:

The Complete Works of Geoffrey Chaucer (black):

The Complete Works of Geoffrey Chaucer, Student’s Cambridge Edition (red):

The Poetical Works of Chaucer, Cambridge Edition (white):

In 1936, Houghton Mifflin published small books containing parts of Robinson’s Canterbury Tales with an introduction written by Max John Herzberg. The title of the book that contains the quote used in the cipher is “The Prologue, the Knight’s Tale, and the Nun’s Priest’s Tale”:

In 1938, Houghton Mifflin included Robinson’s Canterbury Tales in a two volume collection of British poetry by Paul Robert Lieder called “British Poetry and Prose” (Volume 1):

Interestingly, Robinson’s 2nd edition of Chaucer’s Complete Works in 1957 no longer matches the spellings in the cipher!

It’s specifically here where I think we may find clues to the 2nd and 3rd ciphers. It seems plausible to me that “British Poetry and Prose” contains other literary works that were the basis for the 2nd and 3rd Feynman Ciphers. For example, several of its poems have 6 letter words that repeat twice, consistent with “CJUMVRCJUMVR” in the 2nd Feynman Cipher.
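
(((NickP: scanning a candidate text for this kind of immediately repeated group is trivial to automate. Here’s a minimal sketch, on the assumption – which is all it is – that the cipher would leave a doubled plaintext word as a doubled ciphertext group:)))

import re

def doubled_words(text, length=6):
    """Find words of the given length that appear twice in immediate succession."""
    words = re.findall(r"[A-Za-z]+", text.upper())
    return [w for w, nxt in zip(words, words[1:]) if w == nxt and len(w) == length]

# Toy example only; the real test would be run over the poems in "British Poetry and Prose".
print(doubled_words("THE BRIDGE TOLLED TOLLED ACROSS THE BAY BAY"))   # ['TOLLED']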

Robinson’s 1933 book of Chaucer’s Complete Works could also be the source of the 2nd and 3rd ciphers. The 1933 book is part of a series of books called “The Cambridge Poets” and the 1936 book is part of a series called “The Riverside Literature Series”. The other books in the series are also potentially worth looking at.

Los Alamos?

My research suggests that several copies of these books have the original owner’s name and other notes written in them. If we were able to locate the copy that was used at Los Alamos, it might reveal the name of the scientist who created the ciphers. There may be other writings within it that would give further clues about the ciphers.

I discovered that the Mesa Public Library in Los Alamos has a copy of Robinson’s 1933 book. The Mesa Public Library originated during World War II in the Big House where Feynman lived, so I wondered whether the library book could be the copy that was used to create the cipher.

So, I recently arranged to borrow that book through interlibrary loan. Since I live on the East Coast, I had to try 5 different libraries before I found one that would let me request that particular book. It then took two tries because they accidentally requested the book from the Mesa Public Library in Arizona instead of the one in New Mexico. I finally received the book I requested. Unfortunately, the book plate indicates that it was donated to the library in the 1970s. This makes it unlikely (albeit not impossible) that this was the specific copy used in the period around World War II to create the 1st Feynman Cipher.

I hope you find this information interesting and that it brings us a step closer to solving the 2nd and 3rd Feynman Ciphers.

Chaucer and Cryptography?

(((NickP: I responded here, pointing out:)))

Incidentally, there are two interesting links between Geoffrey Chaucer and cryptography. The first (which you may well have heard of) is that he included six blocks of ciphertext in his Treatise on the “Equatorie” (basically a kind of astrolabe). But the second is that a very major work on Chaucer (finally published in 1940) was written by John Matthews Manly and Edith Rickert, both well-known code-breakers. (I’ve covered them a few times on CM, mainly because of Manly’s links to the Voynich Manuscript.)

However, Rickert died in 1938, Manly died in 1940 and Los Alamos only really started in 1943, so we can rule out a direct transmission from either of them to Feynman. All the same, I do consider it entirely possible that one/both of them was/were the ultimate source of the three cryptograms. Just so you know!

(((To which Paul replied:)))

Concerning your excellent point about Rickert and Manly, there was another colorful link between a Chaucer scholar and Los Alamos that I found while I was researching editions of the Canterbury Tales. John Strong Perry Tatlock was a famous Chaucer expert who transcribed Chaucer’s Complete Works. His daughter, Jean Frances Tatlock, had a romantic relationship with J. Robert Oppenheimer between 1936 and 1939. They continued to have an affair during Oppenheimer’s marriage. Their relationship was used as evidence against Oppenheimer during his security clearance hearings because Tatlock was a member of the Communist Party. As you know, Oppenheimer and Feynman had more than a passing acquaintance – as for Tatlock and Feynman, who knows?

Just a short note to say that I’ve today decided to stop selling physical copies of “The Curse of the Voynich”. I first published it at the end of 2006 (the front page says “v1.0: Emma Vine (Broceto)“, if you want to try decrypting that), and it’s now time for me to leave it to the book collectors and move on. 🙂

Thanks very much to everyone reading this who bought a copy along the way – this helped me recoup some of the money I lost during the six months I worked part-time while I did the research for it. And for those who bought their copy direct from Compelling Press, I really hope you enjoyed your anagrammatic dedication – finding nice anagrams of people’s names was always something I enjoyed doing.

Incidentally, second-hand copies of “Curse” are on sale through bookfinder.com, though at prices ranging from £47 to £2500 (!): I expect the lowest price will rise to around £200 before very long, so anyone here who already has a copy is arguably now a little bit better off. Which is nice (if you’re an accountant). 🙂

Finally: for anyone who would like a copy of “Curse” in the future, please note that I plan to make an ebook version available before long (hopefully later this year). I’ll do my best, but don’t hold your breath waiting for it in the ultra-short term, because publication rights for pictures and quotations always take longer to clear than you’d like. *sigh*.

What I have long tried to do with this blog is to genuinely advance our collective knowledge about unbroken historical ciphers, not by speculating loosely or wildly (as seems to be the norm these days) but instead by trying to reason under conditions of uncertainty. That is: I try to use each post as an opportunity to think logically about multiple types of historical evidence that often coincide or overlap yet are individually hard to work with – ciphers, cryptograms, drawings, treasure maps, stories, legends, claims, beliefs, mysteries.

The world of cipher mysteries, then, is a world both of uncertain evidence and also of uncertain history built on top of that uncertain evidence – perpetually thin ice to be skating on, to be sure.

A skills void?

It is entirely true that all historical evidence is inherently uncertain: people lie, groups have agendas, listeners misunderstand, language misleads, copyists misread, propagandists appropriate, historians overselect, forgers fake, etc. All the same, seeing past/through the textual uncertainties these kinds of behaviours can leave embedded in evidence is the bread and butter of modern historians, who are now trained to be adept both in close reading and critical thinking.

However, what I am arguing here is that though History-as-text – i.e. history viewed as primarily an exercise in textual literature analysis – managed to win the historical high ground, it did so at the cost of supplanting almost all non-textual historical disciplines. To my eyes, the slow grinding deaths of codicology, palaeography and even dear old iconography (now more visible in Dan Brown film adaptations than in bibliographies) along with what I think is the increasing marginalization of Art History far from the historical mainstream have collectively left a huge gap at the heart of the subject.

This isn’t merely a focus void, it’s also centrally a skills void – the main missing skill being the ability to reason under conditions where the evidence’s textual dimension is missing or sharply limited.

In short, I would argue that because historians are now trained to deal primarily with textual uncertainties, the ability to reason effectively with other less compliant types of evidence is a skill few now seem to have to any significant degree. In my opinion, this aspect of text-centrism is a key structural weakness of history as now taught.

In my experience, almost nothing exposes this weakness more than the writing done on the subject of historical cipher mysteries. There it is absolutely the norm to see otherwise clever people make fools of themselves, and moreover in thousands of different ways: surely in few other subject domains has so much ink been spilled to so little effect. In Rene Zandbergen’s opinion, probably the most difficult thing about Voynich research is avoiding big mistakes: sadly, few seem able to achieve this.

“The Journal of Uncertain History”

Yet a key problem I face is that when it comes to presenting or publishing, the kind of fascinating historical mysteries I research are plainly a bad fit for the current academic landscape. This is because what I’m trying to develop and exercise there is a kind of multi-disciplinary / cross-disciplinary analytical historical skill (specifically: historical reasoning under uncertainty) that has quite different aims and success criteria from mainstream historical reasoning.

On the one hand, this “Uncertain History” is very much like Intellectual History, in that it is a meta-historical approach that freely crosses domain boundaries while relying heavily on the careful application of logic in order to make progress. And yet I would argue that Intellectual History as currently practised is heavily reliant on the universality of text and classical logic to build its chains of reasoning. In that sense, Intellectual History is a close cousin to the text-walled world of MBA courses, where all statements in case studies are deemed to be both true and given in good faith.

By way of contrast, Uncertain History turns its face primarily to those historical conundrums and mysteries where text falls short, where good faith can very often be lacking, and where strict Aristotelian logic can prove more of a hindrance than a help (here I’m thinking specifically about the Law of the Excluded Middle).

And so I propose launching a new open-source historical journal (Creative Commons BY-NC Licence), with the provisional name of “The Journal of Uncertain History“, and with the aim of providing a home for Uncertain History research of all types.

To be considered for the JoUH, papers should (also provisionally) be tackling research areas where:

* the historical evidence itself is problematic and/or uncertain;
* there is a problematic interplay between the types of evidence;
* to make genuine progress, non-trivial reasoning is required, not just for thinking but also for explanation;
* historical speculations made within the paper are both proposed and tested; and
* future tests (preferably empirical) and/or research leads are proposed.

I welcome all your comments, thoughts, and suggestions for possible submissions, authors, collaborators and/or editors; and especially reasons why existing journals X, Y and Z would all be better homes for this kind of research than the JoUH. 🙂