A paper came out a few days ago on arXiv.org called “Probing the statistical properties of unknown texts: application to the Voynich Manuscript”, written by three Brazilian academics (with an assist from two German academics).

The authors grouped Voynichese (i.e. Voynich text) hypotheses into three broad categories:

“(i) A sequence of words without a meaningful message;
(ii) a meaningful text written originally in an existing language which was coded (and possibly encrypted) in the Voynich alphabet; and
(iii) a meaningful text written in an unknown (possibly constructed) language.”

After developing a whole load of word-occurrence-based statistical machinery (defining “intermittency”, etc) and applying it both to real text corpora and to Voynichese, they conclude that the word structure of Voynichese is incompatible with shuffled texts (which is how they model (i)-class hypotheses), and “mostly compatible with natural languages” (the (ii)- and (iii)-class hypotheses). They finish by using the same machinery to suggest Voynichese “keywords” – words that, according to their statistical measures, stand out from the text.
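To make the "shuffled text" comparison a bit more concrete, here is a minimal Python sketch of one way to measure how bursty a word is: take the gaps between its successive occurrences and divide their standard deviation by their mean. This is an assumed, simplified stand-in for the paper's "intermittency" rather than its exact definition, and the function names are made up for illustration.

```python
import random
from statistics import mean, pstdev


def recurrence_gaps(words, target):
    """Distances (in words) between successive occurrences of target."""
    positions = [i for i, w in enumerate(words) if w == target]
    return [b - a for a, b in zip(positions, positions[1:])]


def intermittency(words, target):
    """Std-dev of the recurrence gaps divided by their mean.

    Assumed definition, for illustration only. Returns None when the
    word occurs fewer than three times (i.e. fewer than two gaps).
    """
    gaps = recurrence_gaps(words, target)
    if len(gaps) < 2:
        return None
    return pstdev(gaps) / mean(gaps)


def shuffled_baseline(words, target, trials=200, seed=0):
    """Average intermittency of target over randomly shuffled copies."""
    rng = random.Random(seed)
    pool = list(words)
    scores = []
    for _ in range(trials):
        rng.shuffle(pool)
        score = intermittency(pool, target)
        if score is not None:
            scores.append(score)
    return mean(scores) if scores else None
```

A word that clumps together in a few places scores well above its shuffled baseline, whereas a word sprinkled evenly through the text scores close to it. That gap is the kind of signal a keyword list like the ones below would be built from.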

Their suggested English keywords (generated from the New Testament) are:-
* begat Pilates talents loaves Herod tares vineyard shall boat demons ve pay sabbath hear whosoever

Their suggested Voynichese keywords (generated from an EVA transcription, though they don’t say which, so possibly Takahashi’s?):-
* cthy qokeedy shedy qokain chor lkaiin qol lchedy sho qokaiin olkeedy qokal qotain dchor otedy

OK, but… what do I think? First off, I’m pleased to see that their results seem incompatible with “shuffled texts” or randomized texts, because that is what nearly all of the various Voynich “hoax” hypotheses rely on. Intuitively, just about anyone who has worked with Voynichese for any period of time is struck by its intense internal structuring on many levels: so it is nice to see the same result coming out from a different angle.

Secondly, what they mean by “mostly compatible” is that while Voynichese passes many of their proposed tests comfortably, it actually fails some of them (and only passes others by the slimmest of whiskers). To me, that implies either (a) an exotically- (and non-obviously-)structured language or constructed language, or (b) an obfuscated language (e.g. a ciphertext or shorthand): conversely, it seems to imply that Voynichese isn’t a one-to-one map of any mainstream language (which is what cryptographers such as Elizebeth Friedman have been saying for years). Yet the earliest constructed language we currently know of was devised at least a century after the Voynich’s vellum dating (and about a century after its earliest marginalia), so we can almost certainly rule that possibility out.

I don’t know: while it’s always good to see people approaching the Voynich Manuscript from a new angle, I can’t help but feel that in just about every instance the Voynich’s author remains at least three or four steps ahead of them. The key paradox of Voynichese revolves around the fact that even though it so resembles a natural language, the way its words work as semantic units fails to do so in quite the same way. So for me, the important thing here is to try to understand the tests that failed, and see what they tell us about how Voynich words don’t work… but that will doubtless take a little time.

As for the suggested keywords: personally, I’d be rather more convinced by their statistical machinery if it had automagically suggested the word “Jesus” rather than “boat” or “vineyard” for the New Testament, so I have to say I’m far from persuaded that their list of Voynich cribs will help us unlock its secrets at all… but you never know, so perhaps let’s give them the benefit of the doubt on this one! 😉

Just the merest hint of a nudge to your collective set of virtual elbows, to remind you that the first Voynich London pub meet for basically ages is this evening (7th March 2013), at The Prospect of Whitby in Wapping. Though having said that, all cipher mysteries are fair game, not just the Voynich Manuscript: hence cipher pigeon fanciers and armchair treasure hunters are more than welcome to come along too. Plenty of room for everyone!

I’ll be there from 6.15pm or so, hoping to catch up on the latest Euro cipher gossip from Gotha and elsewhere, courtesy of Herr Cipher Skeptic himself, Klaus Schmeh, who’s on a flying visit to London having had a swift peek at the various enciphered books in the British Library (“The Subtlety of Witches”, etc). So if you can make your way to Wapping Wall for even half an hour, it would be really great to see you.

[Even stronger nudge: Tony Gaffney, what on earth do I have to do to persuade you to come along? I haven’t seen you in 25 years or so!]

Just so you know: if it’s a nice evening (or if someone happens to bring their dog along with them, John 🙂 ), the chances are we’ll be located in the terraced area through the pub to the back left (looking out over the Thames). Otherwise, we could be anywhere on the pub’s two floors, depending on how busy it happens to be. Looking forward to it!

Moshe Rubin just emailed me to let me know that his extensive October 2011 Cryptologia article “John F. Byrne’s Chaocipher Revealed: An Historical and Technical Appraisal” (vol. 35 issue 4, pp.328-379 [!!!]) can currently be viewed and downloaded for free from Taylor & Francis (who publish Cryptologia), via the “Download full text” button there.

If (like me) you’re into both the social and technical aspects of historical cryptography, it’s a cracking old read, covering both Byrne’s life and his numerous attempts to get the US military to accept his “Chaocipher” invention. Yet Moshe’s article is far from all ra-ra-pro-Byrne stuff: it also makes clear…
* the system’s inherent fragility (because each step changed the state of the two rotors, it suffered from near-worst-case error propagation);
* Byrne’s cryptographic inexperience (the way that he proposed concealing the indicator settings was far from secure); and
* Byrne’s cryptologic naivety (he believed that the flat letter distribution of the ciphertext made it explicitly unbreakable).

If you’ve read Ratcliff’s “Delusions of Intelligence” (a book the GCHQ Historian recommended I read, thanks for that!), you’ll know that this last mindset was precisely what the various German agencies using the Enigma machine suffered from: and if Chaocipher had been extensively used by the Allies in WW2, who’s to say that Hitler’s fragmented array of codebreaking agencies wouldn’t have eventually found a way of breaking into it, just as they did with virtually all the Allies’ low-to-medium-echelon ciphers?

One thing that strikes me most about the whole saga is that even though Byrne (who sometimes wrote under the anagrammatic pseudonym “J. F. Renby”, I was amused to see) seems to have envisaged Chaocipher as an expensive-to-build set of mechanical rotors, I think it is actually very easy to use with two Scrabble alphabets arranged in horizontal rows. (OK, Scrabble wasn’t devised until the 1930s, but my basic point still stands regardless). All the sliding operations (zenith / nadir, etc) then become immediately straightforward, arguably far more so than if you were using a machine to do the same.

Regardless of whether or not Scrabble tiles are the best way to Chaocipherify your plaintext, I’d argue that what sets Byrne’s cryptographic ideas apart most is the way he conceptualized his crypto system in terms that mesh peculiarly well with modern computer science: in fact, it’s quite hard to describe it at all without lapsing into contemporary CompSciSpeak. It’s almost as if Byrne were projecting himself forward into a software world: but then again, one of the chapters of his autobiography was SciFi, so perhaps the future was where he felt most at home! 🙂

If you have been following the coverage here of the recent WW2 cipher pigeon story with more than the bleariest of eyes, you’ll know that I’ve repeatedly speculated whether its “W Stot Sjt” signature might well have actually been written by Serjeant William Stout of the Royal Engineers. Though (as we’ve already seen) he died not long after D-Day, I wondered whether it might be possible to find out more about his story by tracking down surviving members of his family and asking them.

Just before Christmas, I finally managed to get in contact with Stout’s daughter, and asked if she could see if she had a copy of his signature or his handwriting. Delightfully, I received from her this last week a small package containing some wartime photographs of her father, a photograph of his grave taken in 1948, and – most surprisingly of all – a 1940 field service post card (“Army Form A. 2042 / R.A.F. Form No. 1929”). Such postcards contained a list of barely informative sentences (“I am quite well”, etc), out of which the sender crossed all those lines that did not apply: there’s an example online here.

Aha, I thought: will the signature pencilled on it turn out to match the signature on the pigeon cipher form? After some lightweight image processing, I placed the two side by side so as to compare them as reliably as I could…

[Image: the two W. Stout signatures, side by side]

You’ve worked out the answer already, I think: which is that the two names were clearly not written by the same person. Which is a shame: but despite not being a proof, it’s still very far from a disproof. In the busy fog of war, a message could easily have been written by one person (the sender), enciphered and/or copied by a second (the signaller), and then sent by pigeon by a third (the pigeon handler).

In fact, various historians have already commented to me that they thought it quite unlikely that a Serjeant in the R.E. would have had the responsibility (or even the practical means) for enciphering a message in the field. So the fact that our enciphered pigeon message was not written by Serjeant Stout might arguably make more sense than if it had been… but it’s hard to be sure either way.

All the same, it has to be said that the best cipher mysteries tend to yield their secrets slowly (at best): so perhaps we shall have to resign ourselves to waiting a little longer yet for a pigeony breakthrough… we shall see!

Despite The Dorabella Cipher‘s brevity, its link to composer Sir Edward Elgar (who wrote it) has brought it a cult following over the years. Like other unbroken ciphers, it has appeared as a mysterious motif in TV plays, novels, and even recently in a children’s book (The Orphan of the Flames).

[Image: the Dorabella Cipher]

At first sight, it looks to be merely a straightforward simple substitution cipher of the kind that pen, paper, and an agile mind should crack relatively quickly. But what is mystifying is that even though Elgar apparently used precisely the same pigpen-like (3 sets of 8 orientations each) cipher alphabet elsewhere in his writings and notes, the letter-for-symbol replacements he used there make no sense when applied to his Dorabella Cipher. The key seems to match the lock, but doesn’t open the gate.

Moreover, given that the ciphertext’s statistical distribution sits awkwardly with those of natural languages, code-breakers’ numerous attempts to shoehorn their preferred substitutions into the cipher’s three short lines come across as clunky and false (at best). Worst of all, I’m sorry to say that even prolific cipher-solver Tony Gaffney’s ingenious and elegantly-structured decryption failed to please pretty much anyone apart from him.

However, the upside to all that grim cryptanalysis is the indisputable truth that Elgar messed around with language quite a lot, typically in a playful and mischievous way. In general, he loved subverting the rules of language, speech and music, which arguably culminated in his famous Enigma Variations, which some people like to call ‘musical cryptograms’ because many lightly parody (for example) various close friends’ speech and laughter rhythms.

Yet what has long tipped my own judgment against the Dorabella Cipher’s being a cipher of any sort is that by 14th July 1897 (the date of the note), Elgar (who wrote the note) hadn’t known Dora Penny (to whom or for whom the note was written) very long at all; and they never communicated in any kind of cipher before or after that date. But even so, my opinion was no more than a hunch, based only on various modern references on Elgar’s life I’d read… not very satisfactory, but that’s how these things tend to go.

Anyway, having spent far too long reading and relying on secondary sources on this particular cipher mystery, a few weeks ago I decided to instead go right to the source of the story – Dora Penny’s book “Edward Elgar: Memories of a Variation” (I bought a copy of the 1946 second edition, which has rather more information about the Enigma Variations than the first edition), written under her married name “Mrs Richard Powell”.

What I read there only served to strengthen my historical argument against The Dorabella Cipher’s being a cipher at all. Elgar and Penny first met on 6th December 1895, and the cipher was only the third letter Elgar ever wrote to Dora (if indeed, as she points out, it is a letter at all). (Also, he only started calling her “Dorabella” in 1898, so there’s a case to be made that its name isn’t chronologically accurate… oh well!) From all I could see, it would defy common sense if he had sent her something written in a deliberately intractable cipher: no matter how much of a fascination he personally had with such things, cryptography of any sort was not a discussion subject the two friends seemed to have shared at all.

And yet what we see does so resemble an enciphered cryptogram, a paradox which ultimately gives it its place at the Cipher Mysteries top table: for it really ought to be a simple cipher, but it surely is not one. And I find it hard not to hear Elgar’s voice saying to Dora Penny exactly what he said to her about the Enigma Variations (one of which is ‘hers’) – that surely she “of all people” would be able to unwrap its central mystery, its hidden themes. Wouldn’t his cipher, too, be steganography – hidden in plain sight?

As to the content of the note, I don’t believe that the newly married Elgar would have sent Dora Penny a love letter, for all the fun they had together (going out to the races, seeing Wolverhampton Wanderers, reading maps, flying kites, etc). So in all probability, I think that what we are looking at here is a three-line note or letter from him to her, in broadly the same joking and playful manner that he adopted in his other letters to her (though probably not as Byzantine in lexicographical complexity as later letters would become), regardless of the particular manner in which that effect is achieved.

The only other clue I have to offer is that in July 1897, the Elgars were living in a house called “Forli” (named after the talented Renaissance painter Melozzo da Forli, who incidentally gets mentioned a few times in Elizabeth Lev’s rather good The Tigress of Forli) in Malvern in Worcestershire. And so I wondered whether “Forli” and/or “Malvern” might be effective as cribs into the cryptogram, for Elgar would typically head even very short notes with his current address (several of which are charmingly reproduced as inserts in Dora Penny’s book). OK, it’s not quite “HEILH ITLER” at the start of Enigma messages, but you gotta work with what you’ve got, right? 🙂
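For what it's worth, here is a minimal Python sketch of how cribs like these can be tested mechanically, assuming (purely for the sake of argument) a simple substitution cipher. A crib can only sit where the ciphertext repeats symbols in the same pattern as the crib repeats letters, and two cribs placed at once must agree on every letter they share ("Forli" and "Malvern" share L and R). The function names are made up for this sketch, and the ciphertext can be any sequence of symbol labels.

```python
def pattern(seq):
    """Repetition pattern of a sequence, e.g. 'HELLO' -> (0, 1, 2, 2, 3)."""
    seen = {}
    return tuple(seen.setdefault(s, len(seen)) for s in seq)


def crib_positions(ciphertext, crib):
    """Offsets where the crib's repetition pattern matches the ciphertext."""
    n = len(crib)
    target = pattern(crib)
    return [i for i in range(len(ciphertext) - n + 1)
            if pattern(ciphertext[i:i + n]) == target]


def merge_placements(ciphertext, placements):
    """Combine several (offset, crib) guesses into one plain-to-cipher map.

    Returns None if the guesses contradict each other, i.e. one plain
    letter would need two symbols, or one symbol two plain letters.
    """
    plain_to_cipher, cipher_to_plain = {}, {}
    for offset, crib in placements:
        for k, p in enumerate(crib):
            c = ciphertext[offset + k]
            if plain_to_cipher.setdefault(p, c) != c:
                return None
            if cipher_to_plain.setdefault(c, p) != p:
                return None
    return plain_to_cipher
```

Since neither "FORLI" nor "MALVERN" contains a repeated letter, the pattern test alone only rules out windows containing a repeated symbol; it is the cross-check on the shared letters that does most of the pruning.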

And so with all these fragmentary clues in mind, I stared and stared and stared at the Dorabella Cipher, trying to see what Elgar (mistakenly) thought Dora Penny would see straight away. And then I stared some more. After a (fairly long) while, here’s what I noticed:-

[Image: the Dorabella Cipher, with “Forli” and “Malvern”]

Essentially, I suspect that Elgar was so certain that Dora Penny would know what he would be saying in a short note that all he felt he needed to do was to write the general form of the words (even presented in the form of a ciphertext-like medium) and she would still be able to ‘read’ them. [Unfortunately, this proved not to be true!] So, I believe that what we are looking at could well be more like Elgar’s improvised steganographic attempt at a mind-reading trick than a traditional ciphertext per se. Such a process would (probably) produce something like what we see: a non-mathematical stegotext that fails to have the kind of rigorous statistical profile that “proper” ciphers would.

I’m the first to admit that it’s far more of a wobbly observation and a loose speculation than a rigorous proof: but what I’m proposing is that the Dorabella Cipher could turn out to be a quite different class of object from that which code-breakers have been trying (unsuccessfully) to crack. It’s not the end of the road here, but it might possibly be the very start of one… hopefully we shall see! 🙂

Several years ago, I noted here a long-standing story about a 1926 Budapest waiter who (allegedly) killed himself, leaving a suicide note in the form of a crossword. I wondered whether it was an urban legend, or (if it were to prove to be true) whether the crossword might have been printed in a newspaper of the day. But with only a few words of tourist Hungarian to work with, I didn’t really stand a chance in the Hungarian archives.

Well, now Hungarian urban legend-hunter Marinov Iván has eagerly grabbed the baton, and hurdled his way along miles of microfilm in the newspaper archives in search of the truth. As a result, his Hungarian urban legend blog today revealed that this was indeed a real story. According to the 4th March 1926 edition of Az Est, what happened was this (forgive my rough and ready translation / paraphrasing)…

[Image: keresztrejtvény (crossword)]

Just after (?) midnight, a man had come into the well-known Emke kávéház [Café Emke] near the corner of Rákóczi út [Rákóczi Way] and Erzsébet körút [Elizabeth Boulevard]. After having a coffee, he repeatedly tried to call a number using the cafe’s telephone, but without success. About an hour later, the Emke’s cloakroom attendant heard a bang from a toilet: and when she opened the door, she heard a second bang. Inside, she found a young man lying on the floor with a pistol in his hand, and with blood gushing from his head and chest.

[Image: Café Emke interior, 1929]

Once the ambulance and police arrived, the man’s identity was found to be Antal Gyula [Julius Anthony] of Csengery utca 3 [#3 Csengery Street]. In his pocket there was [- indeed! -] a suicide note containing a crossword. It subsequently turned out that he had lived in “misery and unemployment” for some time, and had been evicted from his apartment at the start of the month, having failed to pay his rent. But as far as his note went, the Est article concluded “A bonyolult keresztrejtvényt azonban eddig még nem sikerült megfejteni“, which I read as “the complexity of the crossword means that it has not yet been deciphered“.

[Image: Emke kávéház]

So… what happened next? Iván followed up by looking in lots of other Hungarian newspapers from that year, but they all reported essentially the same bare facts, with only the Pest Newsletter adding that the man was 25 years old, and that the riddle had been “taken to police committee headquarters”. He speculates that it might have received more coverage had the man’s job been of higher status than a waiter’s: sadly, Budapest has long been (and remains to the present day, I believe) a suicide ‘hotspot’, so many other pages of those same newspapers would have contained stories of the same tragic ilk.

Ultimately, Iván failed to find any further references to the story in the newspaper archives, and so it is there that he stopped. Perhaps someone else will now pick up this baton and carry it yet further… perhaps we shall yet get to see Antal’s infamous (but tragically real) crossword!

PS: an Internet search revealed an evocative description of Café Emke in December 1945 in Sándor Márai’s autobiographical “Memoir of Hungary (1944-1948)” (pp.198-205).

Today I received a nice little package of stuff from Holland, courtesy of Rob van Meel, who reprints old military manuals – mostly British, but a few American and German ones too. I get the impression these are mainly for people with an interest in reenactment / war games rather than historians and researchers per se, but given a healthy area of overlap there’s surely room for everyone at the table. 😉

Unsurprisingly, I was most interested in the various Slidex-related manuals Rob had, particularly an updated release of the Slidex manual dated 1st December 1944 (i.e. six months after D-Day). You see, Slidex originated as a system where operators used only a single letter for each of the twelve slots on the horizontal cursor: yet we have later examples where two letters went in each slot (and you could choose either one to signify that column).

If our pigeon cipher is a bigram cipher, then it is one that appears to use 24 letters in its horizontal cursor. So if it was enciphered using Slidex (which seems to be the code most widely used on D-Day), it would have to have used the two-letters-per-slot version. Hence the big question I wanted to try to answer was… when did the changeover from one-letter-per-slot to two-letters-per-slot Slidex happen?
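To spell out the counting argument, here is a toy Python sketch of a twelve-slot horizontal cursor in its one-letter and two-letter forms. Only the twelve slots and the "either letter signifies that column" rule come from the description above; the alphabetical cursor letters and all the function names are made up for illustration, and real cursor settings would not look like this.

```python
import random


def make_cursor(letters, per_slot, slots=12):
    """Split a string of cursor letters into equal-sized slots."""
    if len(letters) != per_slot * slots:
        raise ValueError("need exactly per_slot * slots letters")
    return [letters[i * per_slot:(i + 1) * per_slot] for i in range(slots)]


def column_letter(cursor, column, rng=random):
    """Pick any one of the letters written in the given column's slot."""
    return rng.choice(cursor[column])


def column_of(cursor, letter):
    """Find which column a received cursor letter refers to."""
    for column, slot in enumerate(cursor):
        if letter in slot:
            return column
    raise KeyError(letter)


def distinct_column_letters(cursor):
    """How many different letters can ever appear as a column reference."""
    return len(set("".join(cursor)))


# Hypothetical cursors: alphabetical order is for readability only.
single = make_cursor("ABCDEFGHIJKL", per_slot=1)
double = make_cursor("ABCDEFGHIJKLMNOPQRSTUVWX", per_slot=2)
```

With one letter per slot, at most 12 different letters can ever turn up as column references; with two per slot, that rises to 24. So a message showing 24 different letters in that role cannot have come from the one-letter form.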

However, going through the revised Slidex manual, it became abundantly clear to me that even in December 1944, the British Armed Forces were still using single-letter-per-slot Slidex, which would seem to rule out Slidex’s having been used in the pigeon cipher before 1945.

At the same time, the two pigeons were (according to their NURP references) born in 1937 and 1940: and the older of the two would have been right at the end of its carrying days in summer 1944, let alone in 1945. As a result, the Venn diagrammatic intersection of possibility (i.e. between the [old pigeon] circle and the [revised Slidex] circle) is shrinking all the time.

Right now, I don’t know what the answer to all this is: to my eyes, what we’re looking at seems a bit more like a bigram cipher than a machine cipher, but even that’s far from certain either way. All the ‘best’ cipher mysteries seem to take a somewhat sadistic pleasure in continuously oscillating either side of the shaky line between certain and uncertain, and this one is surely no exception.

Yet there were other low grade bigram ciphers in use during WW2: two in particular were an Air Support bigram cipher and a Royal Engineers syllabic cipher. These may well be the same two variants of the Syllabic Cipher introduced in 1942 as per Stu Rutter’s page, which I believe were known as BX 724 and BX 724/RE respectively.

I’ve already written to several army museums and archives asking if they have either of these, but so far without any luck. Any suggestions as to private collectors (or collections) who may have a copy of either? Unless you have a better idea, this would seem to be the next sensible thing to check, and the various National Archives files Stu & I checked didn’t seem to have any description of it at all.

In short: probably not Slidex, so remains a work in progress. 😉

As quite a few of you already know (because you emailed to tell me, thanks!) Cipher Mysteries’ WordPress hosting got hacked again. Unfortunately by the time I’d downloaded the access logs from the server (the next day), all the nasty activity was too far back in the buffer to see exactly where it came from. Next time I’ll try to remember to be quicker!

I first had a look around with the cPanel File Manager, as I initially expected the attack to have originated from a compromised file in the file system. I did find a backdoor PHP file inserted into ./wp-content/uploads, which from the file date was probably left there by the previous (Bangladeshi) hacker: but nothing else, which was a bit strange. So I reinstalled WordPress 3.5.1, fired it up, and… it was still hacked.

Appallingly, it turned out that the hacker had managed – despite my firewall & security plugins – to change some fields in the local database itself. Basically, he (I’ll call him “him”, for I’ve read that hacking is a largely male subculture) changed three entries in the WordPress wp_options table:-

1. blog_charset (which he changed from “UTF-8” to “UTF-7”)
2. blogname (which he overwrote with a load of script kiddie stuff)
3. widget_text (which was filled with a load of escaped script kiddie stuff)
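If you want to check your own site for the same three changes, here is a minimal Python sketch. It assumes you have already pulled the relevant option_name / option_value pairs out of wp_options into a dictionary (how you fetch them is left out), and the regular expression is only a rough guess at what "script kiddie stuff" might look like, not a proper scanner.

```python
import re

EXPECTED_CHARSET = "UTF-8"

# Rough heuristic: a literal, backslash-escaped or URL-escaped "<script".
SCRIPT_HINT = re.compile(r"<\s*script|\\x3c\s*script|%3c\s*script", re.IGNORECASE)


def suspicious_options(options):
    """Return (option_name, reason) pairs for values that look tampered with.

    options: dict mapping wp_options.option_name to option_value (strings).
    """
    findings = []
    charset = options.get("blog_charset", "")
    if charset.upper() != EXPECTED_CHARSET:
        findings.append(("blog_charset", f"unexpected charset {charset!r}"))
    for name in ("blogname", "widget_text"):
        if SCRIPT_HINT.search(options.get(name, "")):
            findings.append((name, "contains script-like markup"))
    return findings
```

An empty list doesn't prove the database is clean, of course: it only says these three particular fields look normal.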

The most irritating hack was #3, as I could tell it was in JavaScript (hint: disable JavaScript and the problem disappeared) but couldn’t see what file had been changed. And in fact none had, because the script was inserted into a field in the database.

The most interesting hack was #1, because it wasn’t at all obvious to me why changing the charset to UTF-7 would be of benefit. But it turns out that this is a longstanding way of attacking databases (which expect UTF-8, and can be vulnerable to carefully crafted UTF-7 strings causing MySQL to do unexpected things). Here’s a page mentioning this weakness. Just so you know, IE9 doesn’t seem to support UTF-7 satisfactorily, which also had me confused for a while. *sigh*
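As a rough illustration of why UTF-7 is attractive to an attacker in general, here is a small Python sketch showing that a UTF-7 string can carry angle brackets without containing any. The payload is made up (the actual injected strings aren't reproduced here), and the "filter" is deliberately naive; this shows the general trick rather than the specific MySQL mechanism.

```python
def naive_filter_passes(payload: bytes) -> bool:
    """A deliberately naive check that only rejects literal '<' or '>' bytes."""
    return b"<" not in payload and b">" not in payload


# Hypothetical payload: in UTF-7, "+ADw-" stands for '<' and "+AD4-" for '>'.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Anything that later interprets the bytes as UTF-7 sees ordinary markup.
decoded = payload.decode("utf-7")
```

The naive filter waves the raw bytes through, yet once they are read as UTF-7 they turn back into a perfectly ordinary script tag: hence the appeal of quietly flipping a site's declared charset.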

The hacker may also have made other changes to the database, but I don’t know of any way to see a history of recent MySQL accesses from within WordPress… now there’s an idea for a forensic plugin that would be really useful. Or a cPanel add-on. Or something.

How did the hacker get in? My guess is by exploiting a just-after-zero-day vulnerability in WordPress 3.5.0, as I hadn’t quite got round to upgrading to 3.5.1, what with work and real life inevitably getting in the way.

Unfortunately, I have no real faith that I’ve solved the problem. Chances are another vulnerability will open up before very long and we’ll go through the same rubbishy process all over again. C’est la vie (du blogging).

It’s been a while, but the time has finally come round for another Voynich London pub meet, on Thursday 7th March 2013 at the Prospect of Whitby in Wapping, a pub with its own gallows and noose (though admittedly these days it’s Somali pirates who get all the press rather than privateers). I’ll be there from 6pm onwards, hope to see some of you there too!

[Image: The Prospect of Whitby]

The reason for the weekday (i.e. not the usual Sunday) is that German cipher mystery skeptic Klaus Schmeh is over in the UK for a very few days & the 7th is the only evening he can squeeze into his packed schedule. I can’t change that and would like to catch up with him, so what’s a Cipher Mysteries blogger to do? Make do with the cards he’s dealt, that’s what… it is what it is.

This has, of course, been Schmeh Week on Cipher Mysteries, what with The Gentlemen’s Cipher from Klaus’ blog and this week’s diplomatic cipher conference in Gotha. So if (like me) you’d like to chat with Klaus about the conference, or perhaps chat with me about cipher stuff (if reading all my posts isn’t a rich enough diet for you), then feel free to swing along to Wapping. WW2 cipher pigeon fans welcome too! Cheers! 🙂

Word just arrived here by breathless carrier pigeon (well, the little chap had flown from Italy, after all) about a conference in Gotha on 14th-16th February 2013 (yes, this very week!) on historical cryptography. And here’s the rather nice conference artwork:-

[Image: Gotha conference artwork]

Lots of interesting sessions on all manner of European historical stuff, such as from top Italian cipher-breaker Filippo Sinagra, who you may remember from the Nat Geo “Ancient X-Files” Voynich half-episode not so long ago. Filippo’s talk is on Sforza-era cryptography, for which he patiently trawled through the Milanese archives (he very kindly passed me scans a while back). Fascinating stuff that’s right up my street, I just wish I could be there (though sadly that’s not possible this particular time, oh well!)

There’s also a talk on historical code-breaking methodology (or, more accurately, the apparent lack of anything like one) being given by a certain German cipher skeptic – yes, it is indeed that man again, Klaus Schmeh. His introduction notes that: “Publications devoted to the particular methods of cryptanalysis regarding historical ciphers are rare. The presence of numerous works in which decrypted historic secret texts are presented should not obscure the fact that a comprehensive theory is lacking in this area.” Amen to that, Brother Klaus!

Another nice thing is that Christiane Schaefer will be discussing the Copiale Cipher (which she, Beáta Megyesi and Kevin Knight successfully broke) and her team’s follow-on project, an “interactive digital platform” called “CADMUS” for early modern cryptology. All of which sounds a lot like a call to European historians to send them your enciphered early modern documents and they’ll crack ’em… or you could just send them to me c/o Cipher Mysteries, that would work too. 😉

But perhaps the wild-card of the conference will prove to be Dr. Michael Korey, whose session is intriguingly entitled “Hidden steganography and burned substitution. Some little‐known cypher equipment from the cabinet of curiosities (Kunstkammer) of Dresden”. The description (which I hope it’s ok to reproduce here) goes like this:-

“In the middle of the 16th century, the German territories were considered to be not very progressive by foreign countries in regard to their ability to encrypt their messages or decrypt those sent by others. Matteo Argenti, secretary of cyphers at the Vatican, said the Germans and their neighbours understood so little about cyphers that they preferred to shred and burn the encoded dispatches they received instead of trying to decrypt them. In retrospect, this assessment seems quite premature, considering two pieces coming from the electoral Saxon cabinet of curiosities (Kunstkammer), that so far have not attracted much attention and will be presented here.”

It should be no surprise that I rather like the sound of that. Ummm… I hope someone takes a photo or two (hint, Klaus, *cough*). 🙂

For everyone going, I hope you have a great time, it looks like a great conference!