I posted up seven homophonic challenge ciphers a few days ago, and now – though it may sound a little counter-intuitive – I’d like to try to help you solve them (bear in mind I don’t know if they can be solved, but the whole point of the challenge is to find out).
Of the seven ciphers, #1 is the longest (and hence probably the easiest). Reformatted for ten columns rather than five (it uses five cycling alphabets ABCDE, ie. “ABCDE ABCDE” over ten columns):
121,213,310,406,516, 108,200,323,416,513,
112,208,308,409,515, 102,216,309,425,509,
114,215,309,417,507, 102,201,323,401,517,
111,200,306,408,500, 113,203,313,407,512,
103,223,313,403,511, 119,213,316,416,511,
102,204,324,418,517, 120,203,324,407,516,
105,209,312,401,504, 117,208,310,408,500,
113,203,301,425,513, 115,201,313,408,515,
115,214,308,406,501, 122,204,322,408,509,
114,209,305,412,504, 117,213,316,402,509,
100,200,310,423,513, 100,214,320,419,509,
114,209,309,419,520, 101,200,320,416,518,
120,211,313,403,509, 103,207,313,421,513,
107,209,305,407,523, 115,224,313,416,508,
102,203,306,416,514, 107,200,310,401,509,
103,212,324,
Repeated Quadgram
Commenter Jarlve (whose interesting work on the Zodiac Killer ciphers some here may already know) noted that there is a repeated quadgram here, i.e. the sequence 408 500 113 203 appears twice.
This is entirely true, and also a very sensible starting point: I’ve highlighted this quadgram in the following diagram, along with all other repeated A-alphabet tokens (i.e. 100..125), and also any tokens they touch more than once (i.e. in the B and E alphabets):
Another thing that’s interesting here is that the 102 token (that appears four times and is coloured purple in the above) appears with four different letters before it as well as four different letters after it. In classical cryptology, that’s normally taken as a strong indicator that this is a vowel: and with the high instance count (4 out of 31, i.e. 12.9%), you might reasonably predict that this is E, A, O, or perhaps I (in order of decreasing likelihood).
[Note that I haven’t looked to check what letter this actually is: having created the challenge ciphers, I’ve just left them to one side, and don’t intend to look again at them.]
Similarly, the 114 token (that appears three times and is coloured green) is always preceded by 509, and is followed by 209 on two of the three instances. (Note that the token two after it is 309 in two of the three instances as well.) Again, in classical cryptology, these kind of structured contacts are normally taken as strong indicators that this token enciphers a consonant: and with the high instance count (3 out of 31, i.e. 9.7%), you might reasonably predict that this enciphers T or possibly N, S, or H.
With these two examples in mind, it strikes me that for any given plaintext language (English in the case of these challenge ciphers) you could easily build up probability tables for repetitions of the two tokens before and the two tokens after any given token: and then use those as a basis to predict (for a given ciphertext length) which plaintext letter they imply the letter is likely to be.
Though this may not sound like very much, because you can do this for all five of the alphabets independently, the results kind of rake across the ciphertext, yielding a grid of probabilistic clues that some clever person might well use as a basis for working towards the plaintext in ways that wouldn’t possible with randomly-chosen homophonic ciphers. Just sayin’. 😉
And The Point Is…
It’s entirely true that for homophonic ciphers where each individual cipher is chosen at random, the difficulty of solving a reasonably short cipher with five homophones per letter would be very high. But knowing (as here) that each column is strictly limited to a given sub-alphabet, my point is that many of the tips and tricks of classical cryptology are also available to us, albeit in slightly different forms from normal.
Yet while it’s encouraging for solvers that there is a repeated quadgram here, I don’t currently believe that cipher #1 will be (quite) solvable with pencil and paper, as if it were a Sudoku extra-extra-hard puzzle (though as always, I’d be more than delighted to be proved wrong).
However, my hunch remains that strictly cycling homophonic ciphers may well prove to be surprisingly solvable using deviousness and computer assistance, and I look forward very much to seeing how they fare. 🙂
Reading Comprehension fail – I must have misunderstood your previous post – isn’t this sort of a Quagmire cipher (ie a Vignere with no ‘key’ per se’)?
So only the length is killing us? Surely a frequency analysis (firstly of groups of letters to guess a key length (yes I know we know it’s 5)) and then one on that actual (theorized) key size would put us in a neat place to do a frequency analysis….
Milongal: it’s not a Vigenere because each of the five alphabets are mapped differently. What’s interesting is that nobody seems to have tried to analyze this cipher system, because picking homophones at random is so much stronger. But… here we are with the Scorpion Ciphers. 🙂
Nick: well, it is. Its a polyalphabetic cipher with five cycling monoalphabetic substitution ciphers. You’ve just assigned visually different alphabets to each.
The alphabets are scrambled, which makes it harder, but letter frequency statistics still apply within each column.
SirHubert: I’ve only presented the ciphers in that way so that there is absolutely no question about what is going on inside. There is no doubt that having five strictly cycling alphabets is significantly harder than a single alphabet, but it’s also far easier than situations where each homophone instance is chosen randomly from a set of five. Hence the reason for the challenge is to help find out exactly where in the middle it lies, because I don’t know of any cryptanalysis of this specific kind of cipher.
NP: It’s not Vignere in so far as each alphabet isn’t a shift, but isn’t it still related – or at least based on the same principle (I had an idea there was a group of ciphers called “Quagmire” that were based on this sort multiple-alphabet encryption – and I thought there had been considerable analysis on them, and there were tools to try to analyse and crack them).
There’s a (free) Security Engineering course offered by UNSW on openlearning (google: “UNSW open Learning Security” shouild be the first result) that had a similar challenge (albeit a lot longer). (If you sign up free, go to “Lectures and Activities”, “Module 3”, “Cipher Challenge 3”.
It took me (and by the look of it most others) a LONG time to solve, but given the number of people who have solved it, it was by no means impossible (without knowing the number of keys etc – in fact there were no clues (formally, actually there was a big clue to the content, but I’m not sure it was noticed by most people until afterward), just a ciphertext). Granted, the text is SIGNIFICANTLY longer, but I think most people got it with a combination of analysis (looking for repeated Di/Tri/+ graphs and trying to work out a likely key space, then frequency analysis within each interval and finally guessing words (some with automated dictionary brute forcing, I think)….
And I think many (most even?) of the people who managed to solve it would have had a limited crypto background – if any.
I like 509 as T.
In fact I’m going to go out on a limb and say:
509 = ‘T’, 114 = ‘H’, 209 = ‘E’, 309=’S’, 215 = ‘I’, 419 is ‘E’
313 occurs a lot and might be another E….but I’ll reserve judgement for the sec
115 occurs 3 times each time followed by a different char, but twice has 313 2 char later
Guesswork and gut feel mainly. Might have a deeper look at it when I have time…..it interests me, but I’m time poor.
Helen Fouche Gaines – Elementary Cryptanalysis (c)1939, Chapter XVIII Periodic Ciphers with Mixed Alphabets. Still difficult, but some good tricks 🙂
Marie: fantastic, thanks! Could I possibly ask you for a scan of the chapter that I can summarize as a blog post? I’d like to give people the best chance of winning my money. 🙂
Marie: pdf downloadable here: http://informatika.stei.itb.ac.id/~rinaldi.munir/Kriptografi/2010-2011/cryptanalysis.pdf
Alas, Gaines restricts her analysis (if I understand it correctly) to mixed alphabets where the alphabets are derived from a single alphabet, i.e “sliders”, because “It is very seldom indeed that a series of cipher alphabets used in the same cryptogram will be unrelated alphabets. Nearly always, they will have resulted from the use of a slide“. (p.175)
Still, her general points about solving ciphers in Chapter IX (that she briefly reprises in this chapter) offer a very sound set of steps to be taking here, so I’ll write them up from that viewpoint, thanks! 🙂
Sorry, meant to respond sooner. I had found it online about a year ago and was going to send you the whole thing but it was the holidays here in the States and I got distracted, so you beat me to it! The book is complicated at times but a general fount of knowledge on all things cipher, with fun examples to work through. I had noticed the slide when I reread the polyalphabetic chapter, but there are still hints there and in many chapters as you’ve seen. A good example and one of my favorites is the Nihilist Transposition Chapter IV, both from my thoughts on D’Agapeyeff’s cipher (where she does, in fact, mention it being the reverse of a typical transposition though I would still love to find the original source of that information, but that is a post for another thread) and a hint on average vowel usage….
Marie: 🙂
Right now I’m re-reading Friedman’s guide to solving simple substitution. ciphers, lots of good stuff in there too. 🙂
Thanks for the challenge, Nick. I am using a self-written hill climbing code with an n-gram scorer that worked perfectly many times before on simple substitutions and also Vigenere / Quagmire type of ciphers. I adapted it to take turns, optimizing the 5 alphabets with some randomness in order not to get stuck in local maxima.
I have the strong feeling by now that this approach will not be very successful as the 5 alphabets leave MANY degrees of freedom. I usually end up after a few seconds with a “solution” that is a collection of “english” words that just don’t make sense together (which is not so easy to find out by software). E.g. right now for #3: “WAS THE RING A CON THE PRINDER STORE COLD THESE RALLASED THEN TROUPS IN OVE OFF THE TERS ARENT I NOTED TO HAND THEM”. The shortness of the last messages especially might leave room for “alternative solutions” that fit the code constraints but might not be Nick’s original text. If I find a good one, I’ll post it.
With the Caveat that my projects never last (a day is lucky, a week would be close to the max), I’ve thought about several approaches.
The one I’ve settled on for today is a genetic algorithm whose health is determined by variance from the digraph frequencies – but given the short cipher lengths I’m guessing the results (of we get that far) will be a bit……(can’t think how to finish that sentence, but it’s not positive).
I did sort of consider a similar approach but with Index of Coincidence as the health measure, and then trying to solve it as a vanilla substitution cipher (and we might revert to that yet).
I’m going to do it in C (because that has always been my language of choice (and claimed expertise)), but I’m already struggling because less than 6 months of ruby has destroyed all my pedantry about semi-colons and the like (and maybe to be fair a year of shell scripting and php/javascript/crap before that has probably destroyed other C habits) ….
Milongal: the problem is that it’s not a ciphertext in that kind of way. There are plenty of programmes that will do all the heavy lifting for you, such as CryptoCrack: personally, I wouldn’t write a line of code for any cipher mystery (and Ruby isn’t as bad a language as you seem to think) until I’d exhausted all the basic tests in CryptoCrack first.
Milongal: incidentally, something few non-codebreakers tend to appreciate is that programmes such as CryptoCrack are really very good indeed at cracking simple substitution ciphers. Even though these have a unicity (minimum theoretical ciphertext size for a unique answer) of 28 characters, in practice simple ciphertexts of length 40 or so will yield to CryptoCrack’s devious chiselling, and even length 30 ciphertexts can often yield automated cracks that are obviously very close.
With a length of 46, the Rubaiyat ciphertext is therefore well inside the zone where you would expect CryptoCrack to have a pretty good attempt… but it finds nothing. 🙁
Further to my post from June 29 (about the Sec Engineering course through UNSW), the ciphertext provided was quite big, and all CryptoCrack (**might have been something equivalent, I remember having a bit of issues with some of the tools I downloaded, and some of the analysis was done on random websites too) really help with was likely key size (I can’t remember whether it got the actual length or some multiple/divisor of it) and its scoring of likely algorithms voted variants of the Quagmire highest. Perhaps the problem was that I didn’t know how to use it beyond that, but the bulk of the ‘heavy lifting’ (certainly when I solved it, and by the sound of it others who I shared ideas with) was done in part programmatically and in part pen and paper.
Given that you’ve told us a lot of the technical detail (5 random alphabets) it feels like we’re in the same place we were with the UNSW cipher (although I havebn’t even played with CryptoCrack this time, so it might be worth a look….). The biggest challenge we had last time with the programmatic approach was the decision that something was or wasn’t English text- and I’m a bit uncomfortable that mere digraph analysis might not be enough…but I don’t know until I try.
To be perfectly honest, I kind of feel this should be crackable simply by applying ETAIONSHRDU to the commonest letters in each ‘alphabet’ (or whatever the exact order is) and tweaking the result – and in that case it’s doable by hand by anyone with a few days patience (I get frustrated with that approach easily). I’m sure you understand the method…by applying E and T you start to be able to guess ‘THE’ (and ‘THERE’/’THESE’) and then TH + Constonant possibly indicates something ending with ‘TH’ (eg wiTH) is and you look for patterns with ‘N’ that can be [AE]N[DT] etc…..I guess it’s like solving a multi-dimensional acrostic (not sure that’s the right name) puzzles….
But will see…as I say, my attention span isn’t all that long.