While thinking about the Scorpion S1 unsolved cipher in the last few days, it struck me that it seemed to be a special kind of homophonic cipher, one where the homophones are used in rigid groups.
That is: whereas the Zodiac Killer’s Z408 cipher cycled (mostly but not always) between sets of homophones by their appearance, it appears that the Scorpion S1 cipher maker instead rigidly cycled between 16 sets of homophones by column. What’s interesting about both cases is that the use pattern gives solvers extra information beyond that which they would have for a homophonic cipher where each homophone instance was chosen completely at random.
Perhaps there’s already a special name for this: but (for now) what I’m calling them is “constrained homophonic ciphers”, insofar as they are homophonic ciphers in which an additional use pattern constrains the specific way that the homophones are chosen.
The question I immediately wanted to know the answer to was this: can we solve these? And what better way to find this out than by issuing a challenge!
Seven Challenge Ciphers
The seven challenge ciphers are downloadable as a single zip file here, or as seven individual CSV files here:
* #1
* #2
* #3
* #4
* #5
* #6
* #7
How The Ciphers Were Made
Unlike normal challenge ciphers, what I’m giving you here (in line with Kerckhoffs’s Principle) is complete disclosure of the cipher system and even the plaintext language.
The cipher system used here is a homophonic cipher with exactly five possible homophones for each plaintext letter BUT where the homophones are strictly selected according to the column number in which they appear in the ciphertext. Each separate CSV uses its own individual key.
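To make the scheme concrete, here is a minimal Python sketch of a column-constrained homophonic cipher of the kind described above. It is an illustration under assumptions rather than the code actually used to make the challenges: the function names (make_key, encipher, decipher) and the use of random.shuffle are hypothetical, and the 100s/200s/... numbering simply follows the layout of the challenge files.

```python
import random
import string

def make_key(rng, columns=5):
    """Build one independently shuffled alphabet per column (assumed method)."""
    key = []
    for c in range(columns):
        letters = list(string.ascii_uppercase)
        rng.shuffle(letters)
        # Column c (0-based) uses the codes (c+1)*100 .. (c+1)*100 + 25.
        key.append({letter: (c + 1) * 100 + i for i, letter in enumerate(letters)})
    return key

def encipher(plaintext, key):
    """Keep letters only; the alphabet used is the letter's position modulo 5."""
    letters = [ch for ch in plaintext.upper() if ch in string.ascii_uppercase]
    return [key[i % len(key)][ch] for i, ch in enumerate(letters)]

def decipher(codes, key):
    """Invert each column's alphabet and read the codes back in order."""
    inverse = [{code: letter for letter, code in col.items()} for col in key]
    return "".join(inverse[i % len(key)][code] for i, code in enumerate(codes))

key = make_key(random.Random(2017))
codes = encipher("Hello world", key)
print(codes)
print(decipher(codes, key))
```

Each plaintext letter therefore has five possible cipher numbers, but which one appears is fixed entirely by the column it lands in.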
The plaintext language is English: they are straightforward sentences taken from a variety of books, and without any sadistic linguistic tricks (i.e. no “SEPIA AARDVARK” or similar to confuse the issue).
The enciphered files are simple CSV (comma-separated values) text files, arranged in rows of five letters at a time, but encoded as decimal numbers. For example, the first (and the longest) challenge cipher (“test1.csv”) begins as follows:
121,213,310,406,516,
108,200,323,416,513,
112,208,308,409,515,
…
Here, “121,213,310,406,516,” enciphers plaintext letters #1..#5, “108,200,323,416,513,” enciphers plaintext letters #6..#10, and so forth. The first column is numbered in the range 100..125 (i.e. these belong to the 1st homophonic alphabet), the second column 200..225 (i.e. these belong to the 2nd homophonic alphabet), and so forth.
The start of the message and the end of the message are exactly as you would expect: there is no padding at either end, no embedded key information, just pure ciphertext.
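For anyone wanting to load the files programmatically, the following is a rough Python sketch that reads rows in the format shown above (including the trailing comma on each line) and checks that every number falls in the range its column implies. The helper names read_cipher and check_columns are made up for illustration, and the sample text is just the three rows quoted above, not a whole challenge file.

```python
import csv
import io

def read_cipher(text):
    """Parse CSV text into rows of integers, ignoring empty cells and non-numeric lines."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in record if cell.strip()]
        if cells and all(cell.isdigit() for cell in cells):
            rows.append([int(cell) for cell in cells])
    return rows

def check_columns(rows):
    """True if each value lies in (col+1)*100 .. (col+1)*100 + 25 for its column."""
    return all(
        (col + 1) * 100 <= value <= (col + 1) * 100 + 25
        for row in rows
        for col, value in enumerate(row)
    )

sample = "121,213,310,406,516,\n108,200,323,416,513,\n112,208,308,409,515,\n"
rows = read_cipher(sample)
print(rows)
print(check_columns(rows))
```

To use it on a real file, pass the file's contents (e.g. from open("test1.csv").read()) to read_cipher instead of the sample string.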
The Rules
Treating this as a massively parallel book search using cloud databases (a) will be treated as cheating, and (b) will spoil it for other people, so please don’t do that. This challenge is purely about finding the limits of cryptanalysis, not about grandstanding with Big Data.
Hence you’ll need to also tell me (broadly) what you did in order to rise to the challenge, so that I can be sure you haven’t solved it through secondary or underhand means.
The Prize
If nobody solves any of the challenge ciphers by the end of 2017, my wallet stays shut.
However, the person (or indeed group) who has the most success decrypting any of these seven challenge ciphers by 31st December 2017 will be the “2017 Cipher Mysteries Cipher Champion”, and will also receive a shockingly generous £10 prize (sent anywhere in the world where PayPal can send money) to spend as they wish.
In the case of multiple entrants solving the same difficulty cipher independently, I’ll award the prize to the first to contact me. In all cases, please leave a comment below.
In all situations, my decision is final, absolute, arbitrary and there is no opportunity for appeal. Just so you know.
PS: any individual (or indeed covert agency) wishing to donate more money to increase the prize fund (i.e. to make a little more cryptanalytic sport of this), please feel free to email me.
Hints and Tips
I suspect that the multiplicity (i.e. the number of different symbols used divided by the length of the ciphertext) will prove to be too high and the ciphertext lengths too short for conventional homophonic decryption programmes, so I expect prospective solvers won’t be able to look to these for any great help.
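Multiplicity as defined above is a one-line calculation. The sketch below (with a hypothetical function name, multiplicity) computes it for a flat list of cipher numbers; the sample is only the first fifteen numbers of the first challenge quoted earlier, so its value says nothing about the full ciphertext.

```python
def multiplicity(codes):
    """Number of distinct symbols divided by the ciphertext length."""
    return len(set(codes)) / len(codes)

# First three rows of challenge #1 as quoted above, flattened.
sample = [121, 213, 310, 406, 516,
          108, 200, 323, 416, 513,
          112, 208, 308, 409, 515]
print(multiplicity(sample))
```

A value near 1 means almost every symbol is seen only once, which gives frequency-based solvers very little to work with.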
Similarly, I don’t believe that numerical brute force and/or parallel processing will be sufficient here: all the same, these challenges (if solvable) will probably prove to be things that anyone anywhere can tackle (e.g. through hill-climbing and cleverly exploiting the constraints), not just the NSA, GCHQ or similar with their supercomputers.
For what it’s worth, my best guess right now is that #1 (the longest of the seven ciphertexts) will prove to be solvable… though only just. Even so, I’d be delighted to be proved wrong for any of the others.
Incidentally, I chose the length of the very shortest challenge cipher to broadly match the length of the Scorpion S1 cipher: so even in the (perhaps unlikely) case where all seven of my challenge ciphers get solved, there’ll still be an eighth challenge to direct your clever efforts at. 😉
Nick, I have been following your blogs with interest but as a newbie who is severely challenged by the daily cryptogram could you suggest a list of references to start me learning the more sophisticated methods of cryptography?
Don Simpson: it’s a good question. Books tend to either steer clear completely of the interesting stuff or dive so deeply into it that your eardrums burst. I’ll have a look around, see if there’s something I could recommend.
@Don
Friedrich L. Bauer
Decrypted Secrets
4th edition
ISBN-10 3-540-24502-2
ISBN-13 978-3-540-24502-5
Ravenhurst: while it’s a good book, few will want to pay upwards of £100 for the (latest) 4th edition. Having said that, earlier editions are available via bookfinder.com for rather less, and that’s probably where the best compromise lies. 🙂
The Code Book by Simon Singh would be my recommendation as a good starting point to learn.
Also – I hate to admit this and I’m probably being very dim but I’m not sure I understand the layout of the ciphers.
Is it
A
101
102
103
104
105
B
106
107
108 etc
or
A
101
126
and so on?
Matt: The Code Book is a comfortable read, but I’m not sure it’s where I’d point someone first.
The alphabets are interleaved, i.e. ABCDEABCDEABCDE etc.
Hello Nick!
Can you guide us a little more? Does the text contain long sentences or is it a mixture of short words? I believe I can read two or three five-letter words in Test 1: am I on the right track or should I give up?
Best regards
Ruby
Ruby: all I’m saying is that I tried to be fair, in that I didn’t select passages that were completely artificial, or arbitrary, or mad, or oddly-structured, or random-looking.
The general problem when trying to crack homophonic ciphertexts is that you have so many degrees of freedom (particularly with symbols that only appear once) that it’s easy to devise texts that genuinely do fit short stretches of it. However, the problems then come when you try to follow that same (narrow) road for the rest of the ciphertext… as per just about every attempted decryption of the unsolved Zodiac Killer Ciphers. :-/
This is nearly impossible, there are millions of possible cipher combinations. The hardest part is that after the first letter, lots of information is obscured. The position of letters in words, most notably.
All I can think of is to count the most frequent numbers in each cipher and assign to them the most frequent letters in English. Are spaces encoded? That might be a key. Or is it just one string of words without spaces? In that case, I say it’s impossible.
Unless one can write a computer program that generates all solutions using all possible alphabet combinations, and have it filter out those strings which contain real words.
One could try stacking wordslides and cribs, there is a repeating quadgram in the first cipher: 408 500 113 203.
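A repeat like the one mentioned above can be found mechanically. Below is a small Python sketch (the function name repeated_ngrams is hypothetical) that lists every n-gram occurring more than once, with its positions. The toy sequence is invented purely to contain the quadgram 408 500 113 203 twice; it is not the actual challenge ciphertext.

```python
from collections import defaultdict

def repeated_ngrams(codes, n=4):
    """Map each n-gram that occurs more than once to its list of start positions."""
    positions = defaultdict(list)
    for i in range(len(codes) - n + 1):
        positions[tuple(codes[i:i + n])].append(i)
    return {gram: pos for gram, pos in positions.items() if len(pos) > 1}

# Invented toy ciphertext that respects the five-column numbering.
toy = [101, 201, 301, 408, 500,
       113, 203, 302, 401, 501,
       102, 202, 303, 408, 500,
       113, 203]
print(repeated_ngrams(toy))
```

Because every number belongs to exactly one column, any repeat in this kind of cipher has to sit at a distance that is a multiple of five, as the positions 3 and 13 do here.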
Jarlve: can you explain how that might work? If it’s any help, I’m happy to host guest posts on code-breaking, rather than squeeze everything into the small margins of comments. 🙂
“Thoughts on Nick’s Challenge Cipher #1”…
http://ciphermysteries.com/2017/06/28/thoughts-nicks-challenge-cipher-1
“Each separate CSV uses its own individual key.” Does that imply that a key of some kind is involved in generating each homophonic alphabet?
There are techniques for solving this type of cipher (as I’m sure you know). It’s the relative shortness of the ciphertext which poses the main challenge…
SirHubert: my PC shuffled each key thousands of times, so there’s no key phrase to search for, if that’s any help.
And yes, I’d agree that this kind of ciphertext is straightforward to solve with only a single key: but here you have five interleaved, which is many times harder in theory… but is it in practice? I’m not so sure, because a lot of classical cryptology tricks apply here too.
Ah, this is fun and already learned something:
I always thought that the British date format was also American MDY
but it is D-M-Y ! Did not know that before.
I do not have time to read everything that is written, but do you want a possible answer here or by e-mail?
Davidsch: leaving your possible answer here is probably best.
the configuration of the challenge is unclear to me.
After checking 26 to the power of 5 possibilities, it seems not to be like this:
We have 5 columns which represent 5 different mixed alphabets, each 26 letters.
Reading from left to right makes one plain text.
What makes it homophone is that we have 5 different representations of the same 26 letters. What is wrong with this picture?
I assumed that if an alphabet lies between 400 …. 425, the numbers represent consecutive letters. For example if 412=A then 413=B.
However, if the numbers within each of the 5 alphabets are not sequential (for example 412=A and 422=B), I really would have liked to know that beforehand, because in that case I would have chosen an entirely different approach.
@ Davidsch, each of the 5 different alphabets was randomly mixed, that means you would have to check (26!)^5 that is all possible permutations of the five different alphabets which equals about 10^133 configurations if you want to try them all. You seem to be thinking of shifting (i.e. rotating) the individual alphabets, not really internally mixing them.
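The two search-space sizes mentioned in this exchange are easy to check. This short Python sketch compares five shifted alphabets (26 to the power of 5) with five independently mixed alphabets ((26!)^5); the variable names are just for illustration.

```python
import math

# Five alphabets that are merely shifted: 26 choices per column.
shifted = 26 ** 5

# Five alphabets that are each fully mixed: 26! choices per column.
mixed = math.factorial(26) ** 5

print(shifted)
print(round(math.log10(mixed), 1))  # order of magnitude of the mixed keyspace
```

The first number is small enough to try exhaustively; the second is roughly 10^133, which is why exhaustive search is not an option for mixed alphabets.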
thanks. So, you mean that the mixing is done “over the total space of the 5 alphabets” ? But if so, why use the column numbers in distinct rows if they do not refer to a specific position in the alphabet, nor the alphabet itself? That is not logical.
You are right, I read the “randomly mixed” as “within the alphabet the letters are shifted”. And also I assumed the 5 columns represented the 5 separate alphabets, ordered in the distinct 5 columns.
@Davidsch, no I tried to say:
– there are 5 columns representing 5 separate alphabets
– each of the alphabets has an individual randomly generated assignment of letters to the numbers.
Clear. Within each alphabet the assignment of the numbers are random.
It smells like Battista Bellaso, but to be sure: If there is not really an order, what would the “key” as mentioned in the text, generate or indicate ? And what do we know about that/ those keys ?
because…IF there are no stricter rules on the 5 alphabets, or the keys,
the text can be anything and the number of good solutions is huge.
Especially on the short chall7, where the number of unique symbols per column is 8/11/14/11/8. -> One can choose any 8 letters for the first and last columns and fill in a text.
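Per-column counts like the 8/11/14/11/8 quoted above can be reproduced with a few lines of Python. This is only a sketch: the function name distinct_per_column is hypothetical, and the three rows below are invented toy data rather than any of the challenge ciphers.

```python
def distinct_per_column(rows, columns=5):
    """Count how many different cipher numbers appear in each column."""
    seen = [set() for _ in range(columns)]
    for row in rows:
        for col, value in enumerate(row[:columns]):
            seen[col].add(value)
    return [len(values) for values in seen]

# Invented toy rows: columns 1, 3 and 5 each repeat one symbol.
toy_rows = [
    [121, 213, 310, 406, 516],
    [108, 200, 323, 416, 513],
    [121, 208, 310, 409, 516],
]
print(distinct_per_column(toy_rows))
```

The fewer distinct symbols a column has relative to its length, the more repeats there are for a solver to exploit in that column.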
Davidsch: my computer shuffled each key many thousands of times, so I can confirm that they are properly random, i.e. no keywords etc.
The reason I put the challenge forward at all is because I suspect that constraining the letters to a separate alphabet in each of the five columns makes it significantly easier to solve than a ‘pure’ homophonic cipher, i.e. where each homophone instance is chosen entirely at random.
Ok. I will take with me chall.7 and paper and pencil to the beach coming weeks
Can anything be said about the key, which I assume is a number?
In particular I would like to know if you can reveal if:
* the key is 1 number
* the key is 2 numbers
* more numbers…
Ok, now here on holiday, a small note: this is actually much more difficult than if these were normal homophones, because now there is no system at all per column for the 5 alphabets, where normally there is some sort of system in choosing the sequences for the homophones, either per vowel only, or a consecutive order per letter, or a specific modulus key per letter. Now we have no clue whatsoever what the key per column looks like, and there is also no possibility of using frequencies of letters, bigrams or anything bigger. Assuming that the most frequent symbols equal the most frequent plaintext letters proves to be untrue for these small columns, which makes it a guessing game. The only things we can do are to use the start and end letter of each challenge cipher, to note that the VC count must be 1, 2 or 3 per horizontal row, and to guess the word lengths. Still, it comes down to something similar to brute force.
Nick, would you consider revealing one of the plaintexts (maybe not #1) as the deadline for the challenge has now passed?
Narga: I’m more likely to give a super-long additional challenge cipher to see if anyone can solve even that using the (very obvious) constraints to assist the search.
The point of the exercise was to challenge people to see if this was a category of cipher that could be solved at all, given that it seems we have at least one of that general class (and probably several others not yet released) in the Scorpion Ciphers. Have you tried to solve any of them?
Haven’t given the Scorpion Ciphers a try, yet, but had no problem solving my own longer test cipher (~6x the length of your #1) with my code.
However with the given constraints and such a super-long cipher, one would have of course also sufficient statistics for a simple substitution based on letter frequency in the individual alphabets, so don’t put too much money on that 🙂
I believe I’ve solved cipher #1:
THEOBJECTOFMYPROP
OSEDWORKONCYPHERI
SNOTEXACTLYWHATYO
USUPPOSEBUTMYTIME
ISNOWSOENTIRELYOC
CUPIEDTHATIHAVEBE
ENOBLIGEDTOGIVEIT
UPATLEASTFORTHENE
XTTWOORTHREEYEARS
I’ve tried posting about it a few times. Is Akismet eating my comments? I’ll try posting without a link or other commentary so that this comment is more likely to get through to you. 🙂
Louie: fantastic, well done! I can indeed confirm that you have solved my first challenge cipher! I believe that this is the hardest constrained homophonic cipher ever solved, so you are now officially a cryptological star! 🙂 🙂 🙂
I’ll email you separately, because now that you have achieved immortality, I (inevitably) would like to write up how you managed to do it. 🙂
As far as links go, I ask people to remove the first colon and the last dot of any link they post, so that I can reassemble it during moderation. This is annoying, but it sharply reduces the amount of bot-generated comment spam I have to wade through each day.
Congratulations to Louie!
I realize that I had found one letter correctly, so I was on the right track!