While thinking about the Scorpion S1 unsolved cipher in the last few days, it struck me that it seemed to be a special kind of homophonic cipher, one where the homophones are used in rigid groups.

That is: whereas the Zodiac Killer’s Z408 cipher cycled (mostly but not always) between sets of homophones by their appearance, it appears that the Scorpion S1 cipher maker instead rigidly cycled between 16 sets of homophones by column. What’s interesting about both cases is that the use pattern gives solvers extra information beyond what they would have for a homophonic cipher where each homophone instance was chosen completely at random.

Perhaps there’s already a special name for this: but (for now) what I’m calling them is “constrained homophonic ciphers”, insofar as they are homophonic ciphers but where an additional use pattern constrains the specific way that the homophones are chosen.

The question I immediately wanted to know the answer to was this: can we solve these? And what better way to find this out than by issuing a challenge!

Seven Challenge Ciphers

The seven challenge ciphers are downloadable as a single zip file here, or as seven individual CSV files here:
* #1
* #2
* #3
* #4
* #5
* #6
* #7

How The Ciphers Were Made

Unlike normal challenge ciphers, what I’m giving you here (in line with Kerckhoffs’ Principle) is complete disclosure of the cipher system and even the plaintext language.

The cipher system used here is a homophonic cipher with exactly five possible homophones for each plaintext letter BUT where the homophones are strictly selected according to the column number in which they appear in the ciphertext. Each separate CSV uses its own individual key.

The plaintext language is English: they are straightforward sentences taken from a variety of books, and without any sadistic linguistic tricks (i.e. no “SEPIA AARDVARK” or similar to confuse the issue).

The enciphered files are simple CSV (comma-separated values) text files, arranged in rows of five letters at a time, but encoded as decimal numbers. For example, the first (and the longest) challenge cipher (“test1.csv”) begins as follows:

121,213,310,406,516,
108,200,323,416,513,
112,208,308,409,515,

Here, “121,213,310,406,516,” enciphers plaintext letters #1..#5, “108,200,323,416,513,” enciphers plaintext letters #6..#10, and so forth. The first column is numbered in the range 100..125 (i.e. these belong to the 1st homophonic alphabet), the second column 200..225 (i.e. these belong to the 2nd homophonic alphabet), and so forth.
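As a rough illustration of that layout, here is a minimal Python sketch for reading a challenge file into (column, index) pairs. The function name parse_cipher is hypothetical, and the range checks simply assume the numbering described above (hundreds digit = column, remainder = 0..25); treat it as a starting point rather than a definitive loader.

```python
import csv
import io

def parse_cipher(text):
    """Parse challenge-style CSV text into a flat list of (column, index) pairs."""
    symbols = []
    for row in csv.reader(io.StringIO(text)):
        # Each row ends with a trailing comma, which produces an empty last field.
        cells = [c.strip() for c in row if c.strip()]
        for position, cell in enumerate(cells):
            value = int(cell)
            column, index = divmod(value, 100)  # e.g. 121 -> (1, 21)
            # Sanity check against the assumed layout: column k holds k00..k25.
            if column != position + 1 or not 0 <= index <= 25:
                raise ValueError(f"unexpected symbol {value} in column {position + 1}")
            symbols.append((column, index))
    return symbols

# The three rows quoted above from test1.csv.
sample = "121,213,310,406,516,\n108,200,323,416,513,\n112,208,308,409,515,\n"
parsed = parse_cipher(sample)
```

To read a downloaded file, something like parse_cipher(open("test1.csv").read()) should work, assuming the file has the same shape as the excerpt.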

The start of the message and the end of the message are exactly as you would expect: there is no padding at either end, no embedded key information, just pure ciphertext.
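For anyone who wants to generate their own test ciphers of this kind, the system as described can be sketched in a few lines of Python. This is only a sketch under assumptions: the names make_key, encipher and decipher are hypothetical, the seed is arbitrary, and random.shuffle merely stands in for whatever shuffling produced the real keys.

```python
import random
import string

def make_key(rng, columns=5):
    """Build one independently shuffled alphabet per column (assumed key layout)."""
    key = []
    for _ in range(columns):
        letters = list(string.ascii_uppercase)
        rng.shuffle(letters)
        key.append(letters)  # key[c][i] is the letter that index i stands for in column c
    return key

def encipher(plaintext, key):
    """Encipher uppercase A-Z text; the column (position mod 5) selects the alphabet."""
    columns = len(key)
    numbers = []
    for n, letter in enumerate(plaintext):
        c = n % columns
        numbers.append(100 * (c + 1) + key[c].index(letter))
    return numbers

def decipher(numbers, key):
    """Invert encipher(): hundreds digit picks the alphabet, remainder picks the letter."""
    return "".join(key[n // 100 - 1][n % 100] for n in numbers)

rng = random.Random(2017)  # arbitrary seed, purely for reproducibility
key = make_key(rng)
cipher = encipher("ATTACKATDAWN", key)
```

Note that the two Ts at positions 3 and 8 both land in the third column and so share a symbol, while the T at position 2 lands in the second column and gets a different one: that is exactly the constraint a solver can lean on.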

The Rules

Treating this as a massively parallel book search using cloud databases (a) will be treated as cheating, and (b) will spoil it for other people, so please don’t do that. This challenge is purely about finding the limits of cryptanalysis, not about grandstanding with Big Data.

Hence you’ll need to also tell me (broadly) what you did in order to rise to the challenge, so that I can be sure you haven’t solved it through secondary or underhand means.

The Prize

If nobody solves any of the challenge ciphers by the end of 2017, my wallet stays shut.

However, the person (or indeed group) who has the most success decrypting any of these seven challenge ciphers by 31st December 2017 will be the “2017 Cipher Mysteries Cipher Champion”, and will also receive a shockingly generous £10 prize (sent anywhere in the world where PayPal can send money) to spend as they wish.

In the case of multiple entrants solving the same difficulty cipher independently, I’ll award the prize to the first to contact me. In all cases, please leave a comment below.

In all situations, my decision is final, absolute, arbitrary and there is no opportunity for appeal. Just so you know.

PS: any individual (or indeed covert agency) wishing to donate more money to increase the prize fund (i.e. to make a little more cryptanalytic sport of this), please feel free to email me.

Hints and Tips

I suspect that the multiplicity (i.e. the number of different symbols used divided by the length of the ciphertext) will prove to be too high and the ciphertext lengths too short for conventional homophonic decryption programmes, so I expect prospective solvers won’t be able to look to these for any great help.
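The multiplicity defined above is easy to compute for yourself. Here is a minimal Python sketch (the function name multiplicity is mine, not a standard library call), applied to the three rows quoted earlier from test1.csv:

```python
def multiplicity(symbols):
    """Number of distinct symbols divided by the ciphertext length."""
    symbols = list(symbols)
    if not symbols:
        raise ValueError("empty ciphertext")
    return len(set(symbols)) / len(symbols)

# The fifteen symbols from the three rows quoted earlier.
sample = [121, 213, 310, 406, 516,
          108, 200, 323, 416, 513,
          112, 208, 308, 409, 515]
m = multiplicity(sample)
```

Every symbol in that short excerpt is different, so its multiplicity is 1.0, which says nothing about the full file. With five alphabets of 26 symbols there can be at most 130 distinct symbols, so the figure can only fall below 1 once a ciphertext starts repeating symbols.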

Similarly, I don’t believe that numerical brute force and/or parallel processing will be sufficient here: all the same, these challenges (if solvable) will probably prove to be things that anyone anywhere can tackle (e.g. through hill-climbing and cleverly exploiting the constraints), not just the NSA, GCHQ or similar with their supercomputers.
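One example of a constraint worth exploiting: because the alphabet depends only on the column, a plaintext fragment that recurs at the same column alignment must encipher to the same run of numbers. The Python sketch below (the name repeated_ngrams and the toy data are purely illustrative, not taken from the challenge files) lists any n-gram that occurs more than once:

```python
from collections import defaultdict

def repeated_ngrams(symbols, n):
    """Map each n-gram that occurs more than once to its list of start positions."""
    positions = defaultdict(list)
    for i in range(len(symbols) - n + 1):
        positions[tuple(symbols[i:i + n])].append(i)
    return {gram: where for gram, where in positions.items() if len(where) > 1}

# Toy ciphertext (not real challenge data) with one repeated trigram.
toy = [101, 202, 303, 404, 505,
       117, 202, 303, 404, 511]
repeats = repeated_ngrams(toy, 3)
```

Here repeats holds the single trigram (202, 303, 404) at positions 1 and 6. The gap between repeats is always a multiple of five, since each number already encodes its column. Repeats like this are natural places to try cribs.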

For what it’s worth, my best guess right now is that #1 (the longest of the seven ciphertexts) will prove to be solvable… though only just. Even so, I’d be delighted to be proved wrong for any of the others.

Incidentally, I chose the length of the very shortest challenge cipher to broadly match the length of the Scorpion S1 cipher: so even in the (perhaps unlikely) case where all seven of my challenge ciphers get solved, there’ll still be an eighth challenge to direct your clever efforts at. 😉

33 thoughts on “New: seven constrained homophonic challenge ciphers for you to solve…”

  1. Don Simpson on June 25, 2017 at 1:16 pm said:

    Nick, I have been following your blogs with interest but as a newbie who is severely challenged by the daily cryptogram could you suggest a list of references to start me learning the more sophisticated methods of cryptography?

  2. Don Simpson: it’s a good question. Books tend to either steer clear completely of the interesting stuff or dive so deeply into it that your eardrums burst. I’ll have a look around, see if there’s something I could recommend.

  3. Ravenhurst on June 26, 2017 at 6:29 pm said:

    @Don

    Friedrich L. Bauer
    Decrypted Secrets
    4th edition

    ISBN-10 3-540-24502-2
    ISBN-13 978-3-540-24502-5

  4. Ravenhurst: while it’s a good book, few will want to pay upwards of £100 for the (latest) 4th edition. Having said that, earlier editions are available via bookfinder.com for rather less, and I’d guess that’s probably where the best compromise lies. 🙂

  5. Matt on June 27, 2017 at 9:31 am said:

    The Code Book by Simon Singh would be my recommendation as a good starting point to learn.

    Also – I hate to admit this and I’m probably being very dim but I’m not sure I understand the layout of the ciphers.

    Is it

    A
    101
    102
    103
    104
    105

    B
    106
    107
    108 etc

    or

    A
    101
    126

    and so on?

  6. Matt: The Code Book is a comfortable read, but I’m not sure it’s where I’d point someone first.

    The alphabets are interleaved, i.e. ABCDEABCDEABCDE etc.

  7. Hello Nick!
    Can you guide us a little more? Does the text contain long sentences or is it a mixture of short words? I believe I can read two or three five-letter words in Test 1: am I on the right track or should I give up?
    Best regards
    Ruby

  8. Ruby: all I’m saying is that I tried to be fair, in that I didn’t select passages that were completely artificial, or arbitrary, or mad, or oddly-structured, or random-looking.

    The general problem when trying to crack homophonic ciphertexts is that you have so many degrees of freedom (particularly with symbols that only appear once) that it’s easy to devise texts that genuinely do fit short stretches of it. However, the problems then come when you try to follow that same (narrow) road for the rest of the ciphertext… as per just about every attempted decryption of the unsolved Zodiac Killer Ciphers. :-/

  9. This is nearly impossible, there are millions of possible cipher combinations. The hardest part is that after the first letter, lots of information is obscured. The position of letters in words, most notably.

    All I can think of is to count the most frequent numbers in each cipher and assign to them the most frequent letters in English. Are spaces encoded? That might be a key. Or is it just one string of words without spaces? In that case, I say it’s impossible.

    Unless one can write a computer program that generates all solutions using all possible alphabet combinations, and have it filter out those strings which contain real words.

  10. Jarlve on June 28, 2017 at 10:03 am said:

    One could try stacking wordslides and cribs, there is a repeating quadgram in the first cipher: 408 500 113 203.

  11. Jarlve: can you explain how that might work? If it’s any help, I’m happy to host guest posts on code-breaking, rather than squeeze everything into the small margins of comments. 🙂

  12. SirHubert on June 29, 2017 at 5:54 pm said:

    “Each separate CSV uses its own individual key.” Does that imply that a key of some kind is involved in generating each homophonic alphabet?

    There are techniques for solving this type of cipher (as I’m sure you know). It’s the relative shortness of the ciphertext which poses the main challenge…

  13. SirHubert: my PC shuffled each key thousands of times, so there’s no key phrase to search for, if that’s any help.

    And yes, I’d agree that this kind of ciphertext is straightforward to solve with only a single key: but here you have five interleaved, which is many times harder in theory… but is it in practice? I’m not so sure, because a lot of classical cryptology tricks apply here too.

  14. Davidsch on July 6, 2017 at 1:55 pm said:

    Ah, this is fun and already learned something:
    I always thought that the British date format was also American MDY
    but it is D-M-Y ! Did not know that before.

    I do not have time to read everything that is written, but do you want a possible answer here or by e-mail?

  15. Davidsch: leaving your possible answer here is probably best.

  16. Davidsch on July 9, 2017 at 10:30 pm said:

    the configuration of the challenge is unclear to me.
    After checking 26 to the power of 5 possibilities, it seems not to be like this:
    We have 5 columns which represent 5 different mixed alphabets, each 26 letters.
    Reading from left to right makes one plain text.
    What makes it homophone is that we have 5 different representations of the same 26 letters. What is wrong with this picture?

  17. Davidsch on July 10, 2017 at 9:03 am said:

    I assumed that if an alphabet lies between 400 …. 425, the numbers represent consecutive letters. For example if 412=A then 413=B.

    However, if the numbers representing the 5 alphabets have numbers that are not sequential (For example 412=A and 422=B) , I really would have known that before because in that case I would have chosen an entirely different approach.

  18. Narga on July 10, 2017 at 9:30 am said:

    @ Davidsch, each of the 5 different alphabets was randomly mixed, that means you would have to check (26!)^5 that is all possible permutations of the five different alphabets which equals about 10^133 configurations if you want to try them all. You seem to be thinking of shifting (i.e. rotating) the individual alphabets, not really internally mixing them.

  19. Davidsch on July 10, 2017 at 10:39 am said:

    thanks. So, you mean that the mixing is done “over the total space of the 5 alphabets” ? But if so, why use the column numbers in distinct rows if they do not refer to a specific position in the alphabet, nor the alphabet itself? That is not logical.

    You are right, I read the “randomly mixed” as “within the alphabet the letters are shifted”. And also I assumed the 5 columns represented the 5 separate alphabets, ordered in the distinct 5 columns.

  20. Narga on July 10, 2017 at 12:35 pm said:

    @Davidsch, no I tried to say:
    – there are 5 columns representing 5 separate alphabets
    – each of the alphabets has an individual randomly generated assignment of letters to the numbers.

  21. Davidsch on July 10, 2017 at 2:20 pm said:

    Clear. Within each alphabet the assignment of the numbers are random.

    It smells like Battista Bellaso, but to be sure: If there is not really an order, what would the “key” as mentioned in the text, generate or indicate ? And what do we know about that/ those keys ?

  22. Davidsch on July 10, 2017 at 4:35 pm said:

    because…IF there are no stricter rules on the 5 alphabets, or the keys,
    the text can be anything and the number of good solutions is huge.

    Especially on the short chall7 where the amount of unique letters per columns is 8/11/14/11/8. -> One can choose any 8 letters for the first and last columns and fill in a text

  23. Davidsch: my computer shuffled each key many thousands of times, so I can confirm that they are properly random, i.e. no keywords etc.

    The reason I put the challenge forward at all is because I suspect that constraining the letters to a separate alphabet in each of the five columns makes it significantly easier to solve than a ‘pure’ homophonic cipher, i.e. where each homophone instance is chosen entirely at random.

  24. Davidsch on July 12, 2017 at 12:28 pm said:

    Ok. I will take with me chall.7 and paper and pencil to the beach coming weeks

  25. Davidsch on July 20, 2017 at 1:22 pm said:

    Can anything be said about the key, which I assume is a number?

    In particular I would like to know if you can reveal if:
    * the key is 1 number
    * the key is 2 numbers
    * more numbers…

  26. Davidsch on July 28, 2017 at 9:03 pm said:

    Ok now here on holiday a small note, that it is actually much more difficult than if these were normal homophones, because now there is no system at all per column for the 5 alphabets, where normally there is some sort of system in choosing the sequences for the homophones, either per vowel only or per each letter a consecutive order, or a specific modulus key per letter. Now we have no clue whatsoever how the key per column looks, but also there is no possibility of using frequencies of letters, bigrams or bigger. Assuming that the highest are equal to the plaintext highest proves to be untrue for these small columns. Which makes it a guessing game. The only thing we can do is use the start and end letter per crypto chall., the VC count must be 1, 2 or 3 per horizontal and we could guess the word length. Still it comes down to something similar to brute force.

  27. Nick, would you consider revealing one of the plaintexts (maybe not #1) as the deadline for the challenge has now passed?

  28. Narga: I’m more likely to give a super-long additional challenge cipher to see if anyone can solve even that using the (very obvious) constraints to assist the search.

    The point of the exercise was to challenge people to see if this was a category of cipher that could be solved at all, given that it seems we have at least one of that general class (and probably several others not yet released) in the Scorpion Ciphers. Have you tried to solve any of them?

  29. Haven’t given the Scorpion Ciphers a try, yet, but had no problem solving my own longer test cipher (~6x the length of your #1) with my code.

    However with the given constraints and such a super-long cipher, one would have of course also sufficient statistics for a simple substitution based on letter frequency in the individual alphabets, so don’t put too much money on that 🙂

  30. Louie Helm on January 6, 2020 at 11:59 pm said:

    I believe I’ve solved cipher #1:

    THEOBJECTOFMYPROP
    OSEDWORKONCYPHERI
    SNOTEXACTLYWHATYO
    USUPPOSEBUTMYTIME
    ISNOWSOENTIRELYOC
    CUPIEDTHATIHAVEBE
    ENOBLIGEDTOGIVEIT
    UPATLEASTFORTHENE
    XTTWOORTHREEYEARS

    I’ve tried posting about it a few times. Is Akismet eating my comments? I’ll try posting without a link or other commentary so that this comment is more likely to get through to you. 🙂

  31. Louie: fantastic, well done! I can indeed confirm that you have solved my first challenge cipher! I believe that this is the hardest constrained homophonic cipher ever solved, so you are now officially a cryptological star! 🙂 🙂 🙂

    I’ll email you separately, because now that you have achieved immortality, I (inevitably) would like to write up how you managed to do it. 🙂

    As far as links go, I ask people to remove the first colon and the last dot of any link they post, so that I can reassemble it during moderation. This is annoying, but it sharply reduces the amount of bot-generated comment spam I have to wade through each day.

  32. Congratulations to Louie!
    I realize that I had found one letter correctly, so I was on the right track!
