Following my recent Scorpion Ciphers post, I’ve put up a permanent reference page on the Scorpion Ciphers and have also tried to contact John Walsh about the as-yet-unreleased other ciphers… so we’ll see how that goes.

Since then, I’ve been working a little more with S5, which has 155 unique symbols out of 180 letters. Because repeated symbols in S5 are always multiples of 16 letters apart, it seems likely to me that this ciphertext was constructed from 16 independent alphabets cycled through in strict sequence. My hope was that this regularity might give us a better chance of cracking S5 than if it were a randomly chosen homophonic cipher.
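As a rough illustration, the "repeats are always multiples of 16 apart" observation can be checked mechanically: collect the positions of each symbol, take the gaps between consecutive occurrences, and look at their greatest common divisor. This is a minimal Python sketch, not the program used for this post. The function names `repeat_distances` and `implied_period` are hypothetical, and the toy sequence is made up for the example rather than being a transcription of S5.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeat_distances(symbols):
    """Gaps between consecutive occurrences of every repeated symbol."""
    positions = defaultdict(list)
    for index, symbol in enumerate(symbols):
        positions[symbol].append(index)
    distances = []
    for indices in positions.values():
        # Symbols that occur only once contribute no gaps.
        distances.extend(b - a for a, b in zip(indices, indices[1:]))
    return distances


def implied_period(symbols):
    """GCD of all repeat gaps (0 if no symbol repeats at all)."""
    return reduce(gcd, repeat_distances(symbols), 0)


# Made-up toy sequence (NOT S5): 40 distinct symbols with two planted repeats.
toy = list(range(40))
toy[19] = toy[3]   # repeat at distance 16
toy[37] = toy[5]   # repeat at distance 32
print(implied_period(toy))
```

If every consecutive gap is a multiple of 16 then so is every pairwise gap, so checking consecutive occurrences is enough. A GCD of 16 on a real transcription would be consistent with 16 alphabets used in strict rotation, but would not prove it.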

All the same, this was just a guess: so the first thing I did was come up with a way to test this hypothesis, by writing a short C program to encipher 180-long subsections of the Scorpion’s own plaintext using various numbers of sequential alphabets, to see if this would produce roughly 155 unique symbols.

For each number of alphabets (e.g. 2), I tried (notionally) enciphering every 180-long stretch of the Scorpion’s text, and kept a tally of the minimum number of symbols required (e.g. 37), the maximum number of symbols required (e.g. 44), and the average number of symbols required (e.g. 40).
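The tallying described above can be sketched as follows. This is a Python reconstruction under stated assumptions, not the original C program. It assumes that only the letters A-Z are enciphered, that each alphabet gives every letter its own symbol with no symbol shared between alphabets, and that the alphabet cycle restarts at the beginning of each 180-letter window. The function name `unique_symbol_stats` is hypothetical.

```python
def unique_symbol_stats(plaintext, n_alphabets, window=180):
    """Return (min, max, mean) of the unique symbols needed per window.

    Under the assumptions above, a symbol is identified by the pair
    (alphabet index, plaintext letter), so counting distinct pairs
    counts distinct cipher symbols.
    """
    letters = [c for c in plaintext.upper() if "A" <= c <= "Z"]
    if len(letters) < window:
        raise ValueError("plaintext has fewer letters than the window length")
    counts = []
    for start in range(len(letters) - window + 1):
        used = {(i % n_alphabets, letters[start + i]) for i in range(window)}
        counts.append(len(used))
    return min(counts), max(counts), sum(counts) / len(counts)


# Toy input only: a highly repetitive made-up text, not the Scorpion's letters.
toy_text = "ABCD" * 50
for n in (1, 2, 3):
    print(n, unique_symbol_stats(toy_text, n))
```

Whether the cycle restarts with each window or runs on from the start of the whole text is a design choice the post does not specify. The two variants could give slightly different ranges.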

Interestingly, the results weren’t what I expected (each line gives the minimum..maximum range in brackets, followed by the average):-

alphabets = 1, uniques = (19..24) 21
alphabets = 2, uniques = (37..44) 40
alphabets = 3, uniques = (50..61) 55
alphabets = 4, uniques = (60..74) 68
alphabets = 5, uniques = (72..86) 79
alphabets = 6, uniques = (77..97) 87
alphabets = 7, uniques = (88..105) 97
alphabets = 8, uniques = (91..110) 101
alphabets = 9, uniques = (92..116) 106
alphabets = 10, uniques = (104..122) 113
alphabets = 11, uniques = (107..127) 117
alphabets = 12, uniques = (113..136) 122
alphabets = 13, uniques = (113..134) 123
alphabets = 14, uniques = (115..138) 129
alphabets = 15, uniques = (123..146) 132
alphabets = 16, uniques = (120..147) 133
alphabets = 17, uniques = (128..146) 136
alphabets = 18, uniques = (126..151) 137
alphabets = 19, uniques = (128..150) 139
alphabets = 20, uniques = (132..153) 143
alphabets = 21, uniques = (133..159) 144
alphabets = 22, uniques = (131..155) 145
alphabets = 23, uniques = (137..154) 145
alphabets = 24, uniques = (137..157) 147
alphabets = 25, uniques = (139..160) 149
alphabets = 26, uniques = (141..158) 149
alphabets = 27, uniques = (143..163) 152
alphabets = 28, uniques = (143..164) 152
alphabets = 29, uniques = (139..164) 153
alphabets = 30, uniques = (145..164) 154
alphabets = 31, uniques = (143..164) 153
alphabets = 32, uniques = (146..167) 156

That is to say, even though S5 looks as though it is strictly cycling through 16 alphabets, this isn’t consistent with the stats of the Scorpion’s other plaintext: that text is so verbose and repetitive that it would take around 32 alphabets to yield 155 symbols on average.

What I think this implies is either (a) that S5’s plaintext is significantly less repetitive than the text of his/her other messages, or (b) that the cipher system the Scorpion used also employs an extra layer of compression (e.g. a nomenclatura, using extra tokens for common words such as [THE] and [AND], or even common syllable pairs).

I don’t know… I’ll have to have a further think about this: it isn’t at all obvious what’s going on here.

Update: having scratched my head about this for a few more hours, I don’t feel comfortable with the suggestion that some kind of nomenclatura is involved. Rather, what I now suspect we’re looking at here is not a 16 x 26-token set of ciphers (i.e. A-Z) but a 16 x 36-token set of ciphers (i.e. A-Z plus 0-9), coupled with a slightly less verbose plaintext. Hence my very rough (and admittedly as yet unmodelled) estimate is that 25-35 of the tokens in the plaintext will turn out to be digits.

Unfortunately, I also think that this may have left the text undecryptable, unless there is some additional kind of meta-consistency between shapes across the 16 alphabets (e.g. if all the circle-plus-upright-cross shapes encode the same underlying plaintext token). Oh well!

### 6 thoughts on “Analysis of Scorpion Cipher S5…”

1. Tiago Rodrigues on June 1, 2014 at 5:47 pm said:

Nick,

can’t say I’ve studied the Scorpion ciphers in much detail but I seriously doubt they are supposed to be homophonic ciphers at all.
Take S1, for instance: you have 15 characters that appear more than once (one of them 4 times, all of the others only 2) – it kind of reminds me of that memory game where you have to find the position of the repeating images.
If I had the time to analyse it, I would consider starting by following the trail of the 32 symbols that thus appear more than once, and check for a possible grille or coordinate system against the accompanying letters.

Regards

2. Tiago: S1 was the first cipher the Scorpion sent, and uses (I think) a rough cycle of five alphabets in a rather slapdash way as compared with the much more rigidly controlled S5, which runs (I believe) off a tight sequence of 16 alphabets.

And if S5 isn’t a homophonic cipher, then what kind of a cipher could it be? Bear in mind that it has an extremely strong 16-long property (every single shape repeat falls at multiples-of-16 distances) that I don’t think fits many other types of cipher… but I could be wrong.

3. Tiago Rodrigues on June 2, 2014 at 1:51 pm said:

Nick, I’ve read your posts on the Scorpion ciphers with due attention and I do stand by my statement.
Neither S1 nor S5 seems to be homophonic and/or polyalphabetic. I should perhaps explain better: S1 does seem to be composed of 3 separate alphabets (squares, circles, lines/letters) which you can easily extend to about 22 letters by analysing the existing patterns and inferring the remaining unused symbols.
What I mean is that the multiple alphabets used are a ruse to mark positions in a coordinate system. Both S1 and S5 seem to follow this same pattern.
Either way, since I’ve spent some more time looking into this I might actually do some more analysis and will let you know of the results.

Regards

4. ponky on June 2, 2014 at 1:59 pm said:

Looking at S5, there are symbols which fit into similar groups. What’s interesting is that a lot of these groups seem to express 4-bit properties, in that they have 4 optional glyph-parts. That means there are 16 possibilities for each of these glyphs, and based on the repetition distances it seems there are also 16 alphabets.

The groups I can see are:

* Square with + shaped quadrants dotted
* Square with X divider, and quadrants dotted
* Square with + shaped quadrant filled
* Circle with X divider, and quadrants dotted
* Circle with + shaped quadrants filled
* Circle with X divider, and quadrants filled

The potential groups I can see are:

* Square with + divider and quadrants dotted: there is only one example and it is not in 16-alignment with the same glyph without the quadrant dividers, suggesting it’s a separate symbol-group.
* The triangle-in-a-square, with any of three quadrants filled and/or the centre quadrant dotted: It’s unclear how this would work if the centre quadrant was dotted and filled at the same time.
* A rounded W shape with sides either dotted or filled: Unclear how dotted and filled sides should look.
* Something that looks like a flag/music note/diagonal line.
* Something that looks like the zodiac symbol, with outer horizontal/vertical lines, and left/right inner half filled.

Note that some entries in these groups will match the same glyph as another group, e.g. a filled circle doesn’t express whether it has a + or an X divider. This doesn’t seem to appear in the S5 text.

When you look at the 16-aligned version, most of the obvious symbol groups do not fall on the same alignment. This indicates that these groups have one entry in each alphabet, although I haven’t bothered to work out what the probability of this is. This fits with Scorpion’s comments about underlying patterns. There are three ways this could play out:

1. Symbols from the same group represent the same letter throughout. This would be a serious weakness in the cipher, and it seems odd to go to the effort of making so many alphabets only to mostly merge them. However, if it is the case, it should make the cipher considerably easier to break.

2. Symbols from the same group represent a pattern of letters across alphabets. E.g., if it means C in the first alphabet, it could mean D in the second, E in the third, etc. To me, this seems the most likely, although the question then becomes “what is the pattern?”

3. There is no pattern: ‘Scorpion’ just used this trick to help generate unique symbols and fed one into each alphabet before assigning letters to them at random. If this is the case, there is no underlying pattern to the code and all of this is not very useful. Let’s hope it’s not this.

I did all this by eyeballing the cipher on my lunch break, so I’ve probably made a few mistakes.

5. thomas spande on June 2, 2014 at 8:28 pm said:

Dear all, Might not some of the Scorpion ciphers be signal flags, either international maritime or various naval flags or mixtures of both? We can drop back to Horatio Nelson and then deal with the Royal Navy in general and enciphered versions of those. The latter would appear sort of ambiguous as a flag for 5 is swapped with one for four. Anyway the square flags were letters (A->Z) and the pennants were numbers or combinations of letters like “sp” for “speed”; “su” for “subdivision” and several flags, particularly pennants, could be flown atop one another.

Line 1 of S5 has an “H” for symbol 4; “I” for line 5, symbol 3, and “k” for line 1 symbol 4 (although H and K might be reversed as the flags have white and yellow as the lighter colors). Signal flags exist for every major navy in the world, including Germany and Russia. Some of the symbols, like line 3, symbol 4; line 3, symbol 6; line 9, symbol 10, and line 11, symbol 3 may represent semaphore positions, i.e. the arms of the semaphore signaller. There are even signal flags for the north and south in the American Civil War but let’s not go there at the moment! The periodicity that Nick notes might be one navy code followed by another. Some symbols remain cryptic but maybe we are getting an occasional clue with a flag pole here and there? Cheers, Tom.

6. Knox on June 2, 2014 at 8:52 pm said:

For a first guess of S5, a few symbols that are at a distance of 8 or 16 from each other may represent the same plaintext letter, if it is a cipher. Or the keylength might be 5 or 10.