Previous posts here have established (I believe) that the WW2 Pigeon Cipher was almost certainly encrypted using the British Typex cipher machine. So I think it would be a good idea to look at this message from a Typex code-breaker’s point of view.
While Kelly Chang’s (2012) master’s project on the cryptanalysis of Typex is a very useful resource here, I think it’s fair to say that she confines her efforts to purely numerical, permutational attacks. But because she doesn’t try to peer inside an actual ciphertext, I think it’s also fair to say that she doesn’t really look at Typex from a practical code-breaker’s perspective.
So, let’s get to it: let’s (temporarily) close our mathematical eyes, and instead try to look at a Typex message (the WW2 pigeon cipher) through our code-breaking eyes.
The Typex Keyboard
Whereas Enigma was just 26 plain letters A-to-Z (no numbers, no spaces, no umlauts, and not even a special Swastika symbol), Typex had two modes: Letter Mode and Figure Mode. And so the Typex keyboard (below image from Crypto Museum, or you can play with a real-looking one at Virtual Typex) encodes lots of characters in slightly roundabout ways (akin to escape sequences).
The most notable mappings in Typex’s (default) Letter Mode are:
- X -> Space
- V -> Switch to Figure Mode
- Z -> Switch to Letter Mode
In Typex’s Figure Mode, the top row maps to numbers (QWERTYUIOP -> 1234567890), the second row (largely) maps to punctuation symbols, while the special Letter Mode meta-letters (X/V/Z) are typed as G/C/D.
So, to encipher “X” on a Typex keyboard, you’d need to switch into Figure Mode (“V”), press the Figure Mode version of the letter (“G”) and then switch back into Letter Mode (“Z”), i.e. “VGZ”.
Putting this all together, you can see that before sending the classic test sequence “The Quick Brown Fox Jumps Over The Lazy Dog” via Typex, you’d need to “escape” the letters to the Typex keyboard mapping, i.e.
THEXQUICKXBROWNXFOVGZXJUMPSXOVCZERXTHEXLAVDZYXDOG
Here, the three escape sequences are VGZ, VCZ, and VDZ (for “X”, “V”, and “Z” respectively). Similarly, 1234567890 would need to be Typex-escaped as “VQWERTYUIOPZ” before transmission.
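To make the escaping rules concrete, here is a minimal Python sketch. It covers only the mappings described above (space, the letters A-Z, the digits, and the X/V/Z escapes); the function name `typex_escape` is my own, punctuation is left out, and I'm assuming that a space is always typed in Letter Mode.

```python
# Keys pressed in Figure Mode to produce each character (only the mappings described above).
FIGURE_KEYS = {"X": "G", "V": "C", "Z": "D"}
FIGURE_KEYS.update(zip("1234567890", "QWERTYUIOP"))


def typex_escape(text):
    """Return the key sequence an operator would type for `text` (hypothetical helper)."""
    keys = []
    in_figure_mode = False  # Letter Mode is the default
    for ch in text.upper():
        if ch in FIGURE_KEYS:
            if not in_figure_mode:
                keys.append("V")  # switch to Figure Mode
                in_figure_mode = True
            keys.append(FIGURE_KEYS[ch])
        elif ch == " " or "A" <= ch <= "Z":
            if in_figure_mode:
                keys.append("Z")  # switch back to Letter Mode
                in_figure_mode = False
            keys.append("X" if ch == " " else ch)  # assumption: space is typed in Letter Mode
        else:
            raise ValueError(f"no mapping in this sketch for {ch!r}")
    if in_figure_mode:
        keys.append("Z")  # leave the machine in Letter Mode at the end
    return "".join(keys)


print(typex_escape("The Quick Brown Fox Jumps Over The Lazy Dog"))
print(typex_escape("1234567890"))
```

The two calls reproduce the escaped pangram and the escaped digit string given above.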
Was Typex’s keyboard a strength or a weakness? Certainly, it was more sophisticated, and gave a more concise, bureaucratic feel to messages (“£2/3/6” would have been vastly longer for Enigma). But at the same time, the added expense and physical complexity (the number of Typex machines built was only ever a fraction of the number of Enigma machines in use) seem fairly unwise to me.
Moreover, Typex’s keyboard’s escape sequences significantly modified the way technical language was transmitted. Even though shorter messages are harder to crack than longer messages, I can’t help but wonder whether Typex’s escape sequences might have added crypto weaknesses.
Typex “X”
Any enciphering system that enciphered spaces as X would instantly make X the most common letter in (escaped) plaintexts. So it should be clear that Typex’s letter “X” (which enciphers SPACE) was one possible weakness.
Moreover, right from the earliest part of the war, German codebreakers noted that the first three letters in a new class of intercepted messages were never “A”, “I”, and “R” (respectively), and the last letter was almost never “X”. From this they deduced (correctly) that:
- Messages were being sent using an Enigma-style rotor cipher machine (where letters never map to themselves)
- The sender was almost certainly the British Air Force (“AIR”)
- X was probably being used as a padding character at the end of messages
Even if Typex is (largely) randomising the output letters (via permutation and stepping), we still know that plaintext “X” can never be enciphered as ciphertext “X”. Can we use this to look inside the ciphertext?
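The “no letter enciphers to itself” property is easy to turn into a test for where a guessed piece of plaintext (a crib) could possibly sit. The sketch below is only an illustration: the function names are mine, and it tests nothing beyond that one property, so it can rule positions out but never confirm them.

```python
def crib_fits(ciphertext, crib, offset):
    """True if `crib` could be the plaintext at `offset`, i.e. no letter lines up with itself."""
    window = ciphertext[offset:offset + len(crib)]
    if len(window) < len(crib):
        return False  # the crib would run off the end of the ciphertext
    return all(c != p for c, p in zip(window, crib))


def possible_offsets(ciphertext, crib):
    """List every offset at which the crib is not ruled out."""
    last_start = len(ciphertext) - len(crib)
    return [i for i in range(last_start + 1) if crib_fits(ciphertext, crib, i)]


# A plaintext X (i.e. a space) cannot sit under the ciphertext X in this group:
print(possible_offsets("RQXSR", "X"))
```

This is the same reasoning as the “AIR” deduction: a crib is excluded wherever any one of its letters coincides with the ciphertext letter above it.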
If we discard the (almost certainly disguised) rotor setting AOAKN at the start and end of the pigeon cipher message, we get the following:
HVPKD FNFJW YIDDC
RQXSR DJHFP GOVFN MIAPX
PABUZ WYYNP CMPNW HJRZH
NLXKG MEMKK ONOIB AKEEQ
UAOTA RBQRH DJOFM TPZEH
LKXGH RGGHT JRZCQ FNKTQ
KLDTS GQIRU
For this 25 x 5 = 125-character ciphertext, a completely random letter mapping would imply an average instance count of (125/26) = 4.8 instances. In fact, the instance counts of the letters (in decreasing count order) are:
H K R N P D F G Q A J M O T E I X Z B C L U W Y S V
8 8 8 7 7 6 6 6 6 5 5 5 5 5 4 4 4 4 3 3 3 3 3 3 2 2
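If you’d like to reproduce these counts yourself, a few lines of Python will do it. This is just a sketch: the variable names are mine, and the ciphertext is copied from the groups listed above (without the AOAKN indicator).

```python
from collections import Counter

PIGEON_CIPHERTEXT = (
    "HVPKD FNFJW YIDDC "
    "RQXSR DJHFP GOVFN MIAPX "
    "PABUZ WYYNP CMPNW HJRZH "
    "NLXKG MEMKK ONOIB AKEEQ "
    "UAOTA RBQRH DJOFM TPZEH "
    "LKXGH RGGHT JRZCQ FNKTQ "
    "KLDTS GQIRU"
)

letters = PIGEON_CIPHERTEXT.replace(" ", "")
counts = Counter(letters)
expected = len(letters) / 26  # average count per letter if all letters were equally likely

print(f"{len(letters)} letters, expected count per letter = {expected:.1f}")
# Sort by decreasing count, then alphabetically within each count
for letter, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
    print(letter, n)
```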
Even if X is the most common letter in the plaintext, the ciphertext would need to be much longer (I’d guess 20 or more times longer) before Typex (escaped space) X’s higher frequency would show up as a measurable dip in the (Typex ciphertext) X’s statistics.
X: ----- ----- ----- --X-- ----- ----- ----X ----- ----- ----- ----- --X-- ----- ----- ----- ----- ----- ----- ----- --X-- ----- ----- ----- ----- -----
Sadly, because of the short length of the ciphertext, the only thing to note is that the third and fifth lines (“PABUZ…” and “UAOTA…”) have no Xs in them, which we’ll return to in the next section.
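Position maps like the one above are simple to generate for any letter. Here’s a small sketch (the helper name `position_map` is mine) that keeps the five-letter grouping and blanks out everything except the chosen letter:

```python
GROUPS = (
    "HVPKD FNFJW YIDDC RQXSR DJHFP GOVFN MIAPX PABUZ WYYNP CMPNW HJRZH "
    "NLXKG MEMKK ONOIB AKEEQ UAOTA RBQRH DJOFM TPZEH LKXGH RGGHT JRZCQ "
    "FNKTQ KLDTS GQIRU"
).split()


def position_map(groups, letter):
    """Show where `letter` occurs, replacing every other letter with a dash."""
    return " ".join(
        "".join(ch if ch == letter else "-" for ch in group) for group in groups
    )


for letter in "XQTV":
    print(f"{letter}: {position_map(GROUPS, letter)}")
```

Running it for X, Q, T and V gives the four maps discussed in this post.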
Typex “Q”
From the preceding table, we can see that Q appears six times in the ciphertext. Even though Q is a relatively rare letter in English (hence 10 points in Scrabble), there are a number of different ways that Q can practically appear in an enciphered Typex message:
- As the letter Q in text (in Letter Mode)
- As the digit 1 (in Figure Mode)
- As part of a five-letter QQQQQ separator block (these appeared in the middle of Typex messages, and were used to help conceal message starts, e.g. coded addressees)
- As a null (Typex operators were, as part of the security protocol, expected to insert a random character every few words)
- As part of a Q-code
Even though Q-codes were originally used for shipping transmissions, their use quickly spread through the various armed services. A few years ago, I found a Combined Operating Signals handbook in the Royal Signals Museum archives. Its first page looked like this:
But though it is entirely plausible that a WW2-era message might include Q-codes such as QPZ (“Yes”) or QQZ (“No”), my understanding is that Q-codes were far more for radio operators than for cipher machine operators. Hence I’m not genuinely expecting to find any Q-codes in the plaintext here.
I’ve previously posted about QQQQQ here, but the short version is that if we look at the six instances of Q that appear in the pigeon cipher message, they appear to cluster in the bottom half of the message:
Q: ----- ----- ----- -Q--- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----Q ----- --Q-- ----- ----- ----- ----- ----Q ----Q ----- -Q---
Of course, this might just be a sign that randomness is doing its random thing here. But because a plaintext Q can never be enciphered as a ciphertext Q, there’s a reasonable chance that the lack of Qs in the top half of the ciphertext hints that the top half of the plaintext has more Qs than normal.
Why might that be? The two most likely reasons would be (a) the presence of a QQQQQ section divider block (say, on the “PABUZ WYYNP…” line), and (b) the presence of number sequences (because in Figure Mode, Q encodes the digit “1”). And because of Benford’s Law, we might reasonably expect “1” to appear more often than other digits, so this perhaps isn’t quite as arbitrary as you might at first think.
I also wonder whether the lack of Xs on the third line might be an indication that the block of five letters immediately before the (putative) QQQQQ ends with a block of Xs, e.g. --XXX QQQQQ. It’s certainly possible…
Other Letters
If we look at the five Ts in the ciphertext, these too cluster at the bottom in a slightly unusual way:
T: ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ---T- ----- ----- T---- ----- ----T ----- ---T- ---T- -----
And the two Vs in the ciphertext are also (perhaps) notable for both being at the top:
V: -V--- ----- ----- ----- ----- --V-- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
Note also that even though the instance counts of V and Z in any given (escaped) plaintext will (almost certainly) be identical (because Figure Shift will be followed by a matching Letter Shift back again), these are small enough that they won’t show up in the instance stats. But the absence of Vs from the bottom half of the ciphertext might possibly be a (very weak) indication that the bottom half of the plaintext has a lot of Figure Shifting going on.
But really: are these statistically significant results, or is it merely the Randomness Fairy laughing into her hand? A researcher with the persistence of Dave Oranchak would randomise millions of cases and see how often these conditions recur: but with such a small ciphertext, it’s hard to be sure. For now, though, it’s just a set of interesting observations. 🙂
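For anyone who wants to try the randomisation approach, here is a rough sketch for one of the observations: all five Ts falling within the last 50 letters of the 125. It assumes a deliberately simple null model (the five positions are chosen uniformly at random), the function names and the choice of 100,000 trials are mine, and for this particular question there is also an exact answer to check the simulation against.

```python
import random
from math import comb


def all_in_tail_exact(n_letters, n_instances, tail_len):
    """Exact chance that every instance lands in the last `tail_len` positions."""
    return comb(tail_len, n_instances) / comb(n_letters, n_instances)


def all_in_tail_simulated(n_letters, n_instances, tail_len, trials=100_000, seed=1):
    """Monte Carlo estimate of the same chance, using uniformly random positions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    cutoff = n_letters - tail_len
    hits = 0
    for _ in range(trials):
        positions = rng.sample(range(n_letters), n_instances)
        if min(positions) >= cutoff:
            hits += 1
    return hits / trials


exact = all_in_tail_exact(125, 5, 50)
estimate = all_in_tail_simulated(125, 5, 50)
print(f"exact = {exact:.4f}, simulated = {estimate:.4f}")
```

Bear in mind that this is the chance for one letter chosen in advance. Because the T pattern was spotted only after looking at all 26 letters, the real odds of seeing some cluster like this somewhere are considerably less impressive.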