Seeing as you’re here, I’ll let you into a sort-of secret: actually, I’m not the best code-breaker on the planet. All the same, I think (for what it’s worth) I’m an extremely good historical logician with a strong ability to work in a sustained manner with tricky, uncertain evidence, and I’m extremely cipher- and stats-literate.
So when I’m taking on a thing like the Beale Ciphers, my primary aim is to understand the practical and historical logic of what happened, and to use that to reduce the dimensions and degree of the code-breaking ‘space’ to something that is more practically tractable. But in all honesty, that’s far closer to an explicitly history-focused process than to a directly cryptology-focused process: someone else who is fundamentally a code-breaker would almost certainly be looking more for cryptanalytical results as a starting point.
But that’s OK: roughly paraphrasing what Euclid said to Ptolemy I Soter, there is no royal road to this kind of knowledge – every individual must travel it (and learn it) for themselves the hard way.
Anyway, I’ve been looking again at the (33 years later, I’d say “infamous” is very nearly the right word to describe them) Gillogly sequences in Beale Cipher B1.
Firstly, here are the strings that Jim Gillogly found all those years ago: by reconstructing the sequence, I can say with some certainty that what Jim did to get these was to use the first letters of the Declaration of Independence exactly as quoted by Ward as a cipher letter dictionary. And here is what you get (with indices outside this DoI’s 1332-word range printed as ‘?’):
scs?etfa?gcdottucwotwtaaiwdbiidtt?wttaabbplaaabwct
ltfiflkilpeaabpwchotoapppmoralanhaabbccacddeaosdsf
hntftatpocacbcddlberifebthifoehuubtttttihpaoaasata
attomtapoaaarompjdra??tsbcobdaaacpnrbabfdefghiijkl
mmnohppawtacmoblsoessoavispftaotbtfthfoaoghwtenalc
aasaattardsltawgfesauwaolttahhttasotteafaascstaifr
cabtotlhhdtnhwtsteaieoaastwttsoitsstaaopiwcpcwsott
ioiesittdattpiufsfrfabptccoaitnattoststf??atdatwta
ttocwtompatsotecattotbsogcwcdrolitibhpwaae?btstafa
ewci?cbowltpoactewtafoaithttttoshristeooecusc?raih
rlwstrasnitpcbfaeftt
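The first-letter lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Jim's actual program: the function names, the whitespace word-splitting rule and the tiny sample text are all hypothetical, and a real run would need Ward's DoI text plus the B1 index numbers.

```python
def first_letter_key(text):
    """Build a cipher letter dictionary: the first letter of each word.

    Assumption: words are separated by whitespace; punctuation is ignored.
    """
    key = []
    for token in text.split():
        letters = [c for c in token if c.isalpha()]
        if letters:
            key.append(letters[0].lower())
    return key


def decode(indices, key):
    """Map 1-based word numbers to first letters; out-of-range gives '?'."""
    return "".join(key[i - 1] if 1 <= i <= len(key) else "?" for i in indices)


# Hypothetical stand-in for the full DoI text (only ten words long).
sample_text = "When in the course of human events it becomes necessary"
key = first_letter_key(sample_text)

# Made-up index numbers, just to show the mechanics (99 is out of range).
print(decode([1, 3, 5, 99], key))
```

How hyphenated words and numbering slips are counted is exactly the kind of detail that changes the output, which is why the word-splitting rule here is flagged as an assumption.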
Today’s interesting observation is that if you instead use the modified index numbers that are required to transform Ward’s DoI into the letters that reproduce the B2 plaintext (i.e. adjust for the numbering gaps etc), the output is similar but more coherent (i.e. even more improbable than before):-
sbs?etfa?gcdottucwotwtaaisdbiidtt?wttbaabadaaabbcd
effiflkigpeamnpwchotoallpmotamanhabbbccccddeaosdst
hntftatpocacbcddebetpfebthiffehuubtjtttihpaoaasata
attomnmpoaaarbopjdta??tsbcobdafacpnrbabcdefghiijkl
mmnohppawtaombblsoesaoavispctaolbtflhfoahghwtenalc
assaastatdsltawgfeaauwaoattwhhttaaoetsafaasbstcifr
cabtotlhbdtnhwtstehieoaastwttsoftastaaosiwcpcwsotl
inieeittdattpiufaerfabptccooidnattoatstf??atmatwnw
ttocwtotpatsotebatrohbtogcwcdrolitiahlwaas?btstafa
ewci?ctowltpoactewtafoaiwhttttothrisoeohacuac?paih
rmsstrasnitpctfawftt
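To see exactly where the two outputs part company, a position-by-position comparison is enough. Here is a minimal sketch that uses the first 50-letter row of each listing above; the function name is just an illustrative choice.

```python
def differing_positions(before, after):
    """Return the 0-based positions where two equal-length strings differ."""
    if len(before) != len(after):
        raise ValueError("strings must be the same length")
    return [i for i, (b, a) in enumerate(zip(before, after)) if b != a]


# First row of each listing above.
row_before = "scs?etfa?gcdottucwotwtaaiwdbiidtt?wttaabbplaaabwct"
row_after = "sbs?etfa?gcdottucwotwtaaisdbiidtt?wttbaabadaaabbcd"

for i in differing_positions(row_before, row_after):
    print(i, row_before[i], "->", row_after[i])
```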
What makes this so interesting to my eyes is that quite a few formerly diffuse features kind of ‘come into focus’:-
Before: aabbplaaabwctltfif
After : baabadaaabbcdeffif
Before: abfdefghiijklmmnohpp
After : abcdefghiijklmmnohpp
Before: ttttt
After : tjttt
Before: acbcddlbe
After : acbcddebe
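Runs like these can also be picked out mechanically rather than by eye. The sketch below looks for the longest stretch of letters that never steps backwards in the alphabet (repeats such as "ii" and "mm" are allowed); treating '?' as a break in the run is an assumption of this sketch.

```python
def longest_alphabetical_run(s):
    """Return the longest substring whose letters are in non-decreasing order."""
    best = ""
    start = 0
    for i in range(1, len(s) + 1):
        run_ends = (
            i == len(s)
            or not s[i].isalpha()
            or not s[i - 1].isalpha()
            or s[i] < s[i - 1]
        )
        if run_ends:
            run = s[start:i]
            if run.isalpha() and len(run) > len(best):
                best = run
            start = i
    return best


print(longest_alphabetical_run("abfdefghiijklmmnohpp"))  # the 'Before' string
print(longest_alphabetical_run("abcdefghiijklmmnohpp"))  # the 'After' string
```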
The immediate thing to notice is that “abcdefghiijklmmno” is (I think) more than a thousand times more improbable than “defghiijklmmno”, which itself already had a probability of occurring of less than one in a million million.
The second thing to notice is that the probability of ttttt occurring (based solely on the letter frequency distribution) was about 12.9%, while the probability of tttt occurring is 51.2%: so the fact that the only occurrence of ttttt disappears from one to the other is also a strong indication that we’re going in the right direction.
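One simple model that seems to reproduce figures like these is to treat every output letter as an independent draw with P(t) = 19.3%, and ask how likely it is that at least one window of k consecutive letters is all t's. The 520 below is the length of the strings listed above; treating overlapping windows as independent is an approximation, so this is a rough estimate rather than an exact result.

```python
def prob_at_least_one_run(p, k, n):
    """Approximate chance of at least one run of k identical letters.

    p: probability of the letter, k: run length, n: total string length.
    Assumes each of the n - k + 1 windows is independent (an approximation).
    """
    windows = n - k + 1
    return 1 - (1 - p ** k) ** windows


P_T = 0.193  # share of DoI words starting with T, as quoted in the text
N = 520      # number of letters in each decoded listing above

print(f"ttttt: {prob_at_least_one_run(P_T, 5, N):.1%}")
print(f"tttt : {prob_at_least_one_run(P_T, 4, N):.1%}")
```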
All the same, another thing to notice is that because T, A and P are all high probability initial letters in the DoI code book text (19.3%, 13.5%, and 4.46% respectively), we would expect to see quite a lot of TT, AA, and PP pairs in the output if the codebook was somehow misaligned with the index stream. And we still do… so it’s also very likely (from that alone) that dictionary mismatches or construction errors or cipher dictionary errors continue to persist.
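To put rough numbers on "quite a lot", here is a hedged sketch of the same independent-letters model applied to doubled letters: with n letters there are n - 1 adjacent pairs, each matching a given letter twice with probability p squared. The percentages are the ones quoted above; the function names and the choice to count overlapping pairs are assumptions of this sketch.

```python
def expected_pairs(p, n):
    """Expected number of adjacent doubled letters in n independent draws."""
    return (n - 1) * p ** 2


def count_pairs(s, letter):
    """Count adjacent doubled occurrences of a letter (overlaps included)."""
    return sum(1 for a, b in zip(s, s[1:]) if a == b == letter)


N = 520  # length of each decoded listing above
initial_freq = {"t": 0.193, "a": 0.135, "p": 0.0446}  # figures quoted in the text

for letter, p in initial_freq.items():
    print(letter * 2, round(expected_pairs(p, N), 1))

# Example of counting in an actual row (the last 'After' row above).
print(count_pairs("rmsstrasnitpctfawftt", "t"))
```

Comparing the observed counts over the whole 520-letter output with these expectations would show whether the doubled letters are more common than chance alone suggests.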
This isn’t a solution, it’s just an observation standing on Jim Gillogly’s shoulders. I don’t fully know what it all means just yet… but I suspect that it will turn out to mean that broadly the same letter numbering used in B2 was used for B1 as well, rather than Ward’s DoI text. Your mileage may vary! 😉 To me, the Gillogly strings tell a complicated, multi-layer story… it’s just that we can’t read it all yet as closely as we would like…
Excellent observation Nick.
http://www.und.edu/org/crypto/crypto/general.crypt.info/beale/more.beale/notes
Something similar here, I think?