It seems as though penetrating public cryptographic analysis of the three Beale Papers (B1, B2, and B3) halted abruptly in 1980 when Jim Gillogly pointed out a problem with B1. If, as he noted, you apply to B1 the same dictionary code used for B2 (famously derived from the Declaration of Independence), you get a text with some distinctive properties:-
SCS?E TFA?G CDOTT UCWOT WTAAI WDBII DTT?W TTAAB BPLAA ABWCT
LTFIF LKILP EAABP WCHOT OAPPP MORAL ANHAA BBCCA CDDEA OSDSF
HNTFT ATPOC ACBCD DLBER IFEBT HIFOE HUUBT TTTTI HPAOA ASATA
ATTOM TAPOA AAROM PJDRA ??TSB COBDA AACPN RBABF DEFGH IIJKL
MMNOH PPAWT ACMOB LSOES SOAVI SPFTA OTBTF THFOA OGHWT ENALC
AASAA TTARD SLTAW GFESA UWAOL TTAHH TTASO TTEAF AASCS TAIFR
CABTO TLHHD TNHWT STEAI EOAAS TWTTS OITSS TAAOP IWCPC WSOTT
IOIES ITTDA TTPIU FSFRF ABPTC COAIT NATTO STSTF ??ATD ATWTA
TTOCW TOMPA TSOTE CATTO TBSOG CWCDR OLITI BHPWA AE?BT STAFA
EWCI? CBOWL TPOAC TEWTA FOAIT HTTTT OSHRI STEOO ECUSC ?RAIH
RLWST RASNI TPCBF AEFTB
Here you can see not only tripled letters (AAA, PPP), quadrupled letters (TTTT) and even quintupled letters (TTTTT), but also (and this is the part that ignited Gillogly’s cryptographic curiosity) the sequence ABFDEFGHIIJKLMMNOHPP. Even if you restrict your view to the DEFGH IIJKL MMNO monotonically increasing sub-sequence in the middle, the chances of that appearing at random would be (he calculates) about one in a million million. Making it even more improbable is the fact that the aberrant “F” near the start has code 195 where code 194 is “C”, and the aberrant “H” near the end has code 301 where code 302 is “O”, which makes it look a great deal as though these were simply encoding slips. And if these were intended to be C and O respectively, the unlikeliness of the sequence vastly increases again.
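If you would like to check these patterns for yourself rather than squint at the five-letter groups, here is a minimal Python sketch. It assumes the 520 letters are exactly as transcribed above (with '?' for the undetermined positions); the helper names `repeated_runs` and `longest_alphabetical_run` are my own, purely for illustration.

```python
from itertools import groupby

# The 520 letters quoted above (B1 decoded with B2's Declaration codebook).
# '?' marks positions left undetermined in the transcription.
B1_DECODED = "".join("""
SCS?E TFA?G CDOTT UCWOT WTAAI WDBII DTT?W TTAAB BPLAA ABWCT
LTFIF LKILP EAABP WCHOT OAPPP MORAL ANHAA BBCCA CDDEA OSDSF
HNTFT ATPOC ACBCD DLBER IFEBT HIFOE HUUBT TTTTI HPAOA ASATA
ATTOM TAPOA AAROM PJDRA ??TSB COBDA AACPN RBABF DEFGH IIJKL
MMNOH PPAWT ACMOB LSOES SOAVI SPFTA OTBTF THFOA OGHWT ENALC
AASAA TTARD SLTAW GFESA UWAOL TTAHH TTASO TTEAF AASCS TAIFR
CABTO TLHHD TNHWT STEAI EOAAS TWTTS OITSS TAAOP IWCPC WSOTT
IOIES ITTDA TTPIU FSFRF ABPTC COAIT NATTO STSTF ??ATD ATWTA
TTOCW TOMPA TSOTE CATTO TBSOG CWCDR OLITI BHPWA AE?BT STAFA
EWCI? CBOWL TPOAC TEWTA FOAIT HTTTT OSHRI STEOO ECUSC ?RAIH
RLWST RASNI TPCBF AEFTB
""".split())


def repeated_runs(text, min_len=3):
    """Return (letter, length) for every run of a single repeated letter."""
    runs = []
    for ch, group in groupby(text):
        length = len(list(group))
        if ch.isalpha() and length >= min_len:
            runs.append((ch, length))
    return runs


def longest_alphabetical_run(text):
    """Return the longest stretch whose letters never step backwards in the alphabet."""
    best = ""
    current = ""
    for ch in text:
        if ch.isalpha() and (not current or ch >= current[-1]):
            current += ch
        else:
            # A '?' or a backwards step ends the run; a letter starts a new one.
            current = ch if ch.isalpha() else ""
        if len(current) > len(best):
            best = current
    return best


if __name__ == "__main__":
    print(len(B1_DECODED))
    print(repeated_runs(B1_DECODED))
    print(longest_alphabetical_run(B1_DECODED))
```

Run on the text above, the longest never-decreasing stretch it finds is the 14-letter DEFGHIIJKLMMNO, i.e. the middle of the Gillogly sequence, while the repeated-letter search picks out the AAA, PPP, TTTT and TTTTT groups.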
Yet as far as the multiple letter groups go, we can do some simple probability calculations based on the 1321 characters Gillogly lists for the B2 codebook. From frequency analysis – T 255, A 167, O 145, H 80, I 69, S 62, F 62, P 59, W 59, C 53, B 48, R 41, D 37, E 36, L 35, M 30, U 28, G 19, N 19, J 10, K 4, V 2, Y 1, X 1, Q 1, Z 0 – you can see that T, A, and P occur 19.3%, 12.6%, and 4.46% (respectively) of the time in the codebook. So, if the text letters were picked at random (as would pretty much be the case if B2’s codebook was completely the wrong codebook for B1), the chances of these patterns occurring randomly at least once in a 520-character sample would be something like this:-
- prob(TTTTT) = 1 – (1 – 0.193^5)^(520-(5-1)) = 12.9%
- prob(TTTT) = 1 – (1 – 0.193^4)^(520-(4-1)) = 51.2%
- prob(AAA) = 1 – (1 – 0.126^3)^(520-(3-1)) = 64.6%
- prob(PPP) = 1 – (1 – 0.0446^3)^(520-(3-1)) = 4.5%
You would also expect to see plenty of TT and AA pairs scattered through the text, which is in fact exactly what we see (13 x TT and 10 x AA, quite apart from the TTTTT, TTTT and AAA listed above).
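For anyone wanting to replay the arithmetic, here is a rough Python sketch of the "at least one run" estimate used in the list above, shown for the T and P figures. Because that formula treats every window of k letters as independent (which overlapping windows are not), I have also added an exact calculation for comparison. To be clear, the exact version is my own addition rather than anything from Gillogly, it still assumes letters are drawn independently at the quoted frequencies, and both function names are hypothetical.

```python
def approx_run_probability(p, k, n=520):
    """Window-based estimate: 1 - (1 - p^k)^(n - (k - 1))."""
    return 1 - (1 - p ** k) ** (n - (k - 1))


def exact_run_probability(p, k, n=520):
    """Exact chance of at least one run of k (or more) of a letter.

    Assumes n independent draws, each producing the letter with probability p.
    state[j] is the probability that no run of k has appeared so far and the
    text currently ends in exactly j copies of the letter.
    """
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        new = [0.0] * k
        new[0] = sum(state) * (1 - p)      # any other letter resets the run
        for j in range(k - 1):
            new[j + 1] = state[j] * p      # the run grows by one letter
        state = new                        # runs reaching k drop out of the table
    return 1 - sum(state)


if __name__ == "__main__":
    for label, p, k in [("TTTTT", 0.193, 5), ("TTTT", 0.193, 4), ("PPP", 0.0446, 3)]:
        approx = approx_run_probability(p, k)
        exact = exact_run_probability(p, k)
        print(f"{label}: estimate {approx:.1%}, exact {exact:.1%}")
```

The exact figures come out somewhat lower than the window-based estimates, because a long run occupies several overlapping windows at once, but not by enough to change the overall picture: these letter groups are not at all surprising in random text drawn from this codebook.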
And therein lies the basic Beale Papers paradox: though the distribution and clustering seem to imply that B2’s codebook was not B1’s codebook, the ‘Gillogly sequence’ seems to imply that the two are linked in some way. So, what’s it to be? Damoclean swords aside, how can we unpick this cryptologic knot?
My observation here is that if there is also some kind of monoalphabetic substitution going on (i.e. in addition to the Declaration of Independence codebook), then it’s quite possible that the Gillogly sequence represents the keyword or keystring used to generate that substitution alphabet. This might well explain the doubled letters within the keystring (i.e. the II, MM, and PP, plus OO if the aberrant H was meant to be O): if so, we would be looking for a keystring with four doubled letters but where none of the vowels repeat. With both slips corrected, the sequence reads:-
ABCDEFGHIIJKLMMNOOPP
Hmmm… there can’t be many English words ending with two adjacent doubled letters: in fact, the only two I can think of are coffee and toffee (please let me know if you can think of any others!). ‘Toffee’ doesn’t sound very promising, so could it be ‘coffee’? The previous word would then need to end with “C” to make a doubled letter… not hugely promising, but perhaps it’s a start:-
ABCDEFGHIIJKLM MNOOPP
xxxxxxxxxxxxxC COFFEE
xxxxxxxxxxxxxT TOFFEE
Alternatively, the final word might be a three-letter word, like “TOO” or “OFF”. Had Eric Sams considered this, doubtless he would have happily constructed all kinds of valid key phrases that fit these constraints, such as:-
ABCDEF GHIIJ KLMMNO OPP
CLUNKY SPEED RABBIT TOO
OK, it’s true that the key phrase to the Beale Papers is not going to be “CLUNKY SPEED RABBIT TOO”, but maybe (just maybe) it’s a step in the right direction. 🙂
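If you fancy hunting for less silly candidates, the constraint is easy to test mechanically. The sketch below assumes a plain monoalphabetic substitution, so that a candidate key phrase must repeat letters in exactly the places where the corrected sequence ABCDEFGHIIJKLMMNOOPP does (and nowhere else). The function name `same_repeat_pattern` is made up for this example, and spaces in the candidate are simply ignored.

```python
def same_repeat_pattern(candidate, template="ABCDEFGHIIJKLMMNOOPP"):
    """True if candidate repeats its letters exactly where template does.

    Assumes a simple one-to-one letter substitution: each template letter
    must map to one candidate letter, and no two template letters may share
    the same candidate letter.
    """
    cand = [c for c in candidate.upper() if c.isalpha()]
    temp = [c for c in template.upper() if c.isalpha()]
    if len(cand) != len(temp):
        return False
    forward, backward = {}, {}
    for t, c in zip(temp, cand):
        # setdefault records the first pairing and returns it on later visits
        if forward.setdefault(t, c) != c or backward.setdefault(c, t) != t:
            return False
    return True


if __name__ == "__main__":
    print(same_repeat_pattern("CLUNKY SPEED RABBIT TOO"))
    print(same_repeat_pattern("COFFEE", template="MNOOPP"))
    print(same_repeat_pattern("LETTER", template="KLMMNO"))
```

Feeding a word list through this, one slice of the template at a time (say MNOOPP for the last six letters, or OPP for the last three), would be one way to collect candidate words, though of course the whole approach only makes sense if the sequence really is an enciphered key phrase.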
Incidentally… the Wikipedia Beale Papers page notes that “In 1940, the famous cryptanalyst, Dr. White of Yale University, came close to solving the Beale ciphers after tracking down the suspected key hidden by Beale in St. Louis—he never spoke of his findings.” Though I did a bit of Internet sleuthing to try to work out who this Dr White was, I didn’t really get anywhere – I don’t think he was the Maurice Seal White (b.1888) who wrote the 1938 book “Secret writing : how to write and solve messages in cipher and code” (which I found listed in Lou Kruh’s bibliography and Worldcat) and who was a Columbia alumnus in 1920 (see p.212 here), but it’s hard to tell. Please let me know if you find out!