Cipher Mysteries readers in the US may well have watched Brad Meltzer’s recent “Decoded” episode on the Declaration of Independence. Though you might well think that the description listed below doesn’t sound particularly promising…

The Declaration of Independence is the founding document of American Democracy. Could it contain hidden messages from our nation’s forefathers intended to be discovered years later? Buddy, Mac and Scott travel across America to try and uncover the mysteries behind our nation’s most prized document.

…it turns out that this episode was in fact largely about the Beale Papers, which (in my opinion, at least) are a proper cipher mystery. I’ve blogged about these a fair few times, such as here. Summing up, I conclude (a) that the statistical improbability of the Gillogly strings strongly implies that these are real ciphers (not hoaxes); (b) that they were enciphered using a two-stage combo of codebook and monoalphabetic substitution; and (c) that the Gillogly strings are in fact no more than the keyphrase somehow falling through the system as a set of ABCDE…-style indices.

And just for all those armchair treasure hunters out there eager to crack B1 and B3 for themselves, my predictions are (a) that the B1 key string will turn out to be painfully close to “THOMASJEFFERSONBEALE”, and (b) that though B1 (and probably B3) also used the Declaration of Independence, it had its own slightly different set of counting mistakes as compared to B2. As normal, 15% of the bounty should cover my fee, thanks. 🙂

All of which means that when the Beale Papers finally do get cracked, Jim Gillogly will probably kick himself into the next state for missing what, to a supersmart codebreaker such as him, should be utterly obvious. Unless it’s him that ultimately gets to crack it? We shall see!

Anyway, the nice thing about Brad Meltzer’s show is that it has hugely stimulated interest in the Beale Papers, even creating its own mini-traffic-spike in Google Trends. I’m guessing the linking that’s going on is happening in treasure hunter mailing lists, but to be honest there’s not a lot out there worth reading on the subject. People are finally realising that stories linking the Beale Papers to (for example) famous pirate / privateer Jean Lafitte [Jean Laffite] are probably outright fakes. As with the Voynich Manuscript, all the properly good evidence is embedded right in the text itself: it’s everything else surrounding it that is the hoax!

Some more thoughts on the curious “key” sequence in the Beale Papers

Back in 1980, Jim Gillogly applied the Declaration of Independence codebook for the second Beale Paper (“B2”) to the first Beale Paper (“B1”), and discovered a very unlikely sequence in the resulting text: ABFDEFGHIIJKLMMNOHPP. The chance of the middle section alone (“DEFGHIIJKLMMNO”) occurring at random is about one in a million million, and what is even spookier is that the two aberrant letters in the longer sequence (“F” near the beginning, and “H” near the end) are one entry off from correct letters in the codebook (195 = “F” while 194 = “C”, and 301 = “H” while 302 = “O”).

Gillogly attributed these to encoding slips: but given that I’m wondering whether this string is perhaps a code-sequence of some sort, could it be that the encoder used a slightly different transcription of the Declaration of Independence from the one he/she used for B2? This would yield systematic single-number shifts: so let’s look again at the key-sequence and the adjacent letters in the B2 codebook. Each row below gives a B1 code number n followed by the B2 codebook letters at positions n-1, n, n+1, n+2 and n+3, so the second letter in each row is the one that the code actually decodes to:

112 T R G A I
18  P B W H C
147 T A O T A
436 L B A P U
195 C F L A T  <-- Gillogly's first apparently offset code
320 I D O T E
37  A E S T W
122 P F S T W
113 R G A I A
6   O H E I B
140 I I T R O  <-- this code might possibly be offset too?!
8   E I B N F
120 T J P F T
305 P K O G B
42  T L O N A
58  O M R R T
461 H M H H D
44  O N A O N
106 P O H T T
301 T H O T P  <-- Gillogly's second apparently offset code
13  O P T D T
408 O P U T P
680 C A U B O
93  C W C U R

Today’s observation, then, is this. Suppose that the errors in the Gillogly key sequence arose because B1 was enciphered from a slightly different codebook transcription of the Declaration of Independence, and that the key string should have been ABCDEFGHIIJKLMMNOOPP (as seems to have been intended). We then have two definite (and possibly even three) places where the B1 codebook transcription may have slipped out of registration with the B2 codebook transcription: the code used for the first “I” (140) could equally well have been 139, because that also codes for “I”.
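To make the two definite slips concrete, here is a minimal Python sketch that lines up the twenty key-sequence codes from the table above against both the observed and the (apparently) intended key strings. The names `CODES`, `OBSERVED`, `INTENDED` and `find_slips` are just illustrative labels of mine, not anything from Gillogly’s paper.

```python
# Codes and letters copied from the key-sequence rows of the table above.
CODES = [147, 436, 195, 320, 37, 122, 113, 6, 140, 8,
         120, 305, 42, 58, 461, 44, 106, 301, 13, 408]
OBSERVED = "ABFDEFGHIIJKLMMNOHPP"  # what the B2 codebook yields for these codes
INTENDED = "ABCDEFGHIIJKLMMNOOPP"  # what seems to have been intended


def find_slips(codes, observed, intended):
    """Return (code, observed letter, intended letter) wherever the two differ."""
    return [(code, got, want)
            for code, got, want in zip(codes, observed, intended)
            if got != want]


if __name__ == "__main__":
    for code, got, want in find_slips(CODES, OBSERVED, INTENDED):
        print(f"code {code}: decodes to {got}, intended {want}")
```

Note that a comparison like this can only ever flag codes 195 and 301: the possible third slip at code 140 is invisible to it, because the neighbouring codebook entry decodes to “I” as well.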

Yet because the sequence is long enough to contain codes that seem correct either side of these errors, we have the possibility of determining the bounds of those stretches in the B1 transcription where the variations (in this scenario) would have occurred. Specifically:-

122 P F S T W
 ?? -1
140 I I T R O
 ?? +1
147 T A O T A

147 T A O T A
 ?? -1
195 C F L A T
 ?? +1
 ?? +1
301 T H O T P
 ?? -1
305 P K O G B

So, if this scenario is correct, it would imply that (relative to the B2 codebook) the B1 codebook transcription dropped a character somewhere between #147 and #195, gained two somewhere between #195 and #301, and then lost another one between #301 and #305. There’s also the possibility that a character was dropped between #122 and #140 and then regained between #140 and #147… not very likely, but worth keeping in mind.
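The scenario can be sketched in Python as a cumulative shift applied to each B1 code number. This is only an illustration under stated assumptions: the function name `b2_position` is mine, and the slip positions in `EXAMPLE_SLIPS` (170, 250 and 303) are pure placeholders, since all we actually know is which ranges the slips would have to fall in.

```python
def b2_position(b1_code, slips):
    """Map a B1 code number to the B2 codebook position it would correspond to.

    `slips` is a list of (position, delta) pairs. Every code at or beyond
    `position` is shifted by `delta`, and the deltas accumulate.
    """
    return b1_code + sum(delta for position, delta in slips
                         if b1_code >= position)


# HYPOTHETICAL slip positions, chosen only so that they fall inside the
# ranges described above; the real positions (if any) are unknown.
EXAMPLE_SLIPS = [
    (170, -1),  # one character dropped somewhere between #147 and #195
    (250, +2),  # two characters gained somewhere between #195 and #301
    (303, -1),  # one character lost somewhere between #301 and #305
]

if __name__ == "__main__":
    for code in (147, 195, 301, 305):
        print(code, "->", b2_position(code, EXAMPLE_SLIPS))
```

With any slip positions inside those ranges, code 195 lands on B2 position 194 (“C”) and code 301 lands on 302 (“O”), while 147 and 305 stay where they are. The two gained characters need not be adjacent, so `(250, +2)` could equally be split into two separate `+1` entries.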

Between #147 and #195, the B1 code usage table looks like this (20 instances):-

148
150 150 150 150 – 154
160 – 162
170 – 172 – 176 176
181 – 184 – 189
191 – 193 – 194 194 194

Between #195 and #301, the B1 code usage table looks like this (64 instances):-

200 200 – 201 201 – 202 – 203 – 206 – 207 – 208 208
210 – 211 211 – 212 212 – 213 213 – 214 – 216 216 216 216 216 216 216 – 218 218 – 219 219 219 219
221 221 – 224 – 225 – 227
230 230 – 231 – 232 232 – 233 – 234 234 234 – 236
242 – 246 – 247
251
261 – 263 – 264
275 275
280 280 – 283 283 – 284 284 – 286
290 – 294

So, this proposed mechanism would offset up to 84 codes from B1, which may be sufficiently disruptive to have caused B1 to appear undecodable to cryptological luminaries such as Jim Gillogly. It is also entirely possible that (just as with the B2 codebook) there are other paired insertions and deletions to contend with here.

There’s an interesting observation here that many of the transcription errors in the B2 codebook fell close to 10-character (line) boundaries: if this is also the case for some of these (putative) B1 codebook transcription errors, then we should be able to reduce the number of possible variations to check.

It seems as though penetrating public cryptographic analysis of the three Beale Papers (B1, B2, and B3) halted abruptly in 1980 when Jim Gillogly pointed out a problem with B1. If, as he pointed out, you apply to B1 the same dictionary code used for B2 (famously derived from the Declaration of Independence), you get a ciphertext with some distinctive properties:- 

SCS?E TFA?G CDOTT UCWOT WTAAI WDBII DTT?W TTAAB BPLAA ABWCT
LTFIF LKILP EAABP WCHOT OAPPP MORAL ANHAA BBCCA CDDEA OSDSF
HNTFT ATPOC ACBCD DLBER IFEBT HIFOE HUUBT TTTTI HPAOA ASATA
ATTOM TAPOA AAROM PJDRA ??TSB COBDA AACPN RBABF DEFGH IIJKL
MMNOH PPAWT ACMOB LSOES SOAVI SPFTA OTBTF THFOA OGHWT ENALC
AASAA TTARD SLTAW GFESA UWAOL TTAHH TTASO TTEAF AASCS TAIFR
CABTO TLHHD TNHWT STEAI EOAAS TWTTS OITSS TAAOP IWCPC WSOTT
IOIES ITTDA TTPIU FSFRF ABPTC COAIT NATTO STSTF ??ATD ATWTA
TTOCW TOMPA TSOTE CATTO TBSOG CWCDR OLITI BHPWA AE?BT STAFA
EWCI? CBOWL TPOAC TEWTA FOAIT HTTTT OSHRI STEOO ECUSC ?RAIH
RLWST RASNI TPCBF AEFTB

Here you can see not only tripled letters (AAA, PPP), quadrupled letters (TTTT) and even quintupled letters (TTTTT), but also (and this is the part that ignited Gillogly’s cryptographic curiosity) the sequence ABFDEFGHIIJKLMMNOHPP. Even if you restrict your view to the DEFGH IIJKL MMNO monotonically increasing sub-sequence in the middle, the chances of that appearing at random would be (he calculates) about one in a million million. Making it even more improbable is the fact that the aberrant “F” near the start has code 195 where code 194 is “C”, and the aberrant “H” near the end has code 301 where code 302 is “O”, which makes it look a great deal as though these were simply encoding slips. And if these were intended to be C and O respectively, the unlikeliness of the sequence vastly increases again. 

Yet as far as the multiple letter groups go, we can do some simple probability calculations based on the 1321 characters Gillogly lists for the B2 codebook. From frequency analysis – T 255, A 167, O 145, H 80, I 69, S 62, F 62, P 59, W 59, C 53, B 48, R 41, D 37, E 36, L 35, M 30, U 28, G 19, N 19, J 10, K 4, V 2, Y 1, X 1, Q 1, Z 0 – you can see that T, A, and P occur 19.3%, 13.5%, and 4.46% (respectively) of the time in the codebook. So, if the text letters were picked at random (as would pretty much be the case if B2’s codebook was completely the wrong codebook for B1), the chances of these patterns occurring randomly at least once in a 520-character sample would be something like this:-

  • prob(TTTTT) = 1 – (1 – 0.193^5)^(520-(5-1)) = 12.9%
  • prob(TTTT) = 1 – (1 – 0.193^4)^(520-(4-1)) = 51.2%
  • prob(AAA) = 1 – (1 – 0.135^3)^(520-(3-1)) = 72.1%
  • prob(PPP) = 1 – (1 – 0.0446^3)^(520-(3-1)) = 4.5%
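For anyone who wants to rerun these figures, here is a small Python sketch of the same formula. It takes the letter rates quoted above (0.193, 0.135 and 0.0446) as given, and, like the formula itself, it treats each of the possible starting positions as independent, which is only a rough approximation because overlapping windows are correlated. The function name `prob_run_at_least_once` is just my label.

```python
def prob_run_at_least_once(p, run_len, n=520):
    """Approximate chance of seeing `run_len` copies of a letter in a row
    at least once in `n` random letters, where the letter has probability `p`.

    There are n - (run_len - 1) possible starting positions, and each is
    (approximately) treated as an independent trial.
    """
    windows = n - (run_len - 1)
    return 1 - (1 - p ** run_len) ** windows


if __name__ == "__main__":
    cases = [("TTTTT", 0.193, 5), ("TTTT", 0.193, 4),
             ("AAA", 0.135, 3), ("PPP", 0.0446, 3)]
    for label, p, run_len in cases:
        print(f"prob({label}) = {prob_run_at_least_once(p, run_len):.1%}")
```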

You would also expect to see a copious amount of TT and AA pairs scattered through the text, which is in fact exactly what we see (13 x TT and 10 x AA, quite apart from the TTTTT, TTTT and AAA listed above). 
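Counting these repeated-letter runs by hand is error-prone, so here is a hedged Python sketch of one way to tally them. The function name `repeated_runs` is mine, and the choice to ignore the spaces between the five-letter groups (so that a run can straddle two groups) is an assumption about how the text should be read.

```python
from collections import Counter
from itertools import groupby


def repeated_runs(text, min_len=2):
    """Count runs of identical letters at least `min_len` long.

    Whitespace is stripped first, so runs may span two five-letter groups.
    Unknown characters such as '?' break a run but are never counted.
    """
    letters = "".join(text.split())
    runs = Counter()
    for letter, group in groupby(letters):
        length = len(list(group))
        if letter.isalpha() and length >= min_len:
            runs[(letter, length)] += 1
    return runs


if __name__ == "__main__":
    # A fragment of the third line of the B1 text given above.
    fragment = "HUUBT TTTTI HPAOA ASATA"
    for (letter, length), count in sorted(repeated_runs(fragment).items()):
        print(f"{letter * length}: {count}")
```

On that fragment the quintuple T only appears once the group boundary is ignored, which is why the whitespace handling matters when checking counts against the full text.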

And therein lies the basic Beale Papers paradox: though the distribution and clustering seem to imply that B2’s codebook was not B1’s codebook, the ‘Gillogly sequence’ seems to imply that the two are linked in some way. So, what’s it to be? Damoclean swords aside, how can we unpick this cryptologic knot? 

My observation here is that if there is also some kind of monoalphabetic substitution going on (i.e. in addition to the Declaration of Independence codebook), then it’s quite possible that the Gillogly sequence represents the keyword or keystring used to generate that substitution alphabet. This might well explain the doubled letters within the keystring (i.e. the II MM and PP): if so, we would be looking for a keystring with four doubled letters but where none of the vowels repeat. 

ABCDEFGHIIJKLMMNOOPP 

Hmmm… there can’t be many English words ending with two adjacent doubled letters: in fact, the only two I can think of are coffee and toffee (please let me know if you can think of any others!) ‘Toffee’ doesn’t sound very promising, so could it be ‘coffee’? The previous word would then need to end with “C” to make a doubled letter… not hugely promising, but perhaps it’s a start!- 

ABCDEFGHIIJKLM MNOOPP
xxxxxxxxxxxxxC COFFEE
xxxxxxxxxxxxxT TOFFEE

Alternatively, the final word might be a three-letter word, like “TOO” or “OFF”. Had Eric Sams considered this, doubtless he would have happily constructed all kinds of valid key phrases that fit these constraints, such as:-

ABCDEF GHIIJ KLMMNO OPP
CLUNKY SPEED RABBIT TOO

OK, it’s true that the key phrase to the Beale Papers is not going to be “CLUNKY SPEED RABBIT TOO”, but maybe (just maybe) it’s a step in the right direction. 🙂 
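If you want to test candidate key phrases of your own against the ABCDE…-style index pattern, the check is easy to automate. The Python sketch below assumes (as I do above) that each new letter of the key phrase is simply given the next unused index letter, and that repeated letters reuse their earlier index; the names `index_pattern` and `TARGET` are just illustrative.

```python
TARGET = "ABCDEFGHIIJKLMMNOOPP"  # the (corrected) Gillogly sequence


def index_pattern(phrase):
    """Replace each letter of `phrase` with an index letter: the first
    distinct letter becomes A, the second B, and so on. Repeated letters
    reuse their earlier index. Spaces and punctuation are ignored.
    """
    indices = {}
    out = []
    for ch in phrase.upper():
        if not "A" <= ch <= "Z":
            continue
        if ch not in indices:
            indices[ch] = chr(ord("A") + len(indices))
        out.append(indices[ch])
    return "".join(out)


if __name__ == "__main__":
    candidate = "CLUNKY SPEED RABBIT TOO"
    print(index_pattern(candidate), index_pattern(candidate) == TARGET)
```

A phrase fits only if its pattern equals `TARGET` exactly, which is a very tight filter: sixteen distinct letters, in order of first appearance, with doubles in just the right four places.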

Incidentally… the Wikipedia Beale Papers page notes that “In 1940, the famous cryptanalyst, Dr. White of Yale University, came close to solving the Beale ciphers after tracking down the suspected key hidden by Beale in St. Louis—he never spoke of his findings.” Though I did a bit of Internet sleuthing to try to work out who this Dr White was, I didn’t really get anywhere – I don’t think he was the Maurice Seal White (b.1888) who wrote the 1938 book “Secret writing : how to write and solve messages in cipher and code” (which I found listed in Lou Kruh’s bibliography and Worldcat) and who was a Columbia alumnus in 1920 (see p.212 here), but it’s hard to tell. Please let me know if you find out!