To decipher the sequence of numbers that make up the second Beale Cipher (‘B2’), you use them to index into the words of a slightly-mucked-around version of the Declaration of Independence (A.K.A. a “book cipher” / “dictionary cipher”): the sequence of initial letters this produces yields the decrypted plaintext. Errm… except that this isn’t the whole story: thanks to the Committee of Five’s inexplicable omission of a right to bear xylophones, yoyos, or zebras, the B2 cipher maker also had to improvise a second “rare letter cipher” to encipher rare word-initial letters such as x- and y-. (But that’s a post for another day.)

For book ciphers that literally use dictionaries as their code book, this wouldn’t be a problem (because they necessarily go all the way from aardvarks to zymurgy). Of course, given that the letters of the alphabet appear there in strictly ascending order, using an actual dictionary would probably be a bit dumb. Hence people use ordinary books as their code books instead, preferably ones with zebras playing xylophones. 😉

Strictly speaking, then, Beale Cipher B2 doesn’t employ a pure book cipher but a slightly hybridized one, in which letters that begin no word of the Declaration of Independence (DOI) get enciphered by some (currently) unknown means. Here are some numbers to introduce how the book cipher part of the B2 cipher system works.
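Before getting to the numbers, here is a minimal Python sketch of the basic book-cipher step described at the start of this post: each number N picks out the Nth word (counting from 1) of the key text, and the plaintext letter is that word’s initial. The key text is a made-up toy sentence, not the DOI, and `decode_book_cipher` is a hypothetical helper; it ignores B2’s separate “rare letter” scheme entirely.

```python
import re

def decode_book_cipher(numbers, key_text):
    """Decode a plain book cipher: number N selects the Nth word (1-based)
    of key_text, and the plaintext letter is that word's first letter."""
    words = re.findall(r"[A-Za-z]+", key_text)  # punctuation is ignored
    letters = []
    for n in numbers:
        if 1 <= n <= len(words):
            letters.append(words[n - 1][0].lower())
        else:
            letters.append("?")  # index falls outside the key text
    return "".join(letters)

# Toy key text (hypothetical, NOT the Declaration of Independence).
TOY_KEY = "Here every lone lamp offers warmth on rainy days"

# Words 3 and 4 both start with 'l' (and words 5 and 7 with 'o'), so either
# number can encipher that letter: this is what makes a book cipher homophonic.
print(decode_book_cipher([1, 2, 3, 4, 5, 6, 7, 8, 3, 9], TOY_KEY))  # helloworld
```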

B2’s Mapping Statistics

I haven’t seen B2’s letter mapping statistics anywhere on the Internet, so I thought this would be a good place to start (note that q and z are not used in B2, so they do not appear):

* a [43/15,av=2.9,34.9%]: 24[4] 36[2] 28[5] 147[2] 45[1] 81[4] 98[3] 51[4] 284[1] 150[6] 27[2] 230[4] 83[2] 25[2] 152[1]
* b [11/7,av=1.6,63.6%]: 308[1] 9[1] 77[4] 18[2] 134[1] 485[1] 194[1]
* c [19/7,av=2.7,36.8%]: 84[7] 65[2] 92[2] 4[3] 94[1] 200[2] 21[2]
* d [49/11,av=4.5,22.4%]: 52[10] 15[8] 211[3] 118[4] 63[11] 252[1] 135[2] 246[3] 320[5] 406[1] 582[1]
* e [103/14,av=7.4,13.6%]: 37[13] 49[6] 7[15] 79[4] 85[11] 138[15] 191[7] 620[2] 486[3] 511[6] 548[2] 603[4] 575[2] 33[13]
* f [21/8,av=2.6,38.1%]: 196[4] 160[4] 122[6] 273[1] 131[3] 360[1] 666[1] 11[1]
* g [15/4,av=3.8,26.7%]: 270[3] 48[6] 113[5] 133[1]
* h [37/8,av=4.6,21.6%]: 73[8] 107[5] 394[1] 6[4] 20[9] 301[2] 205[7] 466[1]
* i [55/12,av=4.6,21.8%]: 115[5] 647[1] 140[15] 2[7] 8[12] 154[4] 314[2] 159[1] 67[4] 185[1] 241[2] 370[1]
* j [2/2,av=1.0,100.0%]: 120[1] 581[1]
* k [1/1,av=1.0,100.0%]: 305[1]
* l [32/10,av=3.2,31.3%]: 42[5] 101[6] 102[7] 234[1] 400[4] 158[3] 197[1] 420[3] 177[1] 405[1]
* m [6/4,av=1.5,66.7%]: 58[1] 82[1] 117[2] 208[2]
* n [69/8,av=8.6,11.6%]: 47[13] 10[13] 287[8] 353[8] 607[2] 540[10] 44[13] 557[2]
* o [63/12,av=5.3,19.0%]: 31[7] 56[4] 5[4] 136[3] 46[4] 106[15] 12[6] 43[6] 57[2] 125[9] 143[1] 302[2]
* p [12/4,av=3.0,33.3%]: 17[1] 105[4] 30[5] 121[2]
* r [40/7,av=5.7,17.5%]: 59[5] 53[9] 96[8] 220[8] 248[2] 344[2] 112[6]
* s [48/12,av=4.0,25.0%]: 62[5] 35[6] 71[4] 78[2] 110[11] 38[9] 217[2] 505[3] 600[2] 297[1] 275[2] 285[1]
* t [69/17,av=4.1,24.6%]: 22[4] 29[5] 26[6] 554[1] 3[5] 41[6] 16[9] 34[5] 60[2] 61[3] 14[7] 50[6] 32[4] 64[2] 39[1] 643[2] 288[1]
* u [24/8,av=3.0,33.3%]: 239[3] 316[5] 95[3] 250[6] 371[3] 388[2] 409[1] 440[1]
* v [18/1,av=18.0,5.6%]: 807[18]
* w [13/6,av=2.2,46.2%]: 72[2] 290[1] 19[2] 66[2] 40[5] 1[1]
* x [4/1,av=4.0,25.0%]: 1005[4] (though note that the DOI has no word beginning with x-.)
* y [9/1,av=9.0,11.1%]: 811[9] (though note that #811 = FUNDAMENTALLY, i.e. the DOI has no word beginning with y-.)

That is, ‘a’ appears 43 times in B2 and has 15 homophones, which means that the average number of instances per individual ‘a’ homophone is 2.9 (43/15), and the ratio of ‘a’ homophones to ‘a’ instances is 34.9% (15/43): specifically, index #24 appears 4 times, index #36 appears 2 times, index #28 appears 5 times, and so on.
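The bracketed summaries are simple arithmetic on each letter’s list of B2 numbers. As a sketch (`mapping_stats` is a hypothetical helper name), the following Python reproduces the ‘p’ line above. The p index sequence it uses is reconstructed here, as an assumption, by combining the p homophone pattern (ABCBDCCDCBCB) given further down with the p homophones in the order listed above.

```python
from collections import Counter

def mapping_stats(indices):
    """Summarise one letter's B2 numbers as
    [instances/homophones,av=...,pct%]: index[uses] ..."""
    counts = Counter(indices)  # keys keep first-appearance order (Python 3.7+)
    total = sum(counts.values())
    homophones = len(counts)
    av = total / homophones           # average uses per homophone
    pct = 100 * homophones / total    # homophones as a % of instances
    detail = " ".join(f"{idx}[{n}]" for idx, n in counts.items())
    return f"[{total}/{homophones},av={av:.1f},{pct:.1f}%]: {detail}"

# 'p' numbers in B2 order, rebuilt from pattern ABCBDCCDCBCB with
# A=17, B=105, C=30, D=121 (an assumption; see the lead-in).
P_INDICES = [17, 105, 30, 105, 121, 30, 30, 121, 30, 105, 30, 105]
print(mapping_stats(P_INDICES))
# [12/4,av=3.0,33.3%]: 17[1] 105[4] 30[5] 121[2]
```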

We can also list these results in order of the well-known ETAOINSHRDLU decreasing frequency mnemonic:
* E [103/14,av=7.4,13.6%]
* T [69/17,av=4.1,24.6%]
* A [43/15,av=2.9,34.9%]
* O [63/12,av=5.3,19.0%]
* I [55/12,av=4.6,21.8%]
* N [69/8,av=8.6,11.6%]
* S [48/12,av=4.0,25.0%]
* H [37/8,av=4.6,21.6%]
* R [40/7,av=5.7,17.5%]
* D [49/11,av=4.5,22.4%]
* L [32/10,av=3.2,31.3%]
* U [24/8,av=3.0,33.3%]

By contrast, the implicit frequency ordering actually used (i.e. in terms of decreasing number of homophones used in B2) was more like:

* 17 T
* 15 A
* 14 E
* 12 O/I/S
* 11 D
* 10 L
* 8 F/H/N/U
etc.
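This reordering is just a sort of the letters by their homophone counts. A minimal Python sketch, with the counts typed in from the mapping statistics above and a hypothetical helper name (ties are listed alphabetically here, so it prints I/O/S where the list above says O/I/S):

```python
from collections import defaultdict

# Number of distinct homophones per letter, from the mapping statistics.
HOMOPHONES = {
    "a": 15, "b": 7, "c": 7, "d": 11, "e": 14, "f": 8, "g": 4, "h": 8,
    "i": 12, "j": 2, "k": 1, "l": 10, "m": 4, "n": 8, "o": 12, "p": 4,
    "r": 7, "s": 12, "t": 17, "u": 8, "v": 1, "w": 6, "x": 1, "y": 1,
}

def order_by_homophones(counts):
    """Group letters by homophone count, largest count first."""
    groups = defaultdict(list)
    for letter in sorted(counts):
        groups[counts[letter]].append(letter.upper())
    return [(n, "/".join(groups[n])) for n in sorted(groups, reverse=True)]

for n, letters in order_by_homophones(HOMOPHONES):
    print(n, letters)
```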

DOI Letter Statistics

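We can also look at the letter statistics for the DOI (with its words numbered as in B2), giving one entry per DOI word beginning with each letter, in DOI order, to show how many times that word’s index is used in the B2 ciphertext (i.e. ‘(4)’ = “index used 4 times”, ‘.’ = “index not used”):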

* a occurs 166 times: (4)(2)(2)(5)(2)(1)(4)(4)(2).(3)…..(2)(6)(1)………..(4)……(1)………………………………………………………………………………………………………………..
* b occurs 48 times: (1)(2)(4).(1)(1)…(1)…(1)…………………………….
* c occurs 53 times: (3)(2)(2)(5)(2)(1)..(2)……………………………………..
* d occurs 36 times: (8)(10)(11)(4)(2).(3)(3)(1).(5)(1)….(1)……………….
* e occurs 37 times: (15)(13)(13)(6)(4)(13).(15).(7)….(3)..(6).(2)(2)(4)(2)…………..
* f occurs 64 times: (1)(6)(3)(4).(4)..(1)…(1)…………..(1)………………………………
* g occurs 19 times: (6)(5).(1)…(3)………..
* h occurs 78 times: (4)(9)(8)(5).(7).(2)………..(1)……(1)……………………………………………
* i occurs 68 times: (7)(12)(4)(5).(15).(4)(1)..(1)(2)……(2)..(1)………(1)……………………………..
* j occurs 10 times: (1)..(1)……
* k occurs 4 times: (1)…
* l occurs 34 times: (5)(6)(7).(3)(1).(1)(1)…(4)(1)(3)……………….
* m occurs 28 times: (1)(1)(2).(2)…………………..
* n occurs 19 times: (13)(13)(13)…(8).(8).(10)(2)(2)……
* o occurs 144 times: (4)(6)(7)(6)(4)(4)(2)(15)(9).(3)(1)……..(2)……………………………………………………………………………………………………………
* p occurs 60 times: .(1)(5)(4)(2)……………………………………………….
* q occurs 1 times: .
* r occurs 40 times: (8)(5)(7)(6).(8)(2)..(2)…………………………
* s occurs 62 times: (6)(9)(5)(4)(2)(11)……..(2)…(2).(1)(1)…….(3).(2)…………………………
* t occurs 252 times: (5)(7)(9)(4)(6)(5)(4)(5)(1)(6)(6)(1).(2)(3)(2)……………………………………………(1)………………………………………………(1)………..(2)………………………………………………………………………………………………………
* u occurs 28 times: (4)(3)(6)(5)(3)(2)(1).(1)……………….
* v occurs 2 times: (18).
* w occurs 59 times: (1)(2).(5)(2)(2)……(1)……………………………………….

Of course, this clearly confirms the theory that the DOI contains no xylophones, no yoyos, and no zebras. 🙂
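Each of those lines can be generated mechanically from the key text and the ciphertext. The Python sketch below assumes the same simple word numbering as a plain book cipher (1-based, punctuation ignored); the helper names, toy key text and toy ciphertext are all hypothetical, and real work would need the DOI numbered exactly as B2’s encipherer numbered it.

```python
import re
from collections import Counter

def word_positions(key_text, letter):
    """1-based indices of the key-text words that begin with `letter`."""
    words = re.findall(r"[A-Za-z]+", key_text)
    return [i for i, w in enumerate(words, start=1) if w[0].lower() == letter]

def usage_string(positions, cipher_numbers):
    """'(n)' for each word whose index occurs n times in the ciphertext,
    '.' for each word whose index is never used."""
    uses = Counter(cipher_numbers)
    return "".join(f"({uses[p]})" if uses[p] else "." for p in positions)

TOY_KEY = "Here every lone lamp offers warmth on rainy days"  # not the DOI
TOY_CIPHER = [1, 2, 3, 4, 5, 6, 5, 8, 3, 9]  # toy ciphertext: "helloworld"

for letter in "lo":
    positions = word_positions(TOY_KEY, letter)
    print(letter, "occurs", len(positions), "times:",
          usage_string(positions, TOY_CIPHER))
```

Here the second ‘o’ word (“on”, #7) is never used, so it shows up as a ‘.’, just as unused DOI words do in the lists above.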

As has been pointed out many times, the heavy bias of these usage patterns towards low index numbers implies that the homophones were mainly taken from the start of the DOI, though with scattered exceptions.
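One crude way to put a number on that bias is to ask what share of a letter’s B2 instances use a DOI index at or below some cutoff. The sketch below does this for ‘d’, using the counts from the mapping statistics above; the cutoff of 200 is an arbitrary choice for illustration, and `share_at_or_below` is a hypothetical helper.

```python
# B2 usage counts for each 'd' homophone (DOI index -> uses),
# copied from the mapping statistics above.
D_USAGE = {52: 10, 15: 8, 211: 3, 118: 4, 63: 11, 252: 1,
           135: 2, 246: 3, 320: 5, 406: 1, 582: 1}

def share_at_or_below(usage, cutoff):
    """Fraction of a letter's B2 instances whose DOI index is <= cutoff."""
    total = sum(usage.values())
    low = sum(n for index, n in usage.items() if index <= cutoff)
    return low / total

print(f"{share_at_or_below(D_USAGE, 200):.1%}")  # 71.4% (35 of 49)
```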

B2’s Homophone Patterns

Because the encipherer used so few of the possible homophones (e.g. because 166 DOI words begin with A, all 43 instances of A in B2 could have used different numbers, yet only 15 homophones for A were used in B2), the ciphertext B2 is solvable as a pure homophonic cipher; in fact, some automated homophone solvers can solve Beale B2 unassisted (though not B1 or B3, sadly).

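With that in mind, it is also interesting to look at B2’s homophone patterns, to see if they tell us more about how B2 was constructed (here each letter’s distinct homophones are labelled A, B, C, etc. in order of first appearance in B2, which is also the order used in the mapping statistics above):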

* a homophone sequence: ABCDAEFFCGHHIJKLLMLCFKNLDHOCGBJAMJAGJJJNFCH
* b homophone sequence: ABCCCDCDEFG
* c homophone sequence: ABCCADEFDGABAGFDA
* d homophone sequence: ABACDEAEADECBFEAABGEEBHCIHAIBJIIAIEEKADBABBGDEHEE
* e homophone sequence: ABCDECAFGBEGHGGEACIJAFDJKLCMNEFDBDFCAABFEFNAEABCNGEJCNCILNJKANCNFANCNFEFCCGFEFCANJHFNAEFENACNGJLBCEFIEMLF
* f homophone sequence: ABCCACADCCEAFEGBHBEBC
* g homophone sequence: ABCBCDBCACBCBAB
* h homophone sequence: ABACDEFEBEDAGAGEDEDFBAEBEEGAGGAHBAGEG
* i homophone sequence: ABCDCEFGACDHCEFFEACAAIECDCDEDIECGDCCIDEEJFEKCLCEECIKECC
* j homophone sequence: AB
* k homophone sequence: A
* l homophone sequence: ABCDEFGHBFCCBCFEIACHCEBEHBAAJABC
* m homophone sequence: ABCCDD
* n homophone sequence: ABCDACABDECFGCFAFGDHFEGBBDCAAAGBGDBCGBHAGBFDFAGBGGAFFBBDFGAGBGCCADFBA
* o homophone sequence: ABCDEFABGHIFFHFEFJJCIHGBJKGGAFHFALAFAJHHFEJBJDGFFJFFEAFCJFGCJDL
* p homophone sequence: ABCBDCCDCBCB
* r homophone sequence: ABBCAADCEDFADEBGFDBBDAGCGGDDGCCBCCDBBG
* s homophone sequence: ABCDBEBEAEFGCHBAACIFJFBEFFBGEEFCEAKELFDEHFEFKEHI
* t homophone sequence: ABCDEFGHIECJBKGHFLJKBBEKCHGFMLNOAEJMBLGKACHFLPMCQKILGLHGNAEPFKGGMFKCGR
* u homophone sequence: ABCDBEFFAAEBCDDBDDECGBDCH
* v homophone sequence: AAAAAAAAAAAAAAAAAA
* w homophone sequence: ABCDEFEDCAEEE
* x homophone sequence: AAAA
* y homophone sequence: AAAAAAAAA
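These patterns are easy to generate: walk through a letter’s B2 numbers in order and give each previously unseen number the next free label. A minimal Python sketch (hypothetical helper name; it assumes at most 26 homophones per letter, which holds comfortably for B2):

```python
import string

def homophone_pattern(numbers):
    """Relabel a letter's cipher numbers A, B, C, ... in order of first use."""
    labels = {}
    pattern = []
    for n in numbers:
        if n not in labels:
            labels[n] = string.ascii_uppercase[len(labels)]  # next free label
        pattern.append(labels[n])
    return "".join(pattern)

# A 'p' sequence consistent with B2's p statistics (17, 105, 30, 121),
# so this only checks the round trip.
print(homophone_pattern([17, 105, 30, 105, 121, 30, 30, 121, 30, 105, 30, 105]))
print(homophone_pattern([807] * 18))  # V: one homophone, used 18 times
```

Because the labels depend only on the order of first use, a pattern discards which DOI words were chosen but keeps how often, and when, each homophone was reused.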

My Conclusions

One thing that stands out for me is that only a single homophone for V was used, (a) even though V appears 18 times in B2, and (b) even though two were available (DOI #807 “VALUABLE” and DOI #1132 “VOICE”). To me, this seems a fairly clear indication that the search for homophones stopped well before the end of the DOI. Combine this with the fact that X is enciphered as #1005 (even though no DOI word begins with x-), and it seems likely that the highest genuine DOI index would have been (say) 1000: everything after that would be a special secondary code (e.g. for ‘X’).

5 thoughts on “Introduction to Beale Cipher B2…

  1. James R. Pannozzi on September 24, 2018 at 6:55 am said:

    Beautiful statistical overview of Beale.

    If only we had something similar for Voynich.

    By the way, am I the only one thinking that the Beale (any of the unsolved ones) is a natural for a quantum computer to work on by brute force? Use the first as a test and see how the qubits handle that.

    Of course there’s no “treasure”, the guy was trying to make some quick cash.

  2. James Pannozzi: alas, it is merely the retrospective clarity one gets from a ciphertext that has been broken – when I move on to the (unsolved) B1 and B3 Beale Ciphers, things will turn murky again. 🙁

  3. Nick,
    forgive me for jumping from one topic to another, why is your site not secure? You are not on WP.com?
    My anti-virus doesn’t like it and lets me know all the time.
    Best regards
    Ruby

  4. Ruby: my site is hosted on its own WordPress multi-site server, and I’ve been meaning to move over to https for at least the last couple of years. I’ll try to make that transition happen sooner rather than later, thanks for reminding me. 😉
