Even if (and I would not disagree that it’s a big ‘if’) we accept that the 14×14 rearrangement of the d’Agapeyeff Challenge Cipher’s Polybius square output is a staging point on the reconstructive road back to the original plaintext, we’re still left with an unknown transposition of an unknown substitution. Which is not great. 🙁

However, what struck me this morning was that if d’Agapeyeff used a known text as the plaintext AND that plaintext was in Project Gutenberg, we could perhaps try using Big Data techniques to find the best matching frequency distribution of any consecutive 196 characters.

In practical terms, the idea would be to do the following for all of the Project Gutenberg texts:

  • transform them into pure text versions (i.e. A-Z only)
  • frequency count each consecutive block of 196 characters
  • sort that block’s frequency count
  • compare that sorted frequency count against the sorted frequency count of the d’Agapeyeff 14×14
  • display the 100 blocks ‘closest’ to the d’Agapeyeff 14×14

At the very least, the specific kind of passages this search highlights might well yield some insight into what is going on under the hood. Might be a bit of fun for a Hadoop person to try?

PS: the 14×14 d’Agapeyeff staging point looks like this:

    JBLOPBPDKDPION
    DIILNMKCKKIILB
    DJMLNPJIEMJJJR
    CEEKCKJOJJDBLQ
    OICLJIMKEKNODO
    DOOCLGBMBKKGKD
    CJLKDMCLOKCCCX
    IKPPNCONEDOEBS
    BBOPOPIPGJDEJF
    EMBDIKLNBLDPKR
    EBDNNPMOIPKEGI
    MMOLMDBGBEBMJQ
    GCLLGGMLONJLKM
    GNBLMJKDJIOKBQ

The frequency distribution for this is:

K  B  J  L  O  D  M  I  C  P  E  N  G Q R F S X A H T U V W Y Z
20 17 17 17 17 16 15 14 12 12 11 11 9 3 2 1 1 1 0 0 0 0 0 0 0 0
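For anyone who wants to check the table, the counts can be reproduced from the grid with a few lines of Python. This is only a sketch: the variable names are illustrative, and ties are broken alphabetically, which happens to match the order shown above.

```python
from collections import Counter
import string

GRID = """
JBLOPBPDKDPION
DIILNMKCKKIILB
DJMLNPJIEMJJJR
CEEKCKJOJJDBLQ
OICLJIMKEKNODO
DOOCLGBMBKKGKD
CJLKDMCLOKCCCX
IKPPNCONEDOEBS
BBOPOPIPGJDEJF
EMBDIKLNBLDPKR
EBDNNPMOIPKEGI
MMOLMDBGBEBMJQ
GCLLGGMLONJLKM
GNBLMJKDJIOKBQ
"""

# Join the 14 rows into one 196-letter string and count each letter.
letters = "".join(GRID.split())
counts = Counter(letters)

# Order all 26 letters by descending count, then alphabetically.
order = sorted(string.ascii_uppercase, key=lambda c: (-counts[c], c))
sorted_counts = [counts[c] for c in order]

for c in order:
    print(c, counts[c])
```

The `sorted_counts` list is the 26-entry ‘sorted frequency count’ that each Project Gutenberg block would be compared against.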

Normally for a challenge cipher in an English cipher book, you’d start by guessing that ‘K’ maps to plaintext ‘e’; that ‘BJLO’ map to (some combination of) plaintext ‘taio’ (or similar); and then try to make up the rest. The problem here is that because we’re apparently dealing with a substitution AND a transposition, we don’t have that cryptological luxury.
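What a sorted frequency count does give us is something that survives both steps: a simple (monoalphabetic) substitution only relabels the letters, and a transposition only reorders them. The small Python demonstration below illustrates this with a random substitution and a random shuffle. Both are stand-ins chosen for illustration, not a claim about what d’Agapeyeff actually did, and the sample plaintext and helper names are made up.

```python
import random
import string
from collections import Counter


def sorted_counts(s):
    """Letter counts in descending order, ignoring which letter is which."""
    return sorted(Counter(s).values(), reverse=True)


def substitute(s, rng):
    """Apply a random monoalphabetic substitution over A-Z."""
    shuffled = list(string.ascii_uppercase)
    rng.shuffle(shuffled)
    table = str.maketrans(string.ascii_uppercase, "".join(shuffled))
    return s.translate(table)


def transpose(s, rng):
    """Apply a random transposition (here simply a shuffle of positions)."""
    chars = list(s)
    rng.shuffle(chars)
    return "".join(chars)


rng = random.Random(1939)  # arbitrary seed, for repeatability only
plain = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOGAGAINANDAGAIN"
cipher = transpose(substitute(plain, rng), rng)

# The letters and their order have changed, but the sorted counts have not.
print(sorted_counts(plain) == sorted_counts(cipher))
```

This is why the search can ignore the details of both the substitution and the transposition, at the cost of throwing away everything except the shape of the distribution.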

Yet if it turns out that the best frequency distribution matches are all from Shakespeare, this might give us a very strong hint as to where to look for the plaintext. Just a thought! 🙂

One thought on “d’Agapeyeff Cipher: how about a sorted frequency distribution search of Project Gutenberg?”

  1. James Pannozzi on May 2, 2021 at 1:57 pm said:

    Just as an observation en passant, I can’t help thinking that desktop quantum computers will become available to all who can afford them in the next 10 years.

    Such searches as you describe would be child’s play for them.
