23 Dec 2013

The d’Agapeyeff Cipher once again…

As is well known, Alexander d’Agapeyeff’s 1939 challenge cipher looks like this:-

75628 28591 62916 48164 91748 58464 74748 28483 81638 18174 74826 26475 83828 49175 74658 37575 75936 36565 81638 17585 75756 46282 92857 46382 75748 38165 81848 56485 64858 56382 72628 36281 81728 16463 75828 16483 63828 58163 63630 47481 91918 46385 84656 48565 62946 26285 91859 17491 72756 46575 71658 36264 74818 28462 82649 18193 65626 48484 91838 57491 81657 27483 83858 28364 62726 26562 83759 27263 82827 27283 82858 47582 81837 28462 82837 58164 75748 58162 92000

Almost all cryptanalyses of this ciphertext start from the reasonable observation that (a) it is dominated by number-pairs of the form [6/7/8/9/0][1/2/3/4/5], and that (b) these pairs have a very strongly language-like distribution:-

**  .1  .2  .3  .4  .5
6.  _0  17  12  16  11
7.  _1  _9  _0  14  17
8.  20  17  15  11  17
9.  12  _3  _2  _1  _0
0.  _0  _0  _0  _1  _0
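
For anyone who wants to check the table of pair counts for themselves, here is a minimal Python sketch. It assumes the ciphertext is read as consecutive two-digit pairs with the final three zeroes treated as padding; the names `CIPHERTEXT` and `digit_pairs` are just illustrative.

```python
from collections import Counter

# The 79 five-digit groups of the challenge cipher, as quoted above.
CIPHERTEXT = """
75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
82858 47582 81837 28462 82837 58164 75748 58162 92000
"""


def digit_pairs(ciphertext: str, padding: int = 3) -> list[str]:
    """Strip whitespace, drop trailing padding digits, split into 2-digit pairs."""
    digits = "".join(ciphertext.split())
    if padding:
        digits = digits[:-padding]  # assumption: the last three zeroes are filler
    return [digits[i:i + 2] for i in range(0, len(digits), 2)]


pairs = digit_pairs(CIPHERTEXT)
counts = Counter(pairs)

# Print the 5x5 grid of counts, rows 6/7/8/9/0 against columns 1..5.
print("    " + "  ".join(f".{col}" for col in "12345"))
for row in "67890":
    print(f"{row}.  " + "  ".join(f"{counts[row + col]:2d}" for col in "12345"))
```

Every one of the 196 pairs falls inside the grid, which is why almost everyone starts from this reading.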

To simplify discussion (and ignoring the issue of fractionation for the moment), we can assign these structured number pairs to letters in an obvious sort of order, e.g.:-

**  .1  .2  .3  .4  .5
6.  _A  _B  _C  _D  _E
7.  _F  _G  _H  _I  _J
8.  _K  _L  _M  _N  _O
9.  _P  _Q  _R  _S  _T
0.  _U  _V  _W  _X  _Y

In which case, the rather less verbose version of the same ciphertext would look like this:-

J B L O P B P D K D P I O N D I I L N M K C K K I I L B D J M L N P J I E M J J J R C E E K C K J O J J D B L Q O I C L J I M K E K N O D O D O O C L G B M B K K G K D C J L K D M C L O K C C C X I K P P N C O N E D O E B S B B O P O P I P G J D E J F E M B D I K L N B L D P K R E B D N N P M O I P K E G I M M O L M D B G B E B M J Q G C L L G G M L O N J L K M G N B L M J K D J I O K B Q - - - -

Cryptanalytically, the problem with this as a ciphertext is that, even discounting the ’00’ filler-style characters at the end, it simply has too many doubled (and indeed tripled) letters to be simple English: 11 doublets, plus 2 additional triplets. Hence the chance of any given letter in this text being followed by itself within this text is 15/195 ~= 7.7%, while the chance of any given letter being followed by itself twice more is 2/194 ~= 1.03%.

According to my spreadsheet, if the letters were jumbled randomly, the chances of the same letter appearing twice in a row would be 7.44% (very slightly less than what we see, but still broadly the same), while the chances of the same letter appearing three times in a row would be 0.594% (quite a lot less).
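
The spreadsheet formula itself isn't shown here, so the following Python sketch is only a guess at it: it assumes each letter is drawn independently with probability equal to its relative frequency in the pair-count table, so that the chance of a double is the sum of p² and the chance of a triple is the sum of p³. The list `PAIR_COUNTS` simply holds the non-zero entries of that table.

```python
# Non-zero pair counts from the frequency table (rows 6., 7., 8., 9., 0.).
PAIR_COUNTS = [17, 12, 16, 11,        # 62 63 64 65
               1, 9, 14, 17,          # 71 72 74 75
               20, 17, 15, 11, 17,    # 81 82 83 84 85
               12, 3, 2, 1,           # 91 92 93 94
               1]                     # 04

total = sum(PAIR_COUNTS)

# Independent-draw assumption: P(double) = sum p^2, P(triple) = sum p^3.
p_double = sum((n / total) ** 2 for n in PAIR_COUNTS)
p_triple = sum((n / total) ** 3 for n in PAIR_COUNTS)

print(f"expected doubles: {100 * p_double:.2f}%")
print(f"expected triples: {100 * p_triple:.3f}%")
```

Under that assumption the two figures come out at 7.44% and 0.594%, matching the numbers quoted above.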

It struck me that these statistics might possibly be what we might expect to see for texts formed of every second or every third letter of English. So, I decided to test this notion with some brief tests on Moby Dick:-

Distance – doubles – triples
1 – 3.693% – 0.075%
2 – 4.426% – 0.275%
3 – 5.994% – 0.476%
4 – 6.289% – 0.466%
5 – 6.682% – 0.566%
6 – 6.491% – 0.546%
7 – 6.372% – 0.508%
8 – 6.536% – 0.533%
9 – 6.544% – 0.536%
n – 6.524% – 0.525% (i.e. predicted percentages based purely on frequency counts)

Indeed, what we see is that the probability of a triple letter occurring in the actual text starts very low (0.075%), but rises to close to the raw probability (from pure frequency counts) at a distance of about 5 (i.e. A….B….C….D…. etc).
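
For anyone wanting to repeat this kind of test on a text of their own, a rough Python sketch follows. It assumes "distance d" means comparing each letter with the letter d places further on (and 2d places on for triples), which is my reading of the table rather than the exact procedure used; `repeat_rates` is a made-up name, and no Moby Dick text is bundled.

```python
def repeat_rates(text: str, distance: int) -> tuple[float, float]:
    """Return (double rate, triple rate) for letters `distance` apart.

    A "double" is text[i] == text[i + d]; a "triple" additionally needs
    text[i + 2d] to match. Non-letters are dropped before counting.
    """
    letters = [c for c in text.upper() if c.isascii() and c.isalpha()]
    pair_slots = len(letters) - distance
    triple_slots = len(letters) - 2 * distance
    if pair_slots <= 0:
        raise ValueError("text too short for this distance")

    doubles = sum(letters[i] == letters[i + distance]
                  for i in range(pair_slots))
    triples = sum(letters[i] == letters[i + distance] == letters[i + 2 * distance]
                  for i in range(max(triple_slots, 0)))

    triple_rate = triples / triple_slots if triple_slots > 0 else 0.0
    return doubles / pair_slots, triple_rate


# Usage: loop over distances 1..9 on any long English text you have to hand.
sample = "It was the best of times, it was the worst of times"
for d in range(1, 4):
    doubles, triples = repeat_rates(sample, d)
    print(f"{d} - {100 * doubles:.3f}% - {100 * triples:.3f}%")
```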

So, comparing the actual triple letter count in the ciphertext with the ciphertext’s raw frequencies would seem to suggest a transposition step of about 2 is active, whereas comparing the double count in the ciphertext with the ciphertext’s raw frequencies would seem to suggest a transposition step of about 5 is active.

Yes, I know that this looks a bit paradoxical: but it is what it is. Still workin’ on it…


64 thoughts on “The d’Agapeyeff Cipher once again…”

Menno Knul on December 23, 2013 at 6:44 am said:

Nick, I replied to your newsletter about the d’Agapeyeff cipher.
Geo on December 23, 2013 at 3:16 pm said:

Nick: Surely, the presence of triple letters can be explained by the ending of one word with a double and the beginning of another with the same letter ie. the phrase “D’Agapeyeff fools us all” contains 3 consecutive f’s when concatenated.
nickpelling on December 23, 2013 at 3:25 pm said:

Geo: yes, and I’d be the first person to include a “SEPIA AARDVARK” in my plaintext for the same sort of reason. 🙂

But thinking about it a little more, the doublet count is far more informative than the triplet count, for the simple reason that there are so few triplets (i.e. 2) that statistics aren’t really reliable. Strictly speaking, the ciphertext size may not be quite large enough for us to draw really strong inferences from the doublet count either, but we have what we have and nothing else besides to help us. 😐
bdid1dr on December 23, 2013 at 4:22 pm said:

The sample you’ve portrayed, here, reminds me of IBM punch cards which were used, for example, for university enrollment data in the 1960’s-70’s.
bdid1dr on December 23, 2013 at 5:24 pm said:

A little bit b4 your time? You’ve certainly put your university education to grrr-eat use, here on your blog! Njoy the Season with much m-r-e-mnt!
🙂
bdid1dr on December 24, 2013 at 9:52 pm said:

I’m not sure if I’m recalling accurately, but will mention a vague memory of the alphabet letters “o” and “l” were not numerically recognized on the punch cards because of similarity of programming uses of numerals one and zero.
Before my first visit on this website, I checked your home page biography — fascinating!
May you and your family have a great holiday!
beedee
Jim Melichar on December 26, 2013 at 2:49 am said:

Nick, I’m actually (once again) in the middle of some extensive trial decryptions on the D’Agapeyeff cipher.

I’d encourage anyone who hasn’t taken a look at the cipher with the 6-0 digits removed to do so. There’s very little about the ciphertext that makes sense with those digits included, but very much that does make sense once they are excluded. As I’ve noted here in the past, D’Agapeyeff himself notes this technique as a possible attempt to thwart cryptanalysis.

I’d love to prepare an entire paper from my notes and trials over the past three years if I get a chance, but I’ve put most of my free time into attempts to solve it.

Thanks for giving this cipher some press again, it needs it.

Jim
nickpelling on December 26, 2013 at 11:21 am said:

Jim Melichar: I’ve been doing d’Agapeyeff trials myself over Christmas (well, it was that or the washing up), so hopefully should have more to post about it very soon… don’t touch that dial etc. 😉
bdid1dr on December 26, 2013 at 9:54 pm said:

Nick, B4 I 4get N-tirely, I would like to n-lighten some of our latest generation of Christmas song-singers to one of the funny Christmas songs which receives almost as much attention as do various cipher mysteries. My method or mnemonics for that silly Christmas Carol:
1 Partridge
2 Turtle doves
3 French hens
4 Calling (colly) birds
Five –Gol—-dennnn Rings (This fifth line is long drawn-out, and allows the singer to then deeply inhale before continuing to:
Six geese a-laying
Seven SWANS a-SWimming
Eight Maids a-Milking
Nine Ladies dancing
Ten Lords a Leaping
Eleven Pipers piping
Twelve Drummers drumming

It all very much resembles “cheer-leading” chants — and is discussing the preparations of a big party — which kicks off with the arrival of the lover, his buddies, and their girlfriends/wives……

Some fifty years ago, one of our major television broadcasters did an animation/cartoon. Hysterically funny! Do the math for the repetitions of each verse’s number of animals (and humans) — and the walls of the castle bulging, and birds escaping through any opening available. Oh, I wish ‘somebody’ could find that cartoon!
bdid1dr on December 26, 2013 at 10:13 pm said:

Still somewhat incoherent — in that with each successive verse, one repeats the previous verses, example: Four colling birds, three french hens, two turtle doves, and a partridge……
One only gets to take in a big gulp of air on the “Golden rings….
Maybe I’m trying to describe the exponential developments of the song? 12-partridges, 24 turtle doves, 36 french hens….?
I admit that math has never been my strong suit. But I think you get the picture (cartoon?).
Jim Melichar on December 27, 2013 at 4:17 am said:

Nick,

One of the things I did, rather than columnar transposition was to try all 48*47 routes for filling and extracting from a square. However I have not attempted this with the 6-0 digits left in. You may want to give it a go.

Jim
bdid1dr on January 3, 2014 at 10:07 pm said:

Mr. Melichar (or Nick), what is the significance of not using the numerals 6 and 0? Are they “place holders” that work with the nulls that Nick discusses?
Sorry I asked — probably a really dumb question. BUT if I don’t get an answer, I forever ponder (when I’m not 1-dering). !
bdid1dr on January 4, 2014 at 3:23 pm said:

What I still 1-der about: Only 1 zero in the main body of the text and 3 zeros which finish one column short. Has somebody already posed this pondering? Do aaaardvarcks still figure in the puzzle? 🙂
bdid1dr on January 10, 2014 at 4:03 pm said:

What if somewhere in his puzzle d’agapeyeff has encoded apostrophe’s for any word which contains ‘e’? For example:
d’ agap’ ‘ff

Greek word ‘agape’ (thanks/thankful/prayer/amen)
Jim Melichar on January 15, 2014 at 11:11 pm said:

While parsing through various theories in my notes on the D’Agapeyeff cipher I went back to the original ciphertext and assumed ‘0’ from the ’04’ blip was the true end of the cipher.

When pairing up digits beginning with that 4 alternating between 1/2/3/4/5 and 6/7/8/9/0 the single letter frequencies are much more closely aligned with English than when parsed from the beginning of the cipher.

I haven’t taken it any further but it seems more interesting than most of the number pairs appearing 17 times at the top of the distribution.

Jim
nickpelling on January 16, 2014 at 12:56 pm said:

Jim: interesting point, thanks! It’s certainly true that the 04 pair doesn’t sit comfortably with the rest. For what it’s worth, I suspected at one stage that it might indicate some kind of enciphering / encoding break between two 14×7 halves. I’ll need to have another look at this with your losing-sync idea in mind…
Menno Knul on January 16, 2014 at 6:16 pm said:

Jim,

That’s why I rejected the 14×14 matrix. The zero cannot have two functions at the same time, both as a filler 0 at the end of a message and as part of a digraph. A second observation is that d’Agapeyeff left blanks for the last block instead of 00000. My conclusion from this is that the message consists of two sentences with a different length, which I think indicates a coded dictionary cipher.
Jim Melichar on January 17, 2014 at 6:02 pm said:

Menno – I haven’t departed from the 14×14 much but I share your sentiment on the duality of the zero.

There is overwhelming evidence, IMO, for the Polybius square as the substitution system, therefore I can’t see a dictionary code being used.

Jim
Menno Knul on January 19, 2014 at 2:24 pm said:

Jim,

As far as I have seen, the Polybius square does not tackle the problem of three consecutive letters, which do not occur in English. They do in Dutch e.g. zeeengte, meeeten, etc.
Russell from France on January 19, 2014 at 5:51 pm said:

How about a sentence that contains something like “three engines”? That’s 3 e’s.
Jim Melichar on January 19, 2014 at 6:01 pm said:

Menno – Luckily that tripled letter business isn’t in my consideration set. But I suppose my personal bias clouds my ability to consider a dictionary code. I just think it’s very unlikely.

Jim
Menno Knul on January 20, 2014 at 10:40 am said:

Jim,

If so, what could be the reason, why the d’Agapeyeff challenge cryptogram has been removed in the editions from 1952 onwards ?
Menno Knul on January 22, 2014 at 12:07 am said:

Jim,

The d’Agapeyeff cipher contains two runs of three consecutive digraphs, 75 75 75 and 63 63 63. English words with three consecutive identical vowels or consonants are extremely rare, even if distributed over two words like three engines (Russell). It simply cannot be by accident that the d’Agapeyeff cipher contains two such extremely rare runs of consecutive digraphs in a general book on codes and ciphers, written to inform a general but interested public about the history of cryptography. d’Agapeyeff himself was a generalist too.

The only reasonable conclusion is that the d’Agapeyeff cipher is not an alphabetical transposition cipher, but can be a coded dictionary cipher.
Menno Knul on January 22, 2014 at 12:23 am said:

Jim,

An extra argument that the d’Agapeyeff cipher is not an alphabetical transposition code is that the bigraph / character frequency would be too flat, when comparing the bigraphs with the highest frequency:

81 (20x); 75 (17x), 82(17x), 62 (17x), 85 (17x) 64 (16x), 83 (15x), 74 (14x) 91 (12x), 63 (12x), 65 (11x), 84 (11x).
Jim Melichar on January 22, 2014 at 3:40 am said:

Menno – I understand your position on the matter. That sounds like a question lost to antiquity at Oxford University Press. Not sure if anyone has actually tried to track down that answer or that there’s an answer to track down.

Side note, close circuit to Nick, you should really have a Cipher History Detectives TV show. I think you could do better than “Decoded”.

Jim
Menno Knul on January 22, 2014 at 9:04 am said:

Jim,

I take it is a compliment, but reality is worse than you think. Actually I tracked down the answer and there was an answer to track down.

Just imagine that the Information of Wikipedia (English), that the d’Agapeyeff challenge cipher was not included in later editions of Codes and Ciphers since 1939, is not correct. However, this is repeated over and over again. My own edition of the book dates 1949 and contains the cryptogram. Actually it has been removed from the 1952 edition onwards.

When I found a printing mistake in a table (p.116), where CB has been printed instead of CA (two times CB in a row), I got the idea that the 04 might be a printing mistake as well (probably for 84). Whether this was a reason to remove the challenge cipher from later editions (as is told) must be doubted, because it could easily have been corrected. So much for the Cipher History Detectives TV show.
nickpelling on January 22, 2014 at 2:45 pm said:

Jim: I’m certain that “Decoded” could do much better than “Decoded”. 😉
Jim Melichar on January 23, 2014 at 2:39 am said:

Menno – The distribution and its flatness is something we agree on. It’s much too flat for a 196 character message written in English unless that message was carefully crafted to be unusual. You can prove this out by taking successive random passages from a variety of source texts, including his own Codes and Ciphers book, and calculating the probability of the top 6 letters in the distribution appearing at least 16x. Well, as much as you can prove anything out with statistics – I guess.

As for your note about Wikipedia, I’d be glad to update the entry if you have the specific editions the cipher appears in.

Obviously there was a point at which D’Agapeyeff stated he no longer knew how to solve it, and if I were merely guessing I’d assume that’s why it was eventually removed. But I don’t like to deal in guessing, assumptions or theories. I mostly enjoy testable hypotheses.

For me, the substitution plus columnar transposition of the original ciphertext digraphs has been disproved by Tiago Rodrigues. He elegantly (and successfully) showed how a 196 character ciphertext can be solved using maximum likelihood estimates against the bigraph distribution. If that solution existed via hill-climbing, he would have found it.

While there may be some other transposition in play on the original ciphertext, it most certainly cannot involve fractionation of the digraphs. For one, that would yield some impossible 28 column solution space, and secondly I cannot imagine a scenario by chance where the 9s from the ciphertext paired only with 1s until the last column or row of a 14×14 matrix.

It’s my opinion that the ciphertext is staged via steganography; that it’s actually a 98 character cipher and uses only the 196 [1-5] digits. Only my opinion and a hypothesis I’ve long been testing through many different permutations.

Jim
Menno Knul on January 23, 2014 at 8:52 am said:

Jim

I noticed that Nick possesses the same 1949 edition with the challenge cryptogram in it (revised and reset).

I missed your comment on the 04 matter. I think we agree on it, that the 0 cannot have a double function as a filler 0 and as a part of a digraph. The 04 appeared in both the 1939 and the revised and reset 1949 edition. What does this mean ? If it is a mistake for e.g. 84, has it escaped the attention of the revisor or has it been kept, because it is a partition between two sentences or two 14×14 matrixes ?

I puzzled some time on the question why the digraphs 61, 73 and 95 are lacking in the challenge cipher, adjacent to your remark on the 9s.

If the challenge cipher would be steganography, as is your opinion, isn’t that too difficult for a random reader of a random book on the history of codes and ciphers instead of a highly specialized book for cipher specialists ?

Last question: does anyone know about attempts to solve the challenge cryptogram in the years 1939-1949 ? Could it be that someone we don’t know of has solved the challenge cryptogram before, and would that be a good reason to remove the cipher in 1952 and later editions ?
nickpelling on January 23, 2014 at 9:41 am said:

Menno, Jim: au contraire, mes frères, I believe that “04” would make perfect sense as a digraph from the bottom row of a 5×5 grid. It’s just a little curious and coincidental that it just happens to appear right at the end of the 7th line…

For what it’s worth, my current suspicion is that whereas the “00” pairs are padding after the end of the 14×14, the “04” is actually an “X” character (i.e. the bottom line of the 5×5 key grid ends with “…XZ”, and with “Y” somewhere in the keyphrase at the top) appended to the end of the original 195-letter plaintext message to pad that up to the 14×14 (i.e. 196) size. That is, that the “04” marks the end of the actual plaintext message, but with the order transposed so that it ends up at the end of the 7th line.

I’d agree that it could well be that this challenge cipher has been solved before (and possibly multiple times): but I try to keep a watchful eye on things, and nothing obvious has come up.
Jim Melichar on January 23, 2014 at 12:31 pm said:

Nick – I concede ’04’ could be a null character. He used a similar example of a null in his combination substitution & transposition example (aka ADFGX) where he stuck a ‘D’ at the end from his ‘ABCDE’

Menno – How can we judge too difficult for an unsolved challenge cipher? He included examples that would be difficult to solve (eg the ADFGX cipher)

He suggested the use of nulls at a regular intervals in his book, so it’s plausible.

I understand fractionation is difficult, but he used it in his book.

Jim
Menno Knul on January 23, 2014 at 1:30 pm said:

Nick,

That leaves the question of what you do with the zeros of 92000, which is usually regarded as the end of the plaintext message (and not 04). It would be worthwhile to have a look at the cipher on p. 116 of d’Agapeyeff’s book Codes and Ciphers, where he writes: ‘Nulls; WA, WE, and W, to end message in groups of five letters.’ It is the same table where the mistake of the double CB has been made in the diagonal line (bottom left up), which I mentioned before.

By the way, this table could well function to solve the d’Agapeyeff cipher as he writes: ‘These figures, if greater secrecy is required, could again be enciphered and thus converted into letters by means of an agreed cipher. For that purpose it would be better to arrange for the second operation. Divide the figures into pairs (!) and then convert them into letters by means of the table given on p.116.’ Mind you, he said so in a chapter on Dictionary Code Systems, which method is my favourite for the challenge cryptogram. To encrypt the other way round may have been chosen, from letter combinations to numbers
nickpelling on January 23, 2014 at 1:56 pm said:

Menno: what I said was that I believe the zeroes of 92000 are padding added when converting the (probably 14×14) transposed ciphertext to blocks of 5 digits (i.e. 14x14x2 = 392, so append three zeroes to pad it up to a multiple of 5 digits). What I meant in my comment was that I also suspect the 04 is an enciphered ‘X’, appended to the end of the 195-character plaintext to pad it up to 14×14 = 196 characters long.

In short, I suspect that the ’04’ pair will be the last character in the transposed version of the plaintext. Hence – my wobbly reasoning goes – the transposition probably proceeds in columns going from left to right, with the characters within each column reordered according to a consistent pattern such that characters from row #7 supply the final character of each column.

However, the thing that’s holding me up is that my hill-climbing transposition solving code doesn’t seem able to solve for substitution ciphers at the same time (it fails to work for test data), so I don’t yet have the ability to form a good metric to drive the hill-climbing search.
Jim Melichar on January 23, 2014 at 5:59 pm said:

Nick – each time you change the columnar transposition key just use a MLE scoring function against the known distribution of the 676 digraphs for English.

When you do this you don’t have to try to solve the monoalphabetic substitution step until you’ve got either the correct transposition key or the reverse of it.

Maybe you are already doing this, but this is the work I mentioned by another gentleman on the old dagapeyeff.com site. It is archived on Google now.

I was able to verify this method works on plaintexts 196 characters long across a 14 column square.

Jim
nickpelling on January 23, 2014 at 6:03 pm said:

Jim: I’ll try something along those lines next, thanks. The digraph-based metric I was using proved unable to hill-climb a solution to my test transposition data, let alone to the real thing. 😐
Jim Melichar on January 25, 2014 at 12:27 am said:

Menno – I didn’t get a chance to respond to you the other day.

Bottom line is we’re all experimenting on the cryptogram in a non-standard way. Nick is playing with a polybius square format never used before and not in the book. You’re hypothesizing AD used a dictionary cipher where he only picked words that followed a predetermined pattern. I’m eliminating the pattern and using a standard Polybius square.

It’s good we’re all approaching it differently because it helps drive creativity, eliminate possible solution paths and eventually get to an answer.

Jim
nickpelling on January 25, 2014 at 1:45 am said:

Jim: to be precise, I’m trying to solve a 14×14 transposition cipher and a substitution cipher simultaneously.

As suggested, I’ve today built up a model for English digraphs (based on using digraph frequencies to drive a set of binomial coefficients ranging across 195 trials) and adapted my hill-climbing programme to use it. However, this fails completely on my test text (a block of text taken from elsewhere in d’Agapeyeff’s book), because the digraph frequencies that it exhibits are roughly double the ones predicted by the model.

i.e. TH (1.52% in English texts) occurs 6 times out of 195 possible pairs, but my binomial model (=BINOMDIST(6,195,0.0152,FALSE) in Excel) predicts that this should only happen 4.82% of the time. Similarly, HE (1.28%) occurs 5 times out of 195 possible pairs, but my binomial model (=BINOMDIST(5,195,0.0128,FALSE) in Excel) predicts that this should only happen 6.63% of the time. I don’t know whether I’ve chosen the text badly (or unluckily), or whether I’ve built the model badly… hence lots of head-scratching going on here. 🙁
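
For anyone without Excel to hand, the same binomial figures can be sketched in Python. This is just a stand-in for =BINOMDIST(k, n, p, FALSE), using the TH and HE frequencies and counts quoted above; `binom_pmf` is an illustrative name.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


TRIALS = 195  # adjacent letter pairs in a 196-letter text

# (digraph, assumed English frequency, times seen in the test text)
for digraph, freq, seen in [("TH", 0.0152, 6), ("HE", 0.0128, 5)]:
    expected = TRIALS * freq
    prob = binom_pmf(seen, TRIALS, freq)
    print(f"{digraph}: expected {expected:.2f}, seen {seen}, "
          f"P(exactly {seen}) = {100 * prob:.2f}%")
```

The expected counts come out at roughly 3 for TH and 2.5 for HE, which is the "roughly double" mismatch in numbers.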

PS: my 14×14 test text is
TOWARDSTHEENDO
FTHEFOURTEENTH
CENTURYEUROPEW
ASREORGANIZING
ITSELFTHENATIO
NSSTARTEDTOASS
UMESOMEOFTHEFO
RMSWEKNOWTODAY
THEREWEREWARSA
NDEXCURSIONSAN
DAGREATDEALOFD
IPLOMATICACTIV
ITYASCIVILIZAT
IONANDLEARNING
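
To turn this test text into transposition test data, something like the following Python sketch would do. It assumes a plain columnar transposition (write in by rows, read out whole columns in key order), which is only one of the transpositions under discussion, and the key shown is an arbitrary made-up example.

```python
ROWS = [
    "TOWARDSTHEENDO", "FTHEFOURTEENTH", "CENTURYEUROPEW", "ASREORGANIZING",
    "ITSELFTHENATIO", "NSSTARTEDTOASS", "UMESOMEOFTHEFO", "RMSWEKNOWTODAY",
    "THEREWEREWARSA", "NDEXCURSIONSAN", "DAGREATDEALOFD", "IPLOMATICACTIV",
    "ITYASCIVILIZAT", "IONANDLEARNING",
]


def encipher(rows: list[str], key: list[int]) -> str:
    """Read the grid out column by column, visiting columns in key order."""
    return "".join(row[col] for col in key for row in rows)


def decipher(text: str, key: list[int], height: int) -> list[str]:
    """Rebuild the rows from a columnar read-out made with the same key."""
    grid = [[""] * len(key) for _ in range(height)]
    for position, col in enumerate(key):
        column = text[position * height:(position + 1) * height]
        for r, letter in enumerate(column):
            grid[r][col] = letter
    return ["".join(row) for row in grid]


# Hypothetical key: any permutation of 0..13 will do for testing.
KEY = [3, 0, 7, 12, 1, 9, 5, 13, 2, 10, 6, 4, 11, 8]
scrambled = encipher(ROWS, KEY)
print(scrambled)
```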
Jim Melichar on January 25, 2014 at 2:00 am said:

Nick – I do believe ciphertext length has a direct impact on the ability to find the correct transposition.

Two follow ups….if you double the amount of text but hold the column width at 14 do you get a solution?

Secondly, if you make a rank ordered list of the best results does the correct transposition key exist in that smaller set? This is how I tackle fractionated ciphers and it may be of some help for this problem as well. You could also take a genetic approach to it by generating some seeds and then taking only the best scoring seeds to build a list of all the best transposition keys. From there you could send the best results to the monoalphabetic substitution routine.

Does that make some sense?

Jim
nickpelling on January 25, 2014 at 9:46 am said:

Jim: with the test text stats so far away from the percentages driving the model, the hill-climbing search immediately moves away from the solution… that’s the core problem. And we have to have a technique that stands a chance of converging for a 14×14 square, or it’s all for nothing.

I’m going to do some more tests, I suspect that the (widely repeated) digraph frequency stats I’m using could well be wrong or unreliable.
Jim Melichar on January 25, 2014 at 12:29 pm said:

Nick – just so I understand, in your algorithm, you are calculating the distribution of all the digraphs in your test cipher, then sorting those from most frequent to least frequent and comparing against the known distribution of all English digraphs, yes?

Jim
nickpelling on January 25, 2014 at 12:59 pm said:

Jim: rather than comparing, the final step is to multiply the binomial values together, to provide a degree of match between the model and the pairs as observed. However, because the frequency distribution of the digraphs as observed in the test text doesn’t match the values drawn from the digraph frequency distribution given in tables on the Internet, it ain’t converging towards the correct solution. 🙁
Menno Knul on January 25, 2014 at 1:41 pm said:

In his book Codes and Ciphers d’Agapeyeff says: ‘Speed of encoding and decoding is essential’. (p.115) Probably he has written the challenge cipher in ten minutes, but it has taken by now seventy years to solve the problem. He continues: ‘One of the ways in which ordinary dictionaries can be used is first to agree on a certain edition, say, for instance, the Concise Oxford Dictionary, current edition [3rd edition, 1934-1951], by Fowler and Le Mesurier.’

As you remember d’Agapeyeff deleted the challenge cipher in his editions from 1952 onwards. I think the introduction of the 4th edition (August 1951) has everything to do with the removal of the challenge cipher in later editions as from that moment the relation with his tool has been broken.

It has been said, that he removed the challenge cipher because of mistakes, but he could have corrected them. As you know I regard 04 as a mistake for 84 or (preferably) 94.
In his 2nd revised edition he did not correct it, but it could have escaped from his attention. Besides a single mistake would not bother too much as he himself warns his readers not to overlook mistakes in ciphers.

Now the problem arises, that (in my opinion) the cryptogram cannot be solved without the 1934 edition of the Concise Oxford Dictionary, though d’Agapeyeff gives a way out by using the Mansfield method, which he calls very simple and ingenious. (p.140) Unfortunately the Mansfield’s Progressive Dictionary is outdated as well, so that one should make one’s own progressive dictionary based on words with an initial aa, ab, ac, ad etc. He says: ‘This application of the law of probability to dictionary codes is very interesting.’

Finally, on the examples page (144) d’Agapeyeff gives the solution of the mono-alphabetic substitution (p.128), the poly-alphabetic substitution or Vigenere (p. 135) and the challenge cipher (p. 140). Even the order of the three ciphers seems to fit with the pages.

I have ordered the 1934 edition of the Concise Oxford Dictionary from Germany. In the meantime I am working on the Mansfield method.
Menno Knul on January 26, 2014 at 10:48 am said:

I should add yet something to my former reply.

In his book d’Agapeyeff says that it might be necessary to encrypt the dictionary code once more, if greater secrecy is required (p. 115). When I read the earlier pages which Nick wrote on the d’Agapeyeff cryptogram, I noticed his concern with the two trigraphs and many (11) bigraphs, which – as he says – usually would not appear in a normal English text. This concern of Nick’s made me wonder whether the d’Agapeyeff cipher might be a numerical code instead of an alphabetical code, as numbers easily get doubled (22, 44, 88) or tripled (222, 444, 888). A second thought was that the distribution of the 9’s – as Jim said – is rather peculiar: 91 is rather frequent (12x), 92, 93 and 94 are infrequent, and 95 is absent. In the case of a dictionary this could well point to the end of the dictionary, where the low-frequency letters XYZ are collected.
Menno Knul on January 28, 2014 at 6:23 pm said:

It might be useful to add a further note to my earlier replies.

If the task were only to look up the page and lemma in the Concise Oxford Dictionary, the d’Agapeyeff cipher would not be much of a challenge upon which his readers could exercise their skills. So I wonder if he didn’t want his readers to use the Mansfield method to find the right page and lemma. For that purpose he must have changed the pagination with an unknown code. d’Agapeyeff calls the Mansfield method ‘simple and ingenious’ (p. 140). This would certainly comply with the notion of a challenge cipher. As the Mansfield Progressive Dictionary is not available either, one should make one’s own progressive dictionary by listing and numbering the digraphs AA, AB, AC, AD, etc.

By the way, the use of the Mansfield method may be a further reason to remove the challenge cipher from d’Agapeyeff’s book in 1952, because the outcome of the Mansfield method to resolve the challenge cipher is similarly related to the 3rd edition of the Concise Oxford Dictionary (1934-1951) as well.
Tony on April 25, 2014 at 12:50 am said:

TOWARDS THE END OF THE FOURTEENTH
CENTURY EUROPE WAS REORGANIZING
ITSELF THE NATIONS STARTED TO ASSUME SOME OF THE FORMS WE KNOW TODAY THERE WERE WARS AND EXCURSIONS AND A GREAT DEAL OF DIPLOMATIC ACTIVITY AS CIVILIZATION AND LEARNING.

Just wanted to make sure you guys were not too smart for your own good not to see that it was legible anyway.
Robert thomas on May 2, 2014 at 5:50 am said:

If this is actually an encrypted math list if you take each number and divide by that number with a decimal you get for each number 100,000…. 63630 divide by .63630= 100,000 same if you do this for all numbers
Tiago Rodrigues on May 19, 2014 at 4:55 pm said:

Hi Nick,

As Jim has mentioned, I have played quite a bit with this cipher back in 2010 when I built a Permutation+substitution solver. I had no success with the d’Agapeyeff cipher back then and since in the meantime I’ve lost some of my notes, I recently rewrote the code and did some experiments. I’ll leave a few comments on the results so far, which you may find useful.

It is possible to solve a product cipher (substitution+transposition) by solving the transposition first using a goodness of fit analysis on the log digraphs (so as to abstract from the underlying substitution), and then apply the result to a substitution solver (in my case, a hill-climber scoring on log tetra graphs).
This being said, I can currently solve ciphers of about 240 characters upwards, with a 14 column permutation+substitution, depending on how it deviates from the normal english distribution. I have been unable to have success with 196 characters for sample texts because it breaks down the goodness of fit curve too much. Along this line, my hope would be to find an incremental scoring system that could complement the log digraphs for the transposition phase (such as score 2*digraphs + trigraphs). I have yet to try this.
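
The idea of scoring a transposition without knowing the substitution can be sketched roughly as below. This is a simplified stand-in, not the actual solver: it compares rank-ordered digraph frequencies by squared difference rather than a log-based goodness of fit, and the reference text is whatever English sample the caller supplies. `digraph_profile` and `profile_distance` are illustrative names.

```python
from collections import Counter


def digraph_profile(text: str) -> list[float]:
    """Relative digraph frequencies sorted from most to least common.

    Sorting throws away which digraph is which, so any monoalphabetic
    substitution of the text leaves the profile unchanged.
    """
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(counts.values())
    return sorted((n / total for n in counts.values()), reverse=True)


def profile_distance(candidate: str, reference: str) -> float:
    """Sum of squared differences between two rank-ordered profiles (lower is better)."""
    a = digraph_profile(candidate)
    b = digraph_profile(reference)
    size = max(len(a), len(b))
    a += [0.0] * (size - len(a))  # pad the shorter profile with zeros
    b += [0.0] * (size - len(b))
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Usage: score each trial un-transposition of the ciphertext against a
# reference English text, and keep the transposition keys that score lowest.
reference = "THEREWEREWARSANDEXCURSIONSANDAGREATDEALOFDIPLOMATICACTIVITY"
plain = "TOWARDSTHEENDOFTHEFOURTEENTHCENTURY"
disguised = plain.translate(str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                                          "QWERTYUIOPASDFGHJKLZXCVBNM"))
print(profile_distance(plain, reference), profile_distance(disguised, reference))
```

The two printed scores are identical, which is the whole point: the metric can drive the transposition search before the substitution is touched.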

Also, even though I like the 14×14 approach, it is interesting that if you use a 28×7 matrix instead, you are left with the 04 exactly at the center of the matrix. Flipping the matrix as you would to extract the letters by column you’d be left with a 7 column permutation to solve, which seems more feasible for a “challenge to the reader”.
As a final note, even though I am a firm believer in substitution+transposition, I must say it bothers me that, after removing the characters that only appear in the last column (or row, depending on whether you transposed it) and the 04, you are left with only 13 letters.
Menno Knul on May 20, 2014 at 2:05 am said:

Tiago,

It has been discussed before that 04 must be a mistake, because zero cannot have two functions at the same time. ‘Nulls are used to end a message in groups of five letters’ (d’Agapeyeff, Codes and Ciphers, p.116).
Tiago Rodrigues on May 20, 2014 at 1:49 pm said:

Menno,

I don’t believe it is a mistake. You need the 0-(1,2,3,4,5) to have a complete Polybius square. The idea that one element cannot behave differently is a fallacy: d’Agapeyeff himself makes extensive use of letters from the ciphers as nulls to fill out the 5-letter blocks.
If I had to guess, 04 is used either as a terminator for the plaintext portion, possibly the letter ‘X’ (leaving 92, 93, 94 as nulls to fill the remainder of a 14×14 matrix), or as an “anchor” to indicate a 28×7 matrix, since that way it sits exactly at the center.
Jim Melichar on June 3, 2014 at 3:18 am said:

As an update, because hey, why not, I took the liberty of feeding a couple triple dozen books into my handy dandy 196 character passage extractor this weekend.

This only after I worked with Tiago for the better part of an entire week to do some diagnosis on the ciphertext.

Anyway, my goal was to find a 196 character passage where the following criteria were met:

IC >= 0.069
>= 7 unused alphabet characters
>= 22 instances of the 11th and 12th most used characters in the passage.
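
As a rough illustration of how the three criteria above could be checked for a single candidate passage, here is a minimal Python sketch. It assumes a 26-letter alphabet and the usual unnormalised index of coincidence (the chance that two letters drawn from the passage without replacement are the same); the function names `passage_stats` and `meets_criteria` are made up for this example and are not taken from Jim's extractor.

```python
from collections import Counter
import string


def passage_stats(text):
    """Return (ic, unused, mid_rank_sum) for the letters A-Z found in text."""
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    n = len(letters)
    counts = Counter(letters)
    # Index of coincidence: probability that two letters drawn from the
    # passage without replacement are identical (assumed definition).
    ic = sum(k * (k - 1) for k in counts.values()) / (n * (n - 1)) if n > 1 else 0.0
    # Letters of the 26-letter alphabet that never occur in the passage.
    unused = 26 - len(counts)
    # Frequencies sorted from most to least used, padded with zeros to 26.
    ranked = sorted(counts.values(), reverse=True)
    ranked += [0] * (26 - len(ranked))
    # Combined count of the 11th and 12th most used letters.
    mid_rank_sum = ranked[10] + ranked[11]
    return ic, unused, mid_rank_sum


def meets_criteria(text, min_ic=0.069, min_unused=7, min_mid=22):
    """Check one passage against the three thresholds listed above."""
    ic, unused, mid = passage_stats(text)
    return ic >= min_ic and unused >= min_unused and mid >= min_mid
```

Sliding a 196-letter window over a source text and calling `meets_criteria` on each window would be one way to run this kind of search.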

In my attempts to be lenient in the source texts I took everything ranging from Huck Finn to the King James Bible to the Complete Works of Shakespeare to Gadsby to some foreign language texts (primarily French).

After many billions of trials on each source text, I could find no passages that met these criteria. In fact, I had to drop the requirement for the 11th and 12th letter sums down to ~16 and move the unused letter restriction down to ~6.

At the conclusion of this analysis, I’m left only to believe that either the plaintext is the most uniquely written passage of 196 characters ever written, or there is something else in play. I lean toward the latter (and yes, I’m biased).

Would love to get an unbiased interpretation.

Jim
N00B7421 on May 3, 2015 at 11:14 pm said:

I’m a N00B but I found something strange about the numbers.

Every number (if you count the digits in between its occurrences) is spaced in odd increments, save for the first and the last on some.

For example if you break up the numbers just looking at the 6’s:

75 2
628 28591 7
6291 3
6 481 3
64 91748 584 9
64 74748 28483 81 13
638 18174 7482 11
6 2 1
6475 83828 49175 74 15
658 37575 7593 11
6 3 1

Now I know almost nothing about cryptography so I bet there is a silly explanation, but this is true for all numbers 0-9. 0 has only two groups: the digits before the first zero (384) and the digits in between it and the next occurring zero (387).
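
One possible explanation, assuming the pair structure described in the post: if every pair is a digit from 6-0 followed by a digit from 1-5, then any given digit always falls on the same side of a pair, so two occurrences of it are always an even number of positions apart and an odd number of digits sit between them. The sketch below just counts those in-between digits for the first four groups; `gaps_between` is a name made up for this illustration.

```python
def gaps_between(digits, target):
    """Count the digits strictly between successive occurrences of target."""
    # Ignore spaces so the five-digit grouping does not affect positions.
    stream = [d for d in digits if d.isdigit()]
    positions = [i for i, d in enumerate(stream) if d == target]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


# First four groups of the ciphertext, as quoted at the top of the post.
sample = "75628 28591 62916 48164"
print(gaps_between(sample, "6"))  # [7, 3, 3]
```

The same reasoning would not apply to the run of zeros at the very end of the ciphertext, where the pair structure no longer holds.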

Also I posted a link to your website on mine.
Mine is http://secretsofharmony.org/secrets/the-dagapeyeff-cipher/
nickpelling on May 4, 2015 at 12:06 pm said:

N00B7421: the d’Agapeyeff challenge cipher almost certainly consists of several ‘layers’ of simple ciphers, all chained together. One of the layers is a paired cipher, whereby adjacent pairs of numerals refer to a 5×5 grid position (typically numbered 12345 along the X-axis, and 67890 along the Y-axis). Each paired grid position then enciphers one of 25 letters (e.g. “I” and “J” are often enciphered as the same letter).
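
As a small sketch of that pairing layer only, the Python below reads the digits two at a time and looks each pair up in a 5×5 grid. The letter order is just the placeholder A-Y assignment from the post, not the real key, and the name `decode_pairs` is made up for this example.

```python
ROWS = "67890"  # first digit of each pair (row labels)
COLS = "12345"  # second digit of each pair (column labels)
# Placeholder alphabet from the post (61=A ... 05=Y); the real key is unknown.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXY"


def decode_pairs(ciphertext):
    """Map each adjacent digit pair to its placeholder grid letter."""
    digits = [d for d in ciphertext if d.isdigit()]
    out = []
    for hi, lo in zip(digits[0::2], digits[1::2]):
        if hi in ROWS and lo in COLS:
            out.append(LETTERS[ROWS.index(hi) * 5 + COLS.index(lo)])
        else:
            out.append("-")  # pair that does not fit the grid, e.g. "00"
    return "".join(out)


print(decode_pairs("75628 28591"))  # JBLOP
```

This reproduces the start of the letter version of the ciphertext given in the post (J B L O P ...).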

Unfortunately, this is only one layer of several: subsequent layers appear to use transposition (reordering) ciphers, which is where things start to get a bit difficult and/or interesting (depending on how much you like code-breaking). 🙂
N00B7421 on May 7, 2015 at 12:18 am said:

nickpelling: Thank you for the explanation. I think I rather like code-breaking and I am now motivated to learn more! I doubt I can solve the d’Agapeyeff cipher, but it may be the catalyst for me learning about cryptography.

Cheers.
War leaf on July 24, 2016 at 7:33 am said:

Hey guys, I got some words out of it, but I highly doubt that they were right. Try translating to the Russian alphabet and stacking them. But in case I get famous, I won’t tell you how I got a sentence. It’s a long process without a team.
gavrosh on November 13, 2016 at 1:54 pm said:

So it would seem that this code is solved? 14th century blah blah…?
I found a funny pattern as I parsed the code:
1100111101
1101121200
1000120001
1002100001
1100010011
1110100010
1100001111
021011111
Bob Roberts on November 26, 2016 at 9:36 pm said:

What if it is a map of locations or a routing path (zip codes)?
Silvan Tomkins on January 12, 2018 at 5:50 pm said:

Has anyone come up with a way to organize the cipher (e.g. rows of 10 numbers or rows of 15 numbers) that may yield some information? Thanks
Tobias on January 27, 2018 at 6:47 pm said:

Is it a coincidence that if you draw a line plotting each column’s sum of the matrix, the line forms the letters WW, as in the war which indeed started in 1939? All the individual rows in the matrix also give this result.
Catherine Darensbourg on March 15, 2018 at 10:15 pm said:

What if the repeating numbers are just that — numbers in the Roman system — like MMM and CCC or XXX and III? This could be done to efficiently give military time, so we would not be looking for traditional words, whatever the final alphabet and 18 grid spots may hold …

(Ex: DCMMMH = 2400 H)
Juno on May 30, 2018 at 1:11 pm said:

Is not a cipher.

Is a math problem.
Catherine Mary Darensbourg on August 15, 2018 at 6:26 pm said:

Dear Nick, I think I noticed something important. The d’Agapeyeff challenge phrase — “Here is a cryptogram upon which the reader is invited to test his skill.” — is exactly 58 letters long (59 if the period is counted as well). When the cipher is divided into 2-digit numbers instead of 5-digit groups, and the 2-digit numbers are read in reverse, no number comes out greater than 58. Could this be the key phrase we have been looking for? If so, maybe the cipher also starts at 0 instead of that being a useless null?

—Cat
Everette on August 18, 2018 at 4:45 am said:

Dear Nick, it’s a simple code; you are all overthinking it, which is the point of it. By the way, the cid is 3940, which is the keyword to figuring it out and the answer to the zeros in it, and what it says is disappointing, like in the Christmas story. Basically, to get 39 40 you count the groups of numbers up to each zero: first it is 39, then 40 after. That is the cid, which is his keyword to this cipher, and without it you can’t find his code. So basically it is 2 codes in one, which makes one code because it takes both to find the code. I’d put what it says, but you would all think whatever until I gave an explanation and you deciphered it yourselves after I told you how.
jake on January 30, 2019 at 2:21 am said:

Food for thought… Top of page 113: “One should note that AAA (triple letters) signifies the end of the message.” Could d’Agapeyeff’s Cipher actually be three ciphers, each separated by triple letters?
Catherine Mary Darensbourg on November 1, 2019 at 11:29 am said:

Okay, I have been busy and dropped this for a while, but I think the d’Agapeyeff is two messages in Polybius grids overlaid on each other. One message uses the digits 1-5, the other the digits 6-0. Simply take the numbers in order, nothing fancy, pairing up the digits within each range from left to right. Of course, I could be wrong, but I tried it on the 1-5 set (which starts: 52-25-12-14-14-14-54-42-43…) and with brute forcing have so far “MAIL LLT YR I KEY GO DUTCH C…”. Obviously there is a lot to finish, with the whole second half (6-0) still to be done, if this is correct. The partial key I have for 1-5 is:

E * * U X
I G D R M
K * * * C
L * * Y T
* A W O H

Hope this isn’t a completely silly partial answer …