20 Dec 2012

“Small explosion in Unicode factory, nobody hurt”… :-)

Well, that was what I thought when I first saw the ciphertext in Part 2 Chapter 8 of “The Fates Unwind Infinity”, an online book detailing the anonymous author’s thoughts on a whole load of speculative-philosophy-style stuff (a great big hat-tip to Richard Greendale who passed a link to it my way, sorry it’s taken so long to post about it!). It’s the kind of obsessively intense book that seems to have been written in a three-day sleep-free trance: as far as I can tell, you might well need to enter a broadly similar state in order to make proper sense of it.

All the same, there’s something quaintly naive about the cipher alphabet used, in that it is made up of a load of rare characters all thrown together, like one of those abysmal Chinese emails where the proper character encoding has failed to arrive with the main text:-

˛¿?,.ç*ϕ , ¥ϕ’¡ˇ∆-.#-ϕπ¡q.µ•∆ˇ•.,*ϕ ,˚ç¿¡˛¡ç˚•ç¡çq˘.Ω’*ϕ¡Ô.çº-.¿˚¡∆ ,. ¥µç,.˛
¡*º•¡º•ˇ.˚’ϕ.ˇ¡π.ç∞¿µ¿.ç¡∆-.y¡,ˇç*•¡•˚.¿¡’•˛•¥ç¡çµ•¿,¿.ç.•Ωç¿ϕ¡’.ˇ˚.ºÔq.¿ π˘
¡-*˚∆.•*.•#-˛¿ˇ’ç.¥¡ˇ¿¡ π¿µ¡¿.ç∆’,Ω.,˚¡•˘.ˇ•∆.¿,˘¡˛¡ç.µ•¥.•*∞Ô¡º-∆¿˘ç˚ϕ-**.
¿¥¡ç.•.ç¡,.¿πˇ•.q’ˇ¡˘’,º-ˇ∆.•ºΩ.#¡•µ*˘.˚¥¿ç.-∆’¡˛ π˘.ç•¿*•,.ç.Ôqç,¿¡ç-
∆º.•˘’¿’•.ˇµ˛ç,ˇ˚ˇ∆¿¡*y,∞¿µ¿ç.˛Ωπ.-.,•¡’º•.ç∆ ¥ yç˚.ç¿,¿ç*Ô¡’•.πϕ-˘*•ç.•˛¡-.
¿,.µç•Ω¡¿.º-µˇ•.^ˇ˚∞¿ π*ç’∆•µ¡Ô˘ˇ.µ ¥*µ˚∆.¿¡Ω•.-ˇ¿•., πç˘º.-º*∆¡*˚•qÔ
¡’.˚’#¿˛*ˇ.˘ πçˇ•Ω¡¿¥ π.µˇ¿˘.∆º˚µç˚•.*’∆µ¡-˘¡¿-ç.,¿.¿*∆.,•¡¿ç π*.-.
¿˘,Ω’Ω.ç#ˇµ•’˚µÔ¡¿˛*¥¡’.¿∆ç π*.• ¥Ωq,˘∞.¿¿-*µ-*¥¡•µçµ¡•.,˛.¿µ∆˚,.º-
çµ πˇÔµ¡’µ.’•ϕ’ ¥-#y˚∆µº-¡ˇ¿µ•ç.µ¡˛.•˘π¡Ô,µ¡˚˚µ’µ,¿,∆-˛.Ωç•.¿¥µ.µçµ,-˘¿π
¡q•.Ô∆.˚ˇ-,µˇ’µç•’¿µ-˘¡ˇ¡µ˚.π∆.•∞-µç*Ô˛˘¿.Ω¡¥˚ˇ,•y.,∆•¡˛-¡*,¿π ,,µ.˛ç∆-
µ˚*˘•.’•.ˇ’-µ¿.ˇΩçyÔçˇ¡•˘¡˚*’,.∆º¿ç•ç π , ¥˘•¿.∞-¡˚˘*µ.Ω•.¿-˛qç,¡,•∆˛¿˘-
¿.•ˇ¥µçˇ,,’*ç’*˚ˇ¡∆ π ,*Ô¿’˛¿•.µy¡,ç•ºΩç-˚ ¥ ,ˇ•¿∆¿µ˛¡˘-¿•,¡*. π∆qç.º•,º*¿#
¡˛.,µ˚.¿çq˛çˇ,ç’¡µ•.¿’.,.qç˛¥∆çˇ˘*•,-∞•¿,˘¡qÔ¥.∆µ˛•.˚’¿,-∆^,•Ω
¿*çµπçº•q˘˚-˛¡ˇ-,,.µ#.’*ˇyº˛ϕ ,Ôˇç,˘¡∆q˘¥.¿’ç˛¿,•º*º•Ω¿µ˚,,çˇ˛.q¡¿∆π
¿∆µ.,ç¥¿-¡¿.˘.ç˛∆¡¿•¡¿çy-.¿º*.¿*ç∞¿∆’¡,µˇ•.’•,.ç¿-¿ˇº*ç*¡˚.’¥π.çºΩ¡*˘˛
¿∆.#.,•¡¿çµµ¿Ô∆•ç*∆-µº*qµ˚.-.¿Ω ,çµ˛¿çˇ¥’¡µˇ•¡ˇ•., π.’ç¥*∆¿µyç.˚’µ¡¿,-
çº*ºµ.•˛.˚Ô¿µq¡ˇ.¥•˘.*,.¿-˚,µπ¡*.¿•µ¿µç˛µ-˚ç¥∞µç.•’¡ˇ,.-.’∆.Ωq’Ô¿•˚¡¿π.#,.
¥.¿˛¡,ç-˘*yµ*.¿•∆.ˇ,¿˚çº-˘¡˛,ç¿¥-*ç¡µ.,Ω•.˚ πçq¿•ˇç¿Ô.ˇ¡’Ωˇµ.*∆.ç’˛,¡¿,.
¥-.*˚’•.* π¡ºΩº.yç-.˛µ.¿¡ˇ¥µ,µç•,.˘.,.˛¡¿∆º.qç*ç¿˚-.•*.¿,˘ºç,ç¡∞.Ω’.ˇ˘ˇ
¡¿Ô∆.#ç˚•µçΩµç˛¿µµ ¥ˇ*ç¿ π*.*∆.•¿µ,ç-‘ç˚-*•¡-Ωπ ,¿¥¡ç∆˚¿.Ôq*¿¿∞.•ˇ.˛
¿çˇ¡q.¿,µ•’¿-,.˚¡’-¡*º•˛ç.ˇ,,¿.µç∆Ωπ.y¥µç˚*¡˛¿,¥,’¥¡∆çµ-ç•*.^Ô˚¿q•Ω.˛,˘µç
¥ˇµ-ˇ∞*.’ π∆¡’*˚q˘•ˇ.q∆¥µ¡˘.*¥Ô¿µ˛∆ç,.?µ˛¿.¡

Note that some of the shapes are in bold, though I haven’t transcribed these any differently (which may well be a mistake). Yet in fact, for all its typographical showiness, it turns out that this ciphertext uses only 27 different characters. Leaving ‘q’ and ‘y’ intact and replacing the rest with A-H and J-Z (I don’t like using ‘I’, it’s too easy to confuse it with ‘1’ and ‘l’), you get the rather more usable:

ABCDEFGHDJHKLMNOEPOHQLqERSNMSEDGHDTFBLALFTSFLFqUEZKGHLVEFWOEBTLNDEJRFDEA
LGWSLWSMETKHEMLQEFXBRBEFLNOEyLDMFGSLSTEBLKSASJFLFRSBDBEFESZFBHLKEMTEWVqEBQU
LOGTNESGESPOABMKFEJLMBLQBRLBEFNKDZEDTLSUEMSNEBDULALFERSJESGXVLWONBUFTHOGGE
BJLFESEFLDEBQMSEqKMLUKDWOMNESWZEPLSRGUETJBFEONKLAQUEFSBGSDEFEVqFDBLFO
NWESUKBKSEMRAFDMTMNBLGyDXBRBFEAZQEOEDSLKWSEFNJyFTEFBDBFGVLKSEQHOUGSFESALOE
BDERFSZLBEWORMSEYMTXBQGFKNSRLVUMERJGRTNEBLZSEOMBSEDQFUWEOWGNLGTSqV
LKETKPBAGMEUQFMSZLBJQERMBUENWTRFTSEGKNRLOULBOFEDBEBGNEDSLBFQGEOE
BUDZKZEFPMRSKTRVLBAGJLKEBNFQGESJZqDUXEBBOGROGJLSRFRLSEDAEBRNTDEWO
FRQMVRLKREKSHKJOPyTNRWOLMBRSFERLAESUQLVDRLTTRKRDBDNOAEZFSEBJRERFRDOUBQ
LqSEVNETMODRMKRFSKBROULMLRTEQNESXORFGVAUBEZLJTMDSyEDNSLAOLGDBQDDREAFNO
RTGUSEKSEMKORBEMZFyVFMLSULTGKDENWBFSFQDJUSBEXOLTUGREZSEBOAqFDLDSNABUO
BESMJRFMDDKGFKGTMLNQDGVBKABSERyLDFSWZFOTJDMSBNBRALUOBSDLGEQNqFEWSDWGBP
LAEDRTEBFqAFMDFKLRSEBKEDEqFAJNFMUGSDOXSBDULqVJENRASETKBDONYDSZ
BGFRQFWSqUTOALMODDERPEKGMyWAHDVMFDULNqUJEBKFABDSWGWSZBRTDDFMAEqLBNQ
BNREDFJBOLBEUEFANLBSLBFyOEBWGEBGFXBNKLDRMSEKSDEFBOBMWGFGLTEKJQEFWZLGUA
BNEPEDSLBFRRBVNSFGNORWGqRTEOEBZDFRABFMJKLRMSLMSEDQEKFJGNBRyFETKRLBDO
FWGWRESAETVBRqLMEJSUEGDEBOTDRQLGEBSRBRFAROTFJXRFESKLMDEOEKNEZqKVBSTLBQEPDE
JEBALDFOUGyRGEBSNEMDBTFWOULADFBJOGFLREDZSETQFqBSMFBVEMLKZMREGNEFKADLBDE
JOEGTKSEGQLWZWEyFOEAREBLMJRDRFSDEUEDEALBNWEqFGFBTOESGEBDUWFDFLXEZKEMUM
LBVNEPFTSRFZRFABRRJMGFBQGEGNESBRDFOKFTOGSLOZQDBJLFNTBEVqGBBXESMEA
BFMLqEBDRSKBODETLKOLGWSAFEMDDBERFNZQEyJRFTGLABDJDKJLNFROFSGEYVTBqSZEADURF
JMROMXGEKQNLKGTqUSMEqNJRLUEGJVBRANFDECRABEL
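
For anyone wanting to reproduce the relabelling, here is a minimal Python sketch. It assumes letters are handed out in order of first appearance (which matches the start of the block above), that ‘q’ and ‘y’ pass through untouched, and that stray spaces from copy-and-paste are dropped. The function name `relabel` and its parameters are purely illustrative.

```python
import string

def relabel(ciphertext, keep="qy\n", skip=" "):
    """Relabel symbols with A-H, J-Z (no 'I') in order of first appearance."""
    pool = iter(c for c in string.ascii_uppercase if c != "I")
    mapping = {}
    out = []
    for ch in ciphertext:
        if ch in skip:      # drop copy/paste spaces
            continue
        if ch in keep:      # 'q', 'y' and line breaks pass through unchanged
            out.append(ch)
            continue
        if ch not in mapping:
            # next() raises StopIteration if more than 25 symbols turn up
            mapping[ch] = next(pool)
        out.append(mapping[ch])
    return "".join(out), mapping
```

Calling `relabel(text)` returns both the relabelled text and the symbol-to-letter table, so the table can be reused on other transcriptions.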

However, despite the promising-looking frequency counts, I haven’t yet had any luck cracking this as a monoalphabetic substitution cipher. The Friedman index of coincidence for this is 1.52 (slightly too low for English), which makes me strongly suspect that some punctuation is being enciphered here. Yet the commonest letter (‘E’) seems too frequent to be a plaintext ‘e’, and it doesn’t seem to encipher a space either.
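
For reference, the index of coincidence can be computed as below. This is a sketch only: the 1.52 figure reads like a normalized value (raw IoC multiplied by an alphabet size), but the multiplier is not stated here, so the default of 26 in `normalized_ioc` is an assumption.

```python
from collections import Counter

def index_of_coincidence(text):
    """Raw IoC: chance that two symbols drawn without replacement match."""
    counts = Counter(text)
    n = sum(counts.values())
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def normalized_ioc(text, alphabet_size=26):
    """Raw IoC scaled by an assumed alphabet size (26 is a guess here)."""
    return alphabet_size * index_of_coincidence(text)
```

Strip line breaks before calling either function, otherwise the newline counts as a symbol.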

As far as n-grams go, repeated 2-grams include:-
* EB (28 instances)
* SE (23)
* DE (19)
* ES (17)
* RF LB EF (16)
* ED (15)
* FE BR (14)
* BE NE BD (13)
* etc

Repeated 3-grams include:-
* GEB OEB MSE (5 instances)
* ETK EOE EBD SED KSE NES FES (4)
* LQE LBF DEJ OUL RMS RTE RFS QGE BEF BDU LBE GNE DBE EDS EBL DUL DEF DSL EKS EQN SDE GWS FTS ESG ERF JRF LKE SME SLB SEB ZSE SGE (3)
* etc

Sadly CrypTool-online only goes up to length-3 n-grams, as I suspect there may be some interesting 4-grams and 5-grams in there. An exercise for the reader! 😉
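
Longer n-grams are easy enough to count by hand-rolled code. A minimal Python sketch, assuming the relabelled ciphertext has had its line breaks removed first (the name `repeated_ngrams` is illustrative):

```python
from collections import Counter

def repeated_ngrams(text, n, min_count=2):
    """Return (ngram, count) pairs seen at least min_count times, commonest first."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    hits = [(g, c) for g, c in counts.items() if c >= min_count]
    return sorted(hits, key=lambda gc: (-gc[1], gc[0]))
```

For example, `repeated_ngrams(text, 4)` and `repeated_ngrams(text, 5)` cover the lengths that CrypTool-online leaves out.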

I don’t know what to make of this: when I first transcribed it, I really thought the answer would just pop out, but I haven’t had any luck with it so far… so I think I’m missing a trick. Any thoughts, codebreakers? 🙂

Posted in: Historical Ciphers

41 thoughts on ““Small explosion in Unicode factory, nobody hurt”… :-)”

Dave on December 20, 2012 at 4:00 pm said:

Here are some additional details from CryptoScope:

IoC: 0.0548 English: 0.0667

Repeated n-grams, n > 3:

SLBF 3
EDSL 3
KSEM 2
DEAL 2
LGWS 2
XBRB 2
ESGE 2
EBDU 2
BDUL 2
BJLF 2
SDEF 2
GEBS 2
LALF 2
SZLB 2
RMSE 2
SEDQ 2
DSLB 2
MSED 2
FQGE 2
EOEB 2
SEKS 2
ABFM 2
EGNE 2

EDSLB 2
DSLBF 2

EDSLBF 2

Fragments that repeat (“?” denotes wildcard character): NOR?G, ULA?F, FMD?K, ERF?Z, DSL?F, SNE?D, OEB?D, EDS?B, FES?L, FES?F, EBL?S, EFB?B, SDE?E, SED?E, BQG??N, OUL??F, NEP??S, EVq??B, RMS??S, KSE??O, UEG??B, DBE??N, ERS??S, EDS??F, GFB??E, EBD??F, EOE??D, LBF??E, FER??E, EFE??F, RF?D?U, LK?M?E, QL?E?S, SE?K?R, SE?K?D, EB?N?D, EO?B?D, BN?E?F, DD?E?F, EB?R?R, ED?L?F, EB?F?B, DJ?K??N, JR?R??D, NR?O??B, RB?F??O, UL?G??E, DS?B??R, DD?E??N, JL?E??F, RE?F??O, FG?L??E, SL?F??E, ER?N??E, EB?R??F, ED?E??N, DE?R??E, LR?E??E, LB?E??E, MS??B??L, OU??L??E, BD??F??L, EF??D??D, RE??E??A, FE??F??L, SE??E??O, FB??E??E, J?N?R?F, B?Q?R?B, E?Q?L?G, G?E?J?F, B?F?N?E, G?E?S?B, R?L?E?S, F?E?R?B, E?S?E?D, L?E?S?E, B?L?E?E, Q?R?B??N, S?E?V??q, Q?G?B??B, U?B?S??R, E?Q?L??T, B?U?F??L, B?D?R??F, F?D?U??E, S?E?D??F, G?E?J??E, O?B?L??E, L?E?S??S, B?L?E??F, B?R??F??O, Q?L??T??S, O?T??S??S, R?B??J??R, F?S??T??L, B?S??R??D, E?V??q??E, E?D??F??L, D?F??L??E, D?E??N??E, E?S??F??B, F?S??B??E, G?B??E??E, L?E??F??E, E?E??F??L, E?F??B??E, V??q??E??U, O??D??B??R, G??B??D??L, F??E??G??G, E??B??D??U, E??R??F??O, E??N??S??F, N??S??F??E, E??T??F??F, S??F??E??G, E??E??O??M, G??E??E??O, O??G??E??E, U??L??E??E, J??E??F??E, L??E??E??O

Letter contacts: http://zodiackillerciphers.com/images/fates-letter-contacts.png

Decoding this string will yield 90% of the plaintext: QNESXORFGVAUBEZLJTMDSy
Steve on December 21, 2012 at 12:33 pm said:

Not sure if this helps, but I count 28 symbols, in order of frequency they are

1 . 179 46
2 ¿ 123 191
3 ç 114 231
4 ¡ 106 161
5 • 100 8226
6 , 94 44
7 µ 90 181
8 * 71 42
9 - 64 45
10 ˇ 63 711
11 ’ 57 8217
12 ∆ 56 8710
13 ˚ 53 730
14 ˛ 47 731
15 ˘ 43 728
16 ¥ 40 165
17 (space) 38 32
18 º 35 186
19 π 34 960
20 Ω 29 937
21 q 26 113
22 Ô 24 212
23 ∞ 13 8734
24 y 13 121
25 # 11 35
26 ϕ 11 981
27 ^ 3 94
28 ? 2 63

Those numbers on the right are the unicode code points. If we take the whole thing to code points instead of the visual hash of the bizarre symbols, the whole thing looks like this (sorry, big dump of numbers coming up)

731 191 63 44 46 231 42 981 32 44 32 165 981 8217 161 711 8710 45 46 35 45 981 960 161 113 46 181 8226 8710 711 8226 46 44 42 981 32 44 730 231 191 161 731 161 231 730 8226 231 161 231 113 728 46 937 8217 42 981 161 212 46 231 186 45 46 191 730 161 8710 32 44 46 32 165 181 231 44 46 731 161 42 186 8226 161 186 8226 711 46 730 8217 981 46 711 161 960 46 231 8734 191 181 191 46 231 161 8710 45 46 121 161 44 711 231 42 8226 161 8226 730 46 191 161 8217 8226 731 8226 165 231 161 231 181 8226 191 44 191 46 231 46 8226 937 231 191 981 161 8217 46 711 730 46 186 212 113 46 191 32 960 728 161 45 42 730 8710 46 8226 42 46 8226 35 45 731 191 711 8217 231 46 165 161 711 191 161 32 960 191 181 161 191 46 231 8710 8217 44 937 46 44 730 161 8226 728 46 711 8226 8710 46 191 44 728 161 731 161 231 46 181 8226 165 46 8226 42 8734 212 161 186 45 8710 191 728 231 730 981 45 42 42 46 191 165 161 231 46 8226 46 231 161 44 46 191 960 711 8226 46 113 8217 711 161 728 8217 44 186 45 711 8710 46 8226 186 937 46 35 161 8226 181 42 728 46 730 165 191 231 46 45 8710 8217 161 731 32 960 728 46 231 8226 191 42 8226 44 46 231 46 212 113 231 44 191 161 231 45 8710 186 46 8226 728 8217 191 8217 8226 46 711 181 731 231 44 711 730 711 8710 191 161 42 121 44 8734 191 181 191 231 46 731 937 960 46 45 46 44 8226 161 8217 186 8226 46 231 8710 32 165 32 121 231 730 46 231 191 44 191 231 42 212 161 8217 8226 46 960 981 45 728 42 8226 231 46 8226 731 161 45 46 191 44 46 181 231 8226 937 161 191 46 186 45 181 711 8226 46 94 711 730 8734 191 32 960 42 231 8217 8710 8226 181 161 212 728 711 46 181 32 165 42 181 730 8710 46 191 161 937 8226 46 45 711 191 8226 46 44 32 960 231 728 186 46 45 186 42 8710 161 42 730 8226 113 212 161 8217 46 730 8217 35 191 731 42 711 46 728 32 960 231 711 8226 937 161 191 165 32 960 46 181 711 191 728 46 8710 186 730 181 231 730 8226 46 42 8217 8710 181 161 45 728 161 191 45 231 46 44 191 46 191 42 8710 46 44 8226 161 191 231 32 960 42 46 45 46 191 728 44 937 8217 937 46 231 35 
711 181 8226 8217 730 181 212 161 191 731 42 165 161 8217 46 191 8710 231 32 960 42 46 8226 32 165 937 113 44 728 8734 46 191 191 45 42 181 45 42 165 161 8226 181 231 181 161 8226 46 44 731 46 191 181 8710 730 44 46 186 45 231 181 32 960 711 212 181 161 8217 181 46 8217 8226 981 8217 32 165 45 35 121 730 8710 181 186 45 161 711 191 181 8226 231 46 181 161 731 46 8226 728 960 161 212 44 181 161 730 730 181 8217 181 44 191 44 8710 45 731 46 937 231 8226 46 191 165 181 46 181 231 181 44 45 728 191 960 161 113 8226 46 212 8710 46 730 711 45 44 181 711 8217 181 231 8226 8217 191 181 45 728 161 711 161 181 730 46 960 8710 46 8226 8734 45 181 231 42 212 731 728 191 46 937 161 165 730 711 44 8226 121 46 44 8710 8226 161 731 45 161 42 44 191 960 32 44 44 181 46 731 231 8710 45 181 730 42 728 8226 46 8217 8226 46 711 8217 45 181 191 46 711 937 231 121 212 231 711 161 8226 728 161 730 42 8217 44 46 8710 186 191 231 8226 231 32 960 32 44 32 165 728 8226 191 46 8734 45 161 730 728 42 181 46 937 8226 46 191 45 731 113 231 44 161 44 8226 8710 731 191 728 45 191 46 8226 711 165 181 231 711 44 44 8217 42 231 8217 42 730 711 161 8710 32 960 32 44 42 212 191 8217 731 191 8226 46 181 121 161 44 231 8226 186 937 231 45 730 32 165 32 44 711 8226 191 8710 191 181 731 161 728 45 191 8226 44 161 42 46 32 960 8710 113 231 46 186 8226 44 186 42 191 35 161 731 46 44 181 730 46 191 231 113 731 231 711 44 231 8217 161 181 8226 46 191 8217 46 44 46 113 231 731 165 8710 231 711 728 42 8226 44 45 8734 8226 191 44 728 161 113 212 165 46 8710 181 731 8226 46 730 8217 191 44 45 8710 94 44 8226 937 191 42 231 181 960 231 186 8226 113 728 730 45 731 161 711 45 44 44 46 181 35 46 8217 42 711 121 186 731 981 32 44 212 711 231 44 728 161 8710 113 728 165 46 191 8217 231 731 191 44 8226 186 42 186 8226 937 191 181 730 44 44 231 711 731 46 113 161 191 8710 960 191 8710 181 46 44 231 165 191 45 161 191 46 728 46 231 731 8710 161 191 8226 161 191 231 121 45 46 191 186 42 46 191 42 231 8734 191 8710 8217 161 
44 181 711 8226 46 8217 8226 44 46 231 191 45 191 711 186 42 231 42 161 730 46 8217 165 960 46 231 186 937 161 42 728 731 191 8710 46 35 46 44 8226 161 191 231 181 181 191 212 8710 8226 231 42 8710 45 181 186 42 113 181 730 46 45 46 191 937 32 44 231 181 731 191 231 711 165 8217 161 181 711 8226 161 711 8226 46 44 32 960 46 8217 231 165 42 8710 191 181 121 231 46 730 8217 181 161 191 44 45 231 186 42 186 181 46 8226 731 46 730 212 191 181 113 161 711 46 165 8226 728 46 42 44 46 191 45 730 44 181 960 161 42 46 191 8226 181 191 181 231 731 181 45 730 231 165 8734 181 231 46 8226 8217 161 711 44 46 45 46 8217 8710 46 937 113 8217 212 191 8226 730 161 191 960 46 35 44 46 165 46 191 731 161 44 231 45 728 42 121 181 42 46 191 8226 8710 46 711 44 191 730 231 186 45 728 161 731 44 231 191 165 45 42 231 161 181 46 44 937 8226 46 730 32 960 231 113 191 8226 711 231 191 212 46 711 161 8217 937 711 181 46 42 8710 46 231 8217 731 44 161 191 44 46 165 45 46 42 730 8217 8226 46 42 32 960 161 186 937 186 46 121 231 45 46 731 181 46 191 161 711 165 181 44 181 231 8226 44 46 728 46 44 46 731 161 191 8710 186 46 113 231 42 231 191 730 45 46 8226 42 46 191 44 728 186 231 44 231 161 8734 46 937 8217 46 711 728 711 161 191 212 8710 46 35 231 730 8226 181 231 937 181 231 731 191 181 181 32 165 711 42 231 191 32 960 42 46 42 8710 46 8226 191 181 44 231 45 8217 231 730 45 42 8226 161 45 937 960 32 44 191 165 161 231 8710 730 191 46 212 113 42 191 191 8734 46 8226 711 46 731 191 231 711 161 113 46 191 44 181 8226 8217 191 45 44 46 730 161 8217 45 161 42 186 8226 731 231 46 711 44 44 191 46 181 231 8710 937 960 46 121 165 181 231 730 42 161 731 191 44 165 44 8217 165 161 8710 231 181 45 231 8226 42 46 94 212 730 191 113 8226 937 46 731 44 728 181 231 165 711 181 45 711 8734 42 46 8217 32 960 8710 161 8217 42 730 113 728 8226 711 46 113 8710 165 181 161 728 46 42 165 212 191 181 731 8710 231 44 46 63 181 731 191 46 161
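
A table like the one above (rank, symbol, count, code point) can be rebuilt with a few lines of Python. This is a sketch that assumes the pasted ciphertext is held in a string and that line breaks should not be counted; `symbol_table` is an illustrative name.

```python
from collections import Counter

def symbol_table(text):
    """Return (rank, symbol, count, code point) rows, most frequent first."""
    counts = Counter(ch for ch in text if ch != "\n")
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], ord(kv[0])))
    return [(rank, ch, n, ord(ch)) for rank, (ch, n) in enumerate(ranked, 1)]
```

The code-point dump is then just `[ord(ch) for ch in text if ch != "\n"]`.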

I can see what you mean, as it would make some of the ‘words’ quite long, but I quite like ’46’, your ‘E’, as a space.
nickpelling on December 21, 2012 at 12:44 pm said:

Steve: If ’46’ = ‘E’ is genuinely a plaintext space, then we still need another punctuation character to break up those long words: perhaps a hyphen or slash? Also, it’s still a relatively short piece of text so there’s no guarantee it’ll contain a full alphabet (i.e. including Q / J / Z etc).

Conversely, I’m wondering whether a few of the characters might simply be nulls, particularly the ’46’/’E’ character. Having said that, I did try stripping out all the ‘E’ characters and cryptanalysing what remained, but that also didn’t seem to get anywhere.

Alternatively, it may simply be more pragmatic to search for portentous-sounding semi-philosophical cribs, such as “THEANSWERTOTHEULTIMATEQUESTIONIS”… 🙂
nickpelling on December 21, 2012 at 12:58 pm said:

Dave: thanks for that! I did notice that many of the asymmetries in the letter contacts are to do with the letter B:-
(E,B) = 28, (B,E) = 13
(L,B) = 16, (B,L) = 7
(B,Q) = 7, (Q,B) = 2
(B,R) = 14, (R,B) = 5
(A,B) = 11, (B,A) = 3
I wonder whether this is telling us that ‘B’ enciphers a space, so what we’re looking at is word-initial and word-final distributions? Just a thought!
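
A rough way to list such asymmetries automatically is sketched below in Python. It assumes the relabelled ciphertext with line breaks removed; the threshold `min_gap` is an arbitrary choice, not something taken from the contact table.

```python
from collections import Counter

def contact_asymmetries(text, min_gap=5):
    """List (a, b, count_ab, count_ba) where 'ab' outnumbers 'ba' by min_gap or more."""
    pairs = Counter(zip(text, text[1:]))
    rows = []
    for (a, b), fwd in pairs.items():
        rev = pairs.get((b, a), 0)
        if a != b and fwd - rev >= min_gap:
            rows.append((a, b, fwd, rev))
    # Largest imbalance first
    return sorted(rows, key=lambda r: r[3] - r[2])
```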
Nedwan on December 21, 2012 at 1:40 pm said:

Hi,

Copying the text from the original page to any text editor (on my computer) produces this as a ‘list’ of ‘words’ of various character length??

I’ve no idea if this is a formatting thing (not apparent on the screen due to content protection) or if it makes the text less confusing to view.

Any ideas?

——————————————————-
˛¿
?
,.ç*
ϕ
,
¥
ϕ
‘¡ˇ
∆
-.#-
ϕπ
¡q.
µ
•
∆
ˇ•.,*
ϕ
,˚ç¿¡˛¡ç˚•ç¡çq˘.
Ω
‘*
ϕ
¡Ô.çº-.¿˚¡
∆
,.
¥
µç,.˛¡*º•¡º•ˇ.˚’
ϕ
.ˇ¡
π
.ç
∞
¿µ¿.ç¡
∆
-.y¡,ˇç*•¡•˚.¿¡’•˛•¥ç¡çµ•¿,¿.ç.•
Ω
ç¿
ϕ
¡’.ˇ˚.ºÔq.¿
π
˘¡-*˚
∆
.•*.•#-˛¿ˇ’ç.¥¡ˇ¿¡
π
¿
µ¡¿.ç
∆
‘,
Ω
.,˚¡•˘.ˇ•
∆
.¿,˘¡˛¡ç.µ•¥.•*
∞
Ô¡º-
∆
¿˘ç˚
ϕ
-**.¿¥¡ç.•.ç¡,.¿
π
ˇ•.q’ˇ¡˘’,º-ˇ
∆
.•º
Ω
.#¡•µ*˘.˚¥¿ç.-
∆
‘¡˛
π
˘.ç•¿*•,.ç.Ôqç,¿¡ç-
∆
º.•˘’¿’•.ˇµ˛ç,ˇ˚ˇ
∆
¿¡*y,
∞
¿µ¿ç.˛
Ω
π
.-.,•¡’º•.ç
∆
¥
yç˚.ç¿,¿ç*Ô¡’•.
π
ϕ
-˘*•ç.•˛¡-.¿,.µç•
Ω
¡¿.º-µˇ•.^ˇ˚
∞
¿
π
*ç’
∆
•µ¡
Ô
˘ˇ.µ
¥
*µ˚
∆
.¿¡
Ω
•.-ˇ¿•.,
π
ç˘º.-º*
∆
¡*˚•qÔ¡’.˚’#¿˛*ˇ.˘
π
çˇ•
Ω
¡¿¥
π
.µˇ¿˘.
∆
º˚µç˚•.*’
∆
µ¡-˘¡¿-ç.,¿.¿*
∆
.,•¡¿ç
π
*.-.¿˘,
Ω
‘
Ω
.ç#ˇµ•’˚µÔ¡¿˛*¥¡’.¿
∆
ç
π
*.•
¥
Ω
q,˘
∞
.¿¿-*µ-*¥¡•µçµ¡•.,˛.¿µ
∆
˚,.º-çµ
π
ˇÔµ¡’µ.’•
ϕ
‘
¥
-#y˚
∆
µº-¡ˇ¿µ•ç.µ¡˛.•˘
π
¡Ô,µ¡˚˚µ’µ,¿,
∆
-˛.
Ω
ç•.¿¥µ.µçµ,-˘¿
π
¡q•.Ô
∆
.˚ˇ-,µˇ’µç•’¿µ-˘¡ˇ¡µ˚.
π
∆
.•
∞
-µç*
Ô
˛˘¿.
Ω
¡¥˚ˇ,•y.,
∆
•¡˛-¡*,¿
π
,,µ.˛ç
∆
-µ˚*˘•.’•.ˇ’-µ¿.ˇ
Ω
çyÔçˇ¡•˘¡˚*’,.
∆
º¿ç•ç
π
,
¥
˘•¿.
∞
-¡˚˘*µ.
Ω
•.¿-˛qç,¡,•
∆
˛¿˘-
¿
.•ˇ¥µçˇ,,’*ç’*˚ˇ¡
∆
π
,*Ô¿’˛¿•.µy¡,ç•º
Ω
ç-˚
¥
,ˇ•¿
∆
¿µ˛¡˘-¿•,¡*.
π∆
qç.º•,º*¿#¡˛.,µ˚.¿çq˛çˇ,ç’¡µ•.¿’.,.qç˛¥
∆
çˇ˘*•,-
∞
•¿,˘¡qÔ¥.
∆
µ˛•.˚’¿,-
∆
^,•
Ω
¿*çµ
π
çº•q˘˚-˛¡ˇ-,,.µ#.’*ˇyº˛
ϕ
,Ôˇç,˘¡
∆
q˘¥.¿’ç˛¿,•º*º•
Ω
¿µ˚,,çˇ˛.q¡¿
∆
π
¿
∆
µ.,ç¥¿-¡¿.˘.ç˛
∆
¡¿•¡¿çy-.¿º*.¿*ç
∞
¿
∆
‘¡,µˇ•.’•,.ç¿-¿ˇº*ç*¡˚.’¥
π
.çº
Ω
¡*˘˛¿
∆
.#.,•¡¿çµµ¿Ô
∆
•ç*
∆
-µº*qµ˚.-.¿
Ω
,çµ˛¿çˇ¥’¡µˇ•¡ˇ•.,
π
.’ç¥*
∆
¿µyç.˚’µ¡¿,-çº*ºµ.•˛.˚Ô¿µq¡ˇ.¥•˘.*,.¿-˚,µ
π
¡*.¿•µ¿µç˛µ-˚ç¥
∞
µç.•’
¡
ˇ,.-.’
∆
.
Ω
q’Ô¿•˚¡¿
π
.#,.¥.¿˛¡,ç-˘*yµ*.¿•
∆
.ˇ,¿˚çº-˘¡˛,ç¿¥-*ç¡µ.,
Ω
•.˚
π
çq¿•ˇç¿
Ô
.ˇ¡’
Ω
ˇµ.*
∆
.ç’˛,¡¿,.¥-.*˚’•.*
π
¡º
Ω
º.yç-.˛µ.¿¡ˇ¥µ,µç•,.˘.,.˛¡¿
∆
º.qç*ç¿˚-.
•
*.¿,˘ºç,ç¡
∞
.
Ω
‘.ˇ˘ˇ¡¿Ô
∆
.#ç˚•µç
Ω
µç˛¿µµ
¥
ˇ*ç¿
π
*.*
∆
.•¿µ,ç-‘ç˚-*•¡-
Ω
π
,¿¥¡ç
∆
˚
¿
.Ôq*¿¿
∞
.•ˇ.˛¿çˇ¡q.¿,µ•’¿-,.˚¡’-¡*º•˛ç.ˇ,,¿.µç
∆Ωπ
.y¥µç˚*¡˛¿,¥,’¥¡
∆
çµ-ç•*.
^
Ô˚¿q•
Ω
.˛,˘µç¥ˇµ-ˇ
∞
*.’
π∆
¡’*˚q˘•ˇ.q
∆
¥µ¡˘
.
*¥
Ô
¿µ˛
∆
ç,.
?
µ˛¿.¡
Dave on December 21, 2012 at 5:23 pm said:

I did some analysis of the boldfaced symbols. Check this out:

https://www.evernote.com/shard/s1/sh/82665f6c-9f4b-4761-ba50-67c080977d11/705cc72ba98344947dc5e376bc75cdd8

I’m not sure what the boldfacing means. What do you make of it?

Next, let’s look at word separators.
Let x be the number of word separators.
Assume some symbol serves as word separator. Thus, there are x+1 words.
Assume average word length of 5 characters, and total cipher text length of 1517.

Then, (1517 – x) / 5 = x + 1
This gives us an expectation estimate of 252 word separators. The most frequent symbol, a period, occurs 193 times in the cipher text, so this could very well be a word separator.
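
The arithmetic behind that estimate: rearranging (N – x) / w = x + 1 gives x = (N – w) / (w + 1). A small Python sketch (the function name is illustrative):

```python
def expected_separators(total_length, avg_word_length):
    """Solve (N - x) / w = x + 1 for x, the number of word separators."""
    return (total_length - avg_word_length) / (avg_word_length + 1)

# N = 1517 symbols, average word length 5
print(expected_separators(1517, 5))  # 252.0
```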

Now, here’s what the cipher text looks like with periods interpreted as word separators:

https://www.evernote.com/shard/s1/sh/636cf1fb-0a73-4080-901b-90bb17a377dd/8143b21de4bf89cb6f78ebc4ba0c8969. Looks interesting, but the list of repeating “words” consists only of one- and two-character sequences. You’d expect a few longer common words in there, like “THE”.
nickpelling on December 21, 2012 at 5:28 pm said:

Nedwan: I had a similar thing when I cut and pasted the text, it seemed like some kind of PDF-Unicode-clipboard-character bug, so I just ignored it. Which may have been a mistake but… that’s what I did. I did notice that certain characters (such as the delta/triangle) often came out on a line of their own, not sure what’s going on, sorry!
nickpelling on December 21, 2012 at 5:37 pm said:

Dave: my guess is that the boldfacing may indicate the start of a sentence, or perhaps may be just a way of enciphering capitalization. The peaky frequency distribution is something I wasn’t expecting, though. 🙂
Dave on December 21, 2012 at 5:39 pm said:

Also, here’s a visualization of where the repeating 5-symbol sequences occur:

https://www.evernote.com/shard/s1/sh/87b03f9f-a164-4884-82b7-92bf3d6d5ad6/47edfc1202ed5bd06b893edf17277b0a

It’s interesting to me that they seem to be entirely made with punctuation.
Dave on December 21, 2012 at 8:36 pm said:

Oops, it was actually a repeating 6-symbol sequence. Here is the updated visualization:

https://www.evernote.com/shard/s1/sh/87b03f9f-a164-4884-82b7-92bf3d6d5ad6/47edfc1202ed5bd06b893edf17277b0a
Robert Dallison on December 22, 2012 at 8:57 pm said:

@nickpelling I used MS-Word’s “paste special” with the “unformatted unicode text” option. Two or three re-reads seem to confirm that the characters come across cleanly.

@Steve, I don’t understand where your ASCII 32 characters come from (index 17 in your list). I think you may have a copy/paste artefact or an auto-spacing feature that is introducing spaces, e.g. if you are using MS-Word. I agree with you on all the other characters (27 of them, no spaces) and frequency counts.

@Dave I get a total character count of 1,501 (not including CR/LF characters), which factorises to 19 x 79. This makes me wonder if we are looking at a long-keyword Vigenere. However, the IC is then higher than we would expect from Vigenere encryption.

Based on the rest of the document, I think a good avenue for exploration would be that the plaintext includes numerals and/or algebraic symbols, as seen frequently throughout the treatise. This would explain the 27 character count and the low IC (low for substitution, that is).

Bear in mind that substitution may have been combined with transposition (once again it is the 19 x 79 factorisation that makes me introduce this).
Robert Dallison on December 22, 2012 at 9:08 pm said:

Here is my formatted CT. I have replaced characters sequentially as found in the original text, using A-H, J-Z and 1-2. Following @nickpelling I have not used I (capital i). However, I have replaced CT ‘q’ by ‘R’ and CT ‘y’ by ‘1’. Linebreaks are preserved but not the bold characters.

ABCDEFGHDJHKLMNOEPOHQLRESTNMTEDGHDUFBLALFUTFLFRVEWKGHLXEFYOEBULNDEJSFDEA
LGYTLYTMEUKHEMLQEFZBSBEFLNOE1LDMFGTLTUEBLKTATJFLFSTBDBEFETWFBHLKEMUEYXREBQV
LOGUNETGETPOABMKFEJLMBLQBSLBEFNKDWEDULTVEMTNEBDVLALFESTJETGZXLYONBVFUHOGGE
BJLFETEFLDEBQMTERKMLVKDYOMNETYWEPLTSGVEUJBFEONKLAQVEFTBGTDEFEXRFDBLFO
NYETVKBKTEMSAFDMUMNBLG1DZBSBFEAWQEOEDTLKYTEFNJ1FUEFBDBFGXLKTEQHOVGTFETALOE
BDESFTWLBEYOSMTE2MUZBQGFKNTSLXVMESJGSUNEBLWTEOMBTEDQFVYEOYGNLGUTRX
LKEUKPBAGMEVQFMTWLBJQESMBVENYUSFUTEGKNSLOVLBOFEDBEBGNEDTLBFQGEOE
BVDWKWEFPMSTKUSXLBAGJLKEBNFQGETJWRDVZEBBOGSOGJLTSFSLTEDAEBSNUDEYO
FSQMXSLKSEKTHKJOP1UNSYOLMBSTFESLAETVQLXDSLUUSKSDBDNOAEWFTEBJSESFSDOVBQ
LRTEXNEUMODSMKSFTKBSOVLMLSUEQNETZOSFGXAVBEWLJUMDT1EDNTLAOLGDBQDDSEAFNO
SUGVTEKTEMKOSBEMWF1XFMLTVLUGKDENYBFTFQDJVTBEZOLUVGSEWTEBOARFDLDTNABVO
BETMJSFMDDKGFKGUMLNQDGXBKABTES1LDFTYWFOUJDMTBNBSALVOBTDLGEQNRFEYTDYGBP
LAEDSUEBFRAFMDFKLSTEBKEDERFAJNFMVGTDOZTBDVLRXJENSATEUKBDON2DTW
BGFSQFYTRVUOALMODDESPEKGM1YAHDXMFDVLNRVJEBKFABDTYGYTWBSUDDFMAERLBNQ
BNSEDFJBOLBEVEFANLBTLBF1OEBYGEBGFZBNKLDSMTEKTDEFBOBMYGFGLUEKJQEFYWLGVA
BNEPEDTLBFSSBXNTFGNOSYGRSUEOEBWDFSABFMJKLSMTLMTEDQEKFJGNBS1FEUKSLBDO
FYGYSETAEUXBSRLMEJTVEGDEBOUDSQLGEBTSBSFASOUFJZSFETKLMDEOEKNEWRKXBTULBQEPDE
JEBALDFOVG1SGEBTNEMDBUFYOVLADFBJOGFLSEDWTEUQFRBTMFBXEMLKWMSEGNEFKADLBDE
JOEGUKTEGQLYWYE1FOEASEBLMJSDSFTDEVEDEALBNYERFGFBUOETGEBDVYFDFLZEWKEMVM
LBXNEPFUTSFWSFABSSJMGFBQGEGNETBSDFOKFUOGTLOWQDBJLFNUBEXRGBBZETMEA
BFMLREBDSTKBODEULKOLGYTAFEMDDBESFNWQE1JSFUGLABDJDKJLNFSOFTGE2XUBRTWEADVSF
JMSOMZGEKQNLKGURVTMERNJSLVEGJXBSANFDECSABEL
Dave on December 22, 2012 at 11:16 pm said:

I tried to post a lengthy comment and it got swallowed by the server.

Here’s a visualization of some interesting patterns in the cipher text:

http://zodiackillerciphers.com/images/fates-unwind-original-patterns.png

The repeated 6-gram is marked in grey. Blue indicates several palindromes. And 5-grams whose reversals occur on the same line are marked in red.
Dave on December 22, 2012 at 11:22 pm said:

I ran some experiments using zkdecrypto-lite:

https://www.evernote.com/shard/s1/sh/34074a0d-b1d5-437e-924d-ff9fa1b12015/25fdb81e5aee1036d5a0b46c1d45d5b2
Dave on December 23, 2012 at 1:10 pm said:

Another quick experiment:

https://www.evernote.com/shard/s1/sh/aff62112-cc5b-4f2e-993b-9e779d6ff424/0be39106245d115ef054edf541b4d782

The idea was to remove each symbol then test ioc, chi squared, and entropy.

Results:
* Removing the period (the most common symbol) resulted in the largest difference between ioc and English.
* 2nd largest difference occurs when leaving all symbols intact.
* The top 3 symbols that, when removed, resulted in the closest ioc to English were: μ, comma, and •. Perhaps one of them represents full stop.
* Out of curiosity, I also tested all possible pairs of symbols, removing them and computing ioc differences. Removing comma and μ together results in the smallest difference from English ioc.
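
The IoC part of this experiment can be sketched in Python as follows. This is not the code behind the results above, just one plausible reading of the procedure; it covers IoC only (not chi squared or entropy) and uses the English figure of 0.0667 quoted earlier in the thread.

```python
from collections import Counter

ENGLISH_IOC = 0.0667  # value quoted earlier in the thread

def ioc(text):
    """Raw index of coincidence of a string."""
    counts = Counter(text)
    n = sum(counts.values())
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def removal_ranking(text, target=ENGLISH_IOC):
    """For each symbol, delete it and measure how far the IoC is from target.

    Returns (distance, symbol) pairs, closest to target first.
    """
    results = []
    for sym in sorted(set(text)):
        stripped = text.replace(sym, "")
        results.append((abs(ioc(stripped) - target), sym))
    return sorted(results)
```

Testing pairs of symbols works the same way, with two `replace` calls inside a loop over `itertools.combinations(set(text), 2)`.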
nickpelling on December 23, 2012 at 3:31 pm said:

Dave: good test – but how do these IoC stats compare with those for English without spaces, English without punctuation, and English without spaces or punctuation? This might prove revealing… 🙂
Dave on December 23, 2012 at 7:57 pm said:

OK, here’s what IoC looks like for “Tale of Two Cities”:

https://www.evernote.com/shard/s1/sh/07599507-e1e0-46e8-8ac6-f371ef37db4c/e96e36fb9716ba53ca8d605f12967710

The Fates Unwind code’s IoC is shown there too, as well as what its IoC looks like with each symbol removed.

Seems to line up with the “one symbol is used as punctuation” hypothesis.

The various Dickensian input files are here: http://zodiackillerciphers.com/fates/tale/.
Dave on December 24, 2012 at 4:08 pm said:

Here is another experiment: I took the first 1501 characters of Tale of Two Cities and removed spaces, and all punctuation except for full stop. Then I counted its unique n-grams and compared them to the n-grams in the Fates code.

The result is that the Fates code appears to have far more unique n-grams (thus, fewer repeats) than would be expected from English plaintext.

Tale of two cities has:
* 227 repeating among 1158 unique 4-grams,
* 159 repeating among 1279 unique 5-grams,
* 134 repeating among 1326 unique 6-grams.

Fates Unwind code has:
* 23 repeating among 1473 unique 4-grams,
* 2 repeating among 1495 unique 5-grams,
* 1 repeating among 1495 unique 6-grams

The drastic drop-off in repeating higher-order n-grams is possibly more evidence that this isn’t simple substitution.
nickpelling on December 24, 2012 at 5:59 pm said:

Dave: I think we’d both worked that out from the letter contact table. And yet it doesn’t obviously look anything like a Vigenere or Alberti polyalpha.

This is why I hate trying to break home-grown ciphers: they’ve rolled out an idea that’s smart enough to confound your efforts, yet one that’s probably not hugely strong. So here’s my prediction from what I’ve seen so far: it’s some kind of trick cipher, where (say) vowels and consonants shift around independently, leaving some kind of residual structure intact while hollowing out the stats.
Dave on December 24, 2012 at 10:13 pm said:

Well, a simple rail fence has a similar hollowing out effect on the ngram counts of another Tale of Two Cities sample:

rails 1, 87 repeats, 1307 nonrepeats (equivalent to unchanged plaintext)
rails 2, 19 repeats, 1459 nonrepeats.
rails 3, 1 repeats, 1495 nonrepeats.
rails 4, 9 repeats, 1479 nonrepeats.
rails 5, 0 repeats, 1497 nonrepeats.
rails 6, 2 repeats, 1493 nonrepeats.
rails 7, 1 repeats, 1495 nonrepeats.
rails 8, 2 repeats, 1493 nonrepeats.
rails 9, 1 repeats, 1495 nonrepeats.

Note the sudden drop from 87 repeated 5-grams.

So maybe some simple traditional combination of transposition + substitution is happening here.
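
A sketch of this kind of test in Python, under two assumptions: that "rail fence" means the usual zigzag write-out, and that "repeats" means distinct 5-grams occurring more than once (which is at least consistent with the totals above, e.g. 0 + 1497 for a 1501-character sample). The function names are illustrative.

```python
from collections import Counter

def rail_fence_encrypt(text, rails):
    """Write text in a zigzag over `rails` rows, then read the rows in order."""
    if rails < 2:
        return text  # one rail leaves the text unchanged
    rows = [[] for _ in range(rails)]
    row, step = 0, 1
    for ch in text:
        rows[row].append(ch)
        if row == 0:
            step = 1
        elif row == rails - 1:
            step = -1
        row += step
    return "".join("".join(r) for r in rows)

def repeat_counts(text, n=5):
    """Return (repeated, non-repeated) counts of distinct n-grams."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    repeats = sum(1 for c in counts.values() if c > 1)
    return repeats, len(counts) - repeats
```

Looping `repeat_counts(rail_fence_encrypt(sample, r))` over `r` in 1 to 9 gives a table in the same shape as the one above.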
Moshe Rubin on December 26, 2012 at 7:19 am said:

A fruitful approach to cryptanalyzing unknown ciphers is to search for isomorphs. Two strings are isomorphic to each other if they can be transformed from one to the other using a monoalphabetic substitution. Not all cipher systems generate significant isomorphs, but when a system does, statistically significant isomorphs indicate that the underlying plaintexts are the same. An isomorph-producing cipher system will often “fall apart at the seams” if enough significant isomorphs can be found and utilized.

A partial list of cipher systems that produce significant isomorphs includes polyalphabetics that use periodic or quasi-periodic keying sequences (e.g. autokey, progressive), the Wheatstone Cryptograph, rotor machines (e.g. plugboard-less Enigma, Hebern), the Kryha, and the Japanese RED machine.

A good introduction to the use of isomorphs in cryptanalysis is William F. Friedman’s Military Cryptanalysis (Part III) (section III paragraph 12 and onwards) which can be downloaded from NSA’s web site:

http://www.nsa.gov/public_info/_files/military_cryptanalysis/mil_crypt_iii.pdf

Another good text is LANAKI’s Classical Cryptography Course, Lecture 13: Aperiodic Systems. Here’s a quote from the lecture:

Isomorphism is not restricted to cases where secondary alphabets are derived from a primary component sliding against the normal. It is useful in all cases of interrelated alphabets no matter what the basis of their derivation may be. It is second only to the importance of the “Probable Word” method which has nearly universal applicability.

The “Fates Unwind” message has two 16-character isomorphic sequences at offsets 552 and 1113. In Pelling’s ciphertext (with ‘q’ changed to ‘1’ and ‘y’ changed to ‘2’) the isomorphs are:

552: RNTDEWOFRQMVRLKR 1113: ETVBR1LMEJSUEGDE

In Dallison’s ciphertext the isomorphs are:

552: SNUDEYOFSQMXSLKS 1113: EUXBSRLMEJTVEGDE

Notice the four (R,E) pairs (or in Dallison’s ciphertext, the (S,E) pairs). Although I have not calculated the probability of such an occurrence in a 1,501 character message, I believe the isomorphs are statistically significant.

In either case the two sequences are related to each other via a monoalphabetic substitution. In Pelling’s ciphertext, for example, the monoalphabetic substitution is:

R->E / N->T / T->V / D->B / E->R / W->1 / O->L / F->M / Q->J / M->S / V->U / L->G / K->D

In a more succinct, mathematical form, we can write the substitution as:

(RE) (NT) (TV) (DB) (ER) (W1) (OL) (FM) (QJ) (MS) (VU) (LG) (KD)

Chaining these transformations together gives us:

(RE) (NTVU) (KDB) (W1) (OLG) (FMS) (QJ)

Can someone make use of these isomorphs?
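
The isomorph check and the chaining step can be reproduced with a short Python sketch. The helper names (`pattern`, `substitution`, `chains`) are illustrative; the two strings are the ones quoted above from Pelling’s ciphertext.

```python
def pattern(s):
    """Number each symbol by order of first appearance, e.g. 'ABBA' -> (0, 1, 1, 0)."""
    seen = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in s)

def substitution(a, b):
    """Symbol-by-symbol mapping taking isomorph a to isomorph b."""
    assert pattern(a) == pattern(b), "strings are not isomorphic"
    return dict(zip(a, b))

def chains(mapping):
    """Join pairs into chains or cycles, e.g. N->T, T->V, V->U gives 'NTVU'."""
    starts = [a for a in mapping if a not in mapping.values()]
    out, used = [], set()
    for a in starts + list(mapping):
        if a in used:
            continue
        chain = [a]
        used.add(a)
        while chain[-1] in mapping and mapping[chain[-1]] not in used:
            chain.append(mapping[chain[-1]])
            used.add(chain[-1])
        out.append("".join(chain))
    return out

first, second = "RNTDEWOFRQMVRLKR", "ETVBR1LMEJSUEGDE"
print(chains(substitution(first, second)))
```

Two strings are isomorphic exactly when `pattern` returns the same tuple for both, which is what makes a search over all windows of the ciphertext cheap.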
Dave on December 27, 2012 at 8:39 pm said:

Moshe, thanks for that very useful and interesting suggestion. I went ahead and did a quick computation to pull out every isomorph from the cipher text. Results are here:

http://zodiackillerciphers.com/fates-isomorphs.html

Maybe these will be useful for concentrated attacks to test for the various cryptographic systems you’ve mentioned.

I plan to run this quick experiment: Generate 1501-letter strings at random, using a similar frequency distribution, and count how many times 16-letter isomorphs are created simply by chance.
Dave on December 28, 2012 at 11:35 am said:

I ran the random trials and 1.74% of 10,000 randomly generated strings had isomorphic strings of length 16 or longer, with number of repeated symbols greater than or equal to 4. Results showing the “hits”:

http://zodiackillerciphers.com/fates-isomorph-random-trials.html

The alphabet used consists of “a” through “z”, and the extra symbol “{“. I drew the symbols randomly using a distribution that resembles the Fates Unwind Infinity code.

So, the 16-letter isomorph might just be a chance occurrence.
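
For anyone who wants to repeat the experiment, here is a hedged Python sketch of such a random trial. It is not the program behind the results above: the repeat criterion (`min_extra`, the window length minus the number of distinct symbols in it) is my own convention, overlapping windows are allowed to match, and symbols are drawn using the frequencies of whatever reference string is supplied.

```python
import random
from collections import Counter

def pattern(s):
    """Number each symbol by order of first appearance."""
    seen = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in s)

def has_isomorph(text, length=16, min_extra=3):
    """True if two windows of `length` share a pattern with enough repeats."""
    seen = set()
    for i in range(len(text) - length + 1):
        p = pattern(text[i:i + length])
        if length - (max(p) + 1) < min_extra:
            continue  # too few repeated symbols to be interesting
        if p in seen:
            return True
        seen.add(p)
    return False

def hit_rate(reference, trials=200, length=16, seed=0):
    """Fraction of random strings (same length and frequencies) with an isomorph."""
    rng = random.Random(seed)
    symbols, weights = zip(*Counter(reference).items())
    hits = 0
    for _ in range(trials):
        sample = "".join(rng.choices(symbols, weights=weights, k=len(reference)))
        hits += has_isomorph(sample, length)
    return hits / trials
```

With a 1,501-symbol reference and thousands of trials this takes a while in pure Python, so start with a small `trials` value.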
Moshe Rubin on December 28, 2012 at 1:20 pm said:

Dave: Kudos on putting together an isomorph searcher program in so short a time! I believe your results are perfectly accurate (the only comment is that “13, [SNUDEYOFSQMXS 552 S] [EUXBSRLMEJTVE 1113 E]” is a sub-set of “16, [SNUDEYOFSQMXSLKS 552 S] [EUXBSRLMEJTVEGDE 1113 E]”). Your generating 10,000 1,501-character strings and searching for isomorphs is certainly the way to test the theory.

I have two probability/statistical related questions I hope you or others can answer:

(1) If isomorphic pairs occurred randomly, as you found, in only 1.74% of the cases, does this mean that finding a 16-character isomorph in the Fates-Unwind message is therefore significant? Doesn’t a ratio below 50% indicate more of a probability of its being significant than not? What would the percentage need to be to convince us that ours is significant? 1%? 0.5%? 0.001%?

(2) Your test found that 16-character isomorphs occurred in 1.74% of the test cases. The isomorphs you looked for, however, are different from the specific ordered one we found (i.e., “16, [SNUDEYOFSQMXSLKS 552 S] [EUXBSRLMEJTVEGDE 1113 E]”). I realize that the probability of the string “ABBC” is the same as “AABC”, but shouldn’t your test be looking for the specific isomorphic pattern [0,1,2,3,4,5,6,7,0,8,9,10,0,11,12,0] in the randomly generated texts?

My probabilistic calculations show that given a 27-character alphabet and a 16-character sequence containing only a 4-character repeat where order is not important there are 227,324,281,389,004,800,000 matching isomorphs.

The number of isomorphs that can match [0,1,2,3,4,5,6,7,0,8,9,10,0,11,12,0] specifically is 124,903,451,312,640,000 (i.e. 27! / 14!). So the ratio of ordered isomorphs to non-ordered ones is 1:1820.
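The two counts and the 1:1820 ratio can be checked with a few lines of Python. This is only an arithmetic check of the figures quoted above; the variable names are mine, and `math.perm` and `math.comb` need Python 3.8 or later.

```python
import math

ALPHABET_SIZE = 27  # L
LENGTH = 16         # n
DISTINCT = 13       # distinct symbols in [0,1,2,3,4,5,6,7,0,8,9,10,0,11,12,0]
REPEAT = 4          # the single repeated symbol occurs four times

# Ways to assign 13 different letters, in order, to the pattern: 27! / 14!
specific = math.perm(ALPHABET_SIZE, DISTINCT)

# Ways to choose which 4 of the 16 positions hold the repeated symbol
placements = math.comb(LENGTH, REPEAT)

# Strings with one 4-fold repeat anywhere and 12 singletons
any_order = specific * placements

print(specific)    # 124903451312640000
print(placements)  # 1820
print(any_order)   # 227324281389004800000
```

So the factor of 1820 is simply the number of ways to place the four repeats among sixteen positions.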

I’m most interested to hear your take on solving problems like these.
Robert Dallison on December 28, 2012 at 8:09 pm said:

Mouthwatering stuff from Moshe and Dave! Looking forward to getting my teeth into this after the holidays. My gut feeling at this point is that we are looking at a substitution + transposition but happy to be proved wrong. Regarding Moshe’s questions about significance, would it help to encrypt a sample of the same writer’s plaintext using a variety of encryption systems (eg all those mentioned in the thread here) and a variety of encryption keys, then run isomorph frequency analysis on the resulting cipher text? I don’t have my kit with me but I’m sure this shouldn’t be hard for people here to automate… Maybe a candidate will pop out as a result.
Dave on December 29, 2012 at 3:11 am said:

I ran a simple rail fence transposition test on the cipher text, looking for configurations that yield increased counts of repeated n-grams.

The unmodified cipher text has 23 repeated 4-grams, 2 repeated 5-grams, 1 repeated 6-gram, and no repeated 7-grams. For comparison, this excerpt of Tale of Two Cities has 160 repeated 4-grams, 87 repeated 5-grams, 46 repeated 6-grams, and 28 repeated 7-grams.

Here are the results:

Rail fence 4-grams
Rail fence 5-grams
Rail fence 6-grams
Rail fence 7-grams

Some configurations produce greatly improved repeated n-gram counts, but they seem not yet as good as regular English plaintext (at least, as good as the Dickensian English sample). Still, this seems like a promising sign that a simple transposition scheme is at work.
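A rough sketch of this kind of test is below. It assumes that "repeated n-grams" means distinct n-grams occurring at least twice, and that the cipher text is treated as the output of a plain zigzag rail fence with no offset. Both are guesses about the setup rather than a description of the actual program, and the function names are hypothetical.

```python
from collections import Counter


def repeated_ngrams(text, n):
    """Return the distinct n-grams that occur at least twice, with their counts."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {gram: c for gram, c in counts.items() if c > 1}


def repeated_ngram_profile(text, sizes=(4, 5, 6, 7)):
    """Number of distinct repeated n-grams for each n in sizes."""
    return {n: len(repeated_ngrams(text, n)) for n in sizes}


def rail_fence_decrypt(cipher, rails):
    """Undo a basic zigzag rail fence (no offset) with the given number of rails."""
    if rails < 2:
        return cipher
    cycle = 2 * (rails - 1)
    # Rail index of each plaintext position along the zigzag
    rows = [min(i % cycle, cycle - i % cycle) for i in range(len(cipher))]
    # Plaintext positions in the order the cipher reads them (rail by rail);
    # sorted() is stable, so positions within a rail stay left to right.
    order = sorted(range(len(cipher)), key=lambda i: rows[i])
    plain = [""] * len(cipher)
    for ch, i in zip(cipher, order):
        plain[i] = ch
    return "".join(plain)


def scan_rail_counts(cipher, max_rails=40, n=4):
    """Repeated n-gram count for each candidate number of rails."""
    return {r: len(repeated_ngrams(rail_fence_decrypt(cipher, r), n))
            for r in range(2, max_rails + 1)}
```

Configurations whose count rises well above that of the unmodified cipher text would be the ones worth a closer look.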
Dave on December 29, 2012 at 3:38 am said:

I also tried all possible columnar transpositions for keyword lengths 2 through 11 (I stopped the experiment during 12 because it was taking forever).

Results:

Sorted by repeated 4-gram counts
Sorted by repeated 5-gram counts
Sorted by repeated 6-gram counts

Same story: Modest improvements to repeated n-gram counts. Doesn’t feel definitive. But it’s a possible sign that some kind of transposition will greatly improve the n-gram statistics.
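The reason length 12 takes forever is the factorial growth in the number of column orders. Assuming "all possible columnar transpositions" means trying every one of the k! column orders for a keyword of length k, the counts look like this:

```python
import math

# Number of distinct column orders for each keyword length
for k in range(2, 13):
    print(f"{k:2d} columns: {math.factorial(k):>11,} orderings")

# Everything tried for lengths 2 through 11 combined
total_2_to_11 = sum(math.factorial(k) for k in range(2, 12))
print(total_2_to_11)       # 43954712
print(math.factorial(12))  # 479001600
```

Length 12 alone needs about eleven times as many trials as lengths 2 through 11 put together.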
Dave on December 29, 2012 at 4:18 am said:

Moshe,

(1) My personal belief is that I can’t make confident conclusions about the significance of patterns unless the odds are astronomically low that they were formed by chance. A value as high as 1.74% introduces too much doubt.

Here’s an example: Take a sample of text from the comments here, and arrange it in a 40×40 grid, like a word search. Then look for coincidental words that appear diagonally, vertically, and backwards. Once you find one, calculate the odds that a word of that length can be found in arbitrary text. If the odds were only 1.74%, it would not be good evidence of significance. In fact, you may find an extremely rare event indeed, and it’d still be insignificant because it arose by accident from the construction of the grid.

Some of my favorite examples of patterns that look very interesting but turn out to be completely random are the “Jazzerman pairs” and “repeated structure patterns” mentioned in this article about the unsolved Zodiac ciphers.

Nevertheless, there may be other qualities of the Fates cipher text that end up validating some scheme implied by the presence of isomorphs.

(2) The problem with looking for a specific isomorphic pattern is that it assumes we wouldn’t be interested in the other patterns. For instance, if we had found a different arrangement of the 16-symbol pattern in the Fates cipher text, we would have considered it to be as significant as the other one. So, it seems only fair to me to look for ALL patterns that would be as interesting as the one we found.

I like Dallison’s suggestion of experimenting with applying the various mentioned cipher systems to see what happens.
Dave on December 29, 2012 at 1:15 pm said:

Here are some more statistics and tests, generated by the useful CryptoCrack tool.
Moshe Rubin on December 30, 2012 at 6:26 am said:

My approach to evaluating a pair of n-character isomorphs with a given pattern is to compute the expected number of such pairs in the given ciphertext. My isomorph search and subsequent calculations resulted in the following “best” isomorphs:

“Pruned” Isomorphs
+---+--------+--------+-------------------+
|LEN| BEGIN  |EXPECTED| ISOMORPHS         |
+---+--------+--------+-------------------+
| 16|553 1114| 1.6930 | RNTDEWOFRQMVRLKR  |
|   |        |        | ETVBR1LMEJSUEGDE  |
+---+--------+--------+-------------------+
“Unpruned” Isomorphs
+---+--------+--------+-------------------+
|LEN| BEGIN  |EXPECTED| ISOMORPHS         |
+---+--------+--------+-------------------+
| 13|972 1311| 8.2342 | DFJBOLBEUEFAN     |
|   |        |        | FLXEZKEMUMLBV     |
+---+--------+--------+-------------------+
| 17|552 1113| 0.8755 | BRNTDEWOFRQMVRLKR |
|   |        |        | AETVBR1LMEJSUEGDE |
+---+--------+--------+-------------------+
| 13|171  849| 8.2342 | QBRLBEFNKDZED     |
|   |        |        | AFMDFKLRSEBKE     |
+---+--------+--------+-------------------+
| 17| 21  712| 0.8755 | QL1ERSNMSEDGHDTFB |
|   |        |        | RBEMZF2VFMLSULTGK |
+---+--------+--------+-------------------+
| 16| 23  714| 1.6930 | 1ERSNMSEDGHDTFBL  |
|   |        |        | EMZF2VFMLSULTGKD  |
+---+--------+--------+-------------------+
| 16| 24  715| 1.6930 | ERSNMSEDGHDTFBLA  |
|   |        |        | MZF2VFMLSULTGKDE  |
+---+--------+--------+-------------------+
Determining where an isomorph begins and ends can be difficult if the characters to the right / left are singletons. This is because, in those cases, there is no supporting evidence to suppose that the singletons can or should be added off the ends. I have therefore differentiated between “pruned” and “unpruned” isomorphs. A “pruned” isomorph is one where the left- and right-most characters of the isomorph are not singletons, but rather occur again somewhere in the isomorph. An “unpruned” isomorph, on the other hand, is a maximal-length one which has singletons off the left and/or right sides for as far as the isomorph relationship exists.
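The pruning rule described here can be sketched in a few lines. The function name and return convention are hypothetical, chosen only to illustrate the definition; this is not the search program itself.

```python
from collections import Counter


def prune(isomorph):
    """Trim singleton symbols off both ends.

    Returns (offset, pruned), where offset is how many characters
    were removed from the left.
    """
    counts = Counter(isomorph)
    start, end = 0, len(isomorph)
    # Drop leading characters that occur only once in the whole string
    while start < end and counts[isomorph[start]] == 1:
        start += 1
    # Drop trailing characters that occur only once
    while end > start and counts[isomorph[end - 1]] == 1:
        end -= 1
    return start, isomorph[start:end]


print(prune("BRNTDEWOFRQMVRLKR"))  # (1, 'RNTDEWOFRQMVRLKR')
print(prune("AETVBR1LMEJSUEGDE"))  # (1, 'ETVBR1LMEJSUEGDE')
```

Applied to the 17-character unpruned pair at positions 552 and 1113, this gives the 16-character pruned pair at 553 and 1114 listed in the table. Because two isomorphic strings share the same pattern, both halves of a pair are always trimmed by the same amount.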

Looking above, we find the 16-character pruned isomorph pair RNTDEWOFRQMVRLKR and ETVBR1LMEJSUEGDE, beginning at 1-based positions 553 and 1114, respectively. My calculations show that, in a 1501-character ciphertext, we would expect 1.693 such isomorphic pairs by chance. Given this result, unless we have other supporting reasons to believe otherwise, this isomorph must be assumed to be non-causal and non-significant.

There are six (6) unpruned isomorphs, as can be seen above. The two 17-character unpruned isomorphs have an expected value of 0.8755 by chance. Although better than the previous 1.693, it is greater than 0.5, meaning we expect > 0.5 such pairs by chance. Here again we should assume, unless there is other evidence, that the isomorphs found here are non-causal and insignificant.

Any value below 0.5 expected pairs would mean it is more likely to be significant than non-significant. As the expected value drops further below 0.5, our confidence that the isomorph is significant increases.

The bottom line is, as Dave has pointed out, that the isomorphs found here should be assumed to be non-significant.
Dave on December 30, 2012 at 10:26 am said:

Moshe,

Can you describe how you computed the expected values? 1.693 seems high compared to observations in the random cipher text experiment.
Robert Dallison on December 30, 2012 at 10:48 am said:

@dave I like your Dec22 post with the zkdecrypto attacks. Looks like PT is not too far away. I still haven’t had time to work on this (most frustrating), but my money is on a 19-column transposition applied to one of your candidates in that zkdecrypto post.
Dave on December 30, 2012 at 7:34 pm said:

Robert,

First thing I noticed after reading your comment was that “Fates Unwind Infinity” is 19 characters long (with spaces removed). I tried to decode the cipher text using that key phrase, but the repeated 4-gram counts worsen:

Results

The “Tale of Two Cities” excerpt is shown for comparison. The idea is that if we undo the transposition successfully, then the repeated n-gram counts should be much larger, since the simple substitution by itself doesn’t affect those.

I then tried to encode the “Tale” excerpt using the same 19 character key. Then a very nice coincidence occurred: The columnar transposition of “Tale” contains 23 unique repeated 4-grams, the exact same number that can be found in the “Fates” cipher text. Weird coincidence, eh? At a minimum, it shows that columnar transposition significantly reduces the number of longer repeated patterns in the cipher text.

Perhaps someone could double check my work to ensure I did the transpositions properly.
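For anyone double checking, here is a minimal columnar transposition sketch to compare against. It assumes the common convention: text written row by row under the key, columns read off in alphabetical order of the key letters, repeated key letters taken left to right, and no padding of a short last row. Other conventions exist, so a mismatch with this code does not by itself mean an error.

```python
def key_order(key):
    """Column indices in read-off order: alphabetical, ties broken left to right."""
    return sorted(range(len(key)), key=lambda i: (key[i], i))


def encrypt(plain, key):
    """Write plain in rows of len(key), then read the columns in key order."""
    k = len(key)
    return "".join(plain[col::k] for col in key_order(key))


def decrypt(cipher, key):
    """Invert encrypt(), allowing for an incomplete last row."""
    k = len(key)
    full_rows, extra = divmod(len(cipher), k)
    plain = [""] * len(cipher)
    pos = 0
    for col in key_order(key):
        # The first `extra` columns are one character taller than the rest
        height = full_rows + (1 if col < extra else 0)
        plain[col::k] = cipher[pos:pos + height]
        pos += height
    return "".join(plain)


KEY = "FATESUNWINDINFINITY"  # 19 letters
sample = "ITWASTHEBESTOFTIMESITWASTHEWORSTOFTIMES"
assert decrypt(encrypt(sample, KEY), KEY) == sample
```

Counting repeated 4-grams in `decrypt(cipher_text, KEY)` and in `encrypt(tale_excerpt, KEY)` would then reproduce the two comparisons described above, where `cipher_text` and `tale_excerpt` stand for the actual texts.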
Moshe Rubin on December 31, 2012 at 4:26 pm said:

Dave,

Mathematical Explanation

My computations can be explained as follows:

Given:

(1) A message of M characters selected from a language of L characters
(2) A pair of n-character isomorphs in M consisting of a specific pattern (let’s call the isomorphs S0 and S1)
(3) A vector ‘a’ denoting n-tuple counts of one of the isomorphs. A pattern has a(0) 0-tuples (i.e., characters that do not appear in the isomorph), a(1) 1-tuples (singletons), a(2) 2-tuples, …, a(k) k-tuples

Then:

(1) The total number of possible n-character sequences is L^n (“L to the power of n”)

(2) The total number of n-character sequences isomorphic to S0 (or S1) over L is:
A = L!/a(0)!
(3) The probability of two random n-character strings over L being isomorphic to S0 (or S1) is:
B = A / L^n
(4) The number of possible comparisons of distinct n-character sequences in the message is:
C = ((M-n+1)(M-n+2))/2
(5) The number of expected pairs of n-character strings isomorphic to S0 (or S1) is:
Expected = B * C
Calculations

In our case we have the following:
S0 = RNTDEWOFRQMVRLKR
S1 = ETVBR1LMEJSUEGDE
M = 1501
L = 27
n = 16
a(0) = 14
a(1) = 12
a(2) = 0
a(3) = 0
a(4) = 1
The calculations are:
L^n = 27^16 = 79766443076872509863361
A = 27! / 14! = 124903451312640000
B = A / L^n = 124903451312640000 / 79766443076872509863361 = 1.5658646229501302487259969605112e-6
C = ((1501-16+1)(1501-16+2))/2 = (1486 * 1487) / 2 = 1104841
Expected = B * C = 1.5658646229501302487259969605112e-6 * 1104841 = 1.7300314358848448541326792078482
The result of 1.73+ is remarkably similar to your findings of 1.74%, but I cannot think of a logical connection between them. Any ideas?
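The arithmetic above can be reproduced exactly with Python's integers and fractions, which avoids any round-off question. The sketch follows the formulas as posted. The variable names are mine, and `C_distinct` is an addition for comparison: the usual count of unordered pairs of distinct windows is N(N-1)/2 with N = M-n+1, whereas the posted formula evaluates N(N+1)/2.

```python
import math
from fractions import Fraction

M, L, n = 1501, 27, 16
unused = 14  # a(0): letters of the alphabet that do not appear in the isomorph

A = math.factorial(L) // math.factorial(unused)  # strings matching the pattern
B = Fraction(A, L ** n)                          # chance a random string matches
C = (M - n + 1) * (M - n + 2) // 2               # comparisons, as posted
expected = float(B * C)

print(A)         # 124903451312640000
print(L ** n)    # 79766443076872509863361
print(C)         # 1104841
print(expected)  # approximately 1.73003

# For comparison: unordered pairs of distinct starting positions
C_distinct = (M - n + 1) * (M - n) // 2
print(C_distinct)              # 1103355
print(float(B * C_distinct))   # slightly smaller than the value above
```

Either way the expected count stays close to 1.73, so the choice of C does not change the conclusion.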

Search Results

Pruned Isomorphs:
+---+--------+-------------------+--------+
|LEN| BEGIN  | ISOMORPHS         |EXPECTED|
+---+--------+-------------------+--------+
|  8|279  478| DEFEV1FD          | 1.6477 |
|   |        | DBEBGNED          |        |
+---+--------+-------------------+--------+
| 16|553 1114| RNTDEWOFRQMVRLKR  | 1.6930 |
|   |        | ETVBR1LMEJSUEGDE  |        |
+---+--------+-------------------+--------+
Unpruned Isomorphs:
+---+--------+-------------------+--------+
|LEN| BEGIN  | ISOMORPHS         |EXPECTED|
+---+--------+-------------------+--------+
| 17| 21  712| QL1ERSNMSEDGHDTFB | 0.0625 |
|   |        | RBEMZF2VFMLSULTGK |        |
+---+--------+-------------------+--------+
| 16| 23  714| 1ERSNMSEDGHDTFBL  | 0.1129 |
|   |        | EMZF2VFMLSULTGKD  |        |
+---+--------+-------------------+--------+
| 17|552 1113| BRNTDEWOFRQMVRLKR | 0.8755 |
|   |        | AETVBR1LMEJSUEGDE |        |
+---+--------+-------------------+--------+
| 17|573 1360| KJOP2TNRWOLMBRSFE | 0.8755 |
|   |        | GSLOZQDBJLFNTBEV1 |        |
+---+--------+-------------------+--------+
| 10|279  478| DEFEV1FDBL        | 1.1375 |
|   |        | DBEBGNEDSL        |        |
+---+--------+-------------------+--------+
| 16| 24  715| ERSNMSEDGHDTFBLA  | 1.6930 |
|   |        | MZF2VFMLSULTGKDE  |        |
+---+--------+-------------------+--------+
| 16|574 1361| JOP2TNRWOLMBRSFE  | 1.6930 |
|   |        | SLOZQDBJLFNTBEV1  |        |
+---+--------+-------------------+--------+
The results above are from a software program I wrote. The program’s accuracy is not as good as that of the Windows Calculator application, which might explain the discrepancy between the expected value calculated by hand above (1.7300) and the one shown in the tables (1.6930).

The differences between some of the search results above and those posted a few days ago are due to fixing a minor bug.

I hope this explains how I arrived at my calculations.
Moshe Rubin on December 31, 2012 at 5:20 pm said:

In my previous post I forgot to point out the tantalizing 17-character isomorphs QL1ERSNMSEDGHDTFB and RBEMZF2VFMLSULTGK (at 1-based positions 21 and 712) with a very low expected value of 0.0625. If my calculations are mathematically sound, this would indicate that this particular pair of isomorphs may be highly significant.

As an aside, the discrepancies between the earlier calculations done manually with Windows Calculator and the results of my software program arise because the latter uses logarithms for all internal calculations (e.g., factorials, exponentiation), which probably results in cumulative round-off errors. In the end the different results are close enough for comfort.
nickpelling on December 31, 2012 at 5:42 pm said:

Moshe: for me, the challenging thing about the 17-character isomorph pair you noticed is that each of the two has so few repeated letters – E, S and D in the first and M, F and L in the second. That is, these are 17 letter stretches containing 14 unique letters, something that sits awkwardly with most cipher system hypotheses.

To be honest, right now I’m wondering whether E is simply a null added to misdirect us.
Dave on January 1, 2013 at 1:08 pm said:

Moshe, thanks for taking the time to post the detailed mathematical explanation. It may take me a while to fully digest it.

Meanwhile, to explore Robert’s suspicion of a period 19 columnar transposition, I processed the author’s plain text, removing all non-alphabetic symbols, and then extracted all possible 19-character substrings to form numerical columnar keys. I tried these variations of key formation:

1) When duplicates exist in the key phrase, number the dupes in ascending order.
2) When duplicates exist in the key phrase, number the dupes in descending order.
3) Create 19-character keys that have no duplicates.

The search looked for counts of repeated 4-grams in the resulting decoded texts, but failed to find any with significantly increased counts. The best one had a count of 33, which is not much bigger than that of the original cipher text.
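The key-formation variants listed above could be sketched like this. The function names are hypothetical, and variation 3 is read here as "keep only 19-character substrings whose letters are all different", which is just one possible interpretation.

```python
def number_key(phrase, dupes="ascending"):
    """Rank letters alphabetically (1-based).

    Repeated letters are numbered left to right ("ascending")
    or right to left ("descending").
    """
    if dupes == "ascending":
        order = sorted(range(len(phrase)), key=lambda i: (phrase[i], i))
    elif dupes == "descending":
        order = sorted(range(len(phrase)), key=lambda i: (phrase[i], -i))
    else:
        raise ValueError("dupes must be 'ascending' or 'descending'")
    ranks = [0] * len(phrase)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def candidate_keys(text, width=19, unique_only=False):
    """Yield every width-letter substring of text after stripping non-letters."""
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    for i in range(len(letters) - width + 1):
        chunk = letters[i:i + width]
        if unique_only and len(set(chunk)) < width:
            continue  # skip substrings containing a repeated letter
        yield chunk


print(number_key("BANANA"))                # [4, 1, 5, 2, 6, 3]
print(number_key("BANANA", "descending"))  # [4, 3, 6, 2, 5, 1]
```

Each numeric key would then be used to undo a period 19 columnar transposition, with the repeated 4-gram count of the result as the score.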

I also ran an exhaustive brute force search of all possible keys of periods 2 through 11 with no luck.

Now I’m trying corpora-generated key phrases of lengths greater than 19.
Micke on January 11, 2013 at 10:26 am said:

Your initial guess of character coding sounds sensible to me. UTF-8 and UTF-16 look terribly confusing if not decoded right. Given the author etc I would imagine this is some simple “code this text in UTF-8 and print it” exercise. If I am right then even the way it is printed in the original may not adhere to what the writer intended. Many characters are “unprintables” and appear as a space or, if you put them into a Unix editor, as “?” or so. For example, if I paste the original into a text editor or into the Unix editor “vi” I get two different versions. My guess is to stick to the bigram statistics and treat each as one letter and go from there.
Robert Dallison on January 13, 2013 at 11:55 pm said:

@Micke, I’m a little bemused by your comment:

‘Given the author etc I would imagine this is some simple “code this text in UTF-8 and print it” exercise.’

That seems a little patronising. What do you know about the author that we don’t?
For my part, I assume that because this author is capable of some fairly involved philosophical and mathematical arguments, then he or she is also capable of wielding a structured and legitimate cryptographic algorithm to good effect.
For what it’s worth – and this is a departure from my previous posts – I now wonder if the underlying encryption scheme is not a simple XOR with some mathematically identifiable binary sequence, followed by a redistribution into 16-bit blocks. For example, the binary representation of the Sierpinski Sieve which is displayed on the frontispiece of the document, or the binary expansion of pi which is mentioned on multiple occasions in the text.
I have about 2 minutes per week to devote to this enigma, which is more frustrating than I can possibly express. Readers of these comments are invited to take my ideas and run with them, if you crack it just be sure to let me know 🙂
Sean Riddle on February 4, 2013 at 1:14 pm said:

I got frustrated trying to copy the symbols from the link (I guess scribd does that to make you pay them for the download?), so I took a look at the source for the web page. At the top there are some search engine tags, including “Available free for Kindle here: http://www.filedropper.com/thefatesunwindinfinity_1
And for iPad and other devices here: http://www.filedropper.com/thefatesunwindinfinity”. The second link is to an epub file, which is a normal ZIP file. I decompressed it and one resulting file is named “chapter 2 8.xhtml”. Opening this file reveals the symbols in an easy-to-use block, as well as a more-obvious indication of which are bolded (some bolded characters were hard to spot in the scribd file, especially the dot). It also turns out that the two right-side-up question marks are in a different font, but I’m not sure if that’s relevant. One other important thing: it looks like there is a chapter name, which is missing from the scribd file. It’s 2 Chinese characters; Google translated them as “password”. They are in bold, so I’m wondering if that’s a clue that the 42 bolded characters in the message are the password. Also, there are 1,501 characters, which is 79×19. I’m not a cryptographer and I haven’t been able to get anywhere with this info, but I thought it might be useful.
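Since an epub is just a ZIP archive, the chapter file can also be pulled out with a short script rather than by hand. This is a sketch only: the folder layout inside the archive and the UTF-8 encoding are assumptions, so the code matches on the end of the member name instead of a full path, and `read_chapter` is a hypothetical helper name.

```python
import zipfile


def read_chapter(epub_path, name_suffix="chapter 2 8.xhtml"):
    """Return the text of the first archive member whose name ends with name_suffix.

    epub_path may be a file path or a file-like object.
    """
    with zipfile.ZipFile(epub_path) as archive:
        for member in archive.namelist():
            if member.endswith(name_suffix):
                # Assumes the XHTML is UTF-8 encoded
                return archive.read(member).decode("utf-8")
    raise FileNotFoundError(name_suffix)


# The length factorisation mentioned above
assert 79 * 19 == 1501
```

Working from the XHTML text also keeps the bold markup and font information intact, which is lost when copying from the rendered page.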
Dave on February 5, 2013 at 10:40 am said:

Thanks, Sean; those are useful observations. Also, the SPAN elements in the cipher text appear to set apart some of the symbols:

http://zodiackillerciphers.com/images/fates-unwind-span-elements.png

The first span element surrounds the “˛¿” symbols (I’ll call this the “prefix”). Then, the bulk of the cipher is surrounded with a pair of span elements that each surround a single question mark, as if to enclose the main body of cipher text. Each of the question marks is styled with the “Century” font, whereas the symbols in the prefix, suffix, and main cipher text are styled with the “Arno Pro” font. Finally, the last span element surrounds the “µ˛¿.¡” symbols (I’ll call this the “suffix”). Perhaps these groupings of symbols have separate meanings from the main cipher text.

The “?” symbols do not recur within the main body of the cipher text. If you exclude them, then the rest of the cipher text contains only 26 unique symbols.

So, the “true” cipher length may depend on which of these seemingly special symbol groups are excluded from the count.