In a recent post here, I floated the idea that the Zodiac Killer’s Z408 (solved) cipher’s unusual homophone distribution may have arisen not conceptually (i.e. from a hitherto-unknown book on cryptography), but instead empirically (i.e. emerging from the properties of a specific text).
It’s certainly possible that he might have used his own (private) text to model his homophone distribution, in which case we probably have almost no chance of reconstructing it. However, I think it likely that he instead used the first few characters of an already existing public text (such as Moby Dick, the Book of Genesis, the Declaration of Independence, or whatever) to do this.
It’s a reasonable enough suggestion, I think: and moreover one that we can try to test to a reasonable degree.
Z408’s homophones
A homophonic cipher key allocates a number of cipher shapes to individual plaintext letters, usually (but not always) in broad proportion to their frequency. So in a typical homophonic cipher key you would expect to see far more shapes for E (the most common letter in English) than for, say, Z or Q.
Though this is essentially the case for what we see in the Z408 cipher (particularly for the more frequent letters, ETAOINS), the numbers of homophones chosen for the less frequent letters seem somewhat idiosyncratic and arbitrary:
7 shapes – E
4 shapes – T A O I N S
3 shapes – L R
2 shapes – D F H
1 shape – B C G K M P U V W X Y
Did not appear: J Q Z
People have long searched for a primer or textbook on cryptography where the description of the alphabetic frequency distribution matches this, or even where the alphabetic frequency ordering (e.g. ETAOINSHRDLU etc) matches the order here, but in vain.
Designing a filter
The basic idea for the filter is easy enough:
* read in characters from the start of a passage (we’re only interested in capitalized alphabetic letters, i.e. A-Z)
* if the instance count of that character is higher than the top of the desired range, then the test fails
* if the instance counts for all the characters are within the desired range at the same time, then the test passes
* else keep reading in more characters until the test terminates
As a side note: of all the Z408 homophones, only X appears exactly once in the Z408 ciphertext itself: but while it is conceivable that the Zodiac Killer might have allocated extra homophones for X, it does seem fairly unlikely.
The desired ranges for each of the characters would look like this (though feel free to adapt this if you disagree with the homophone counts listed above):
[7,7] – E
[4,4] – T A O I N S
[3,3] – L R
[2,2] – D F H
[0,1] – B C G K M P U V W Y J Q Z
[0,3] – X (to err on the side of safety)
Note that the single-shape letters have a slightly broader [0,1] range because we have no way of knowing whether or not they would have actually appeared in the original text.
Here are two test texts that should both pass:
EEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZZZZZZZZZZZZZZZ
BCGKMPUVWYJQZXEEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZ
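As a rough sketch of the filter steps and ranges above (not anyone’s actual implementation: the names RANGES and filter_match_length are made up for illustration, and folding lower case into upper case is an assumption), something like this Python would do:

```python
from string import ascii_uppercase

# Allowed (min, max) count per letter, copied from the ranges listed above.
RANGES = {}
for letters, bounds in [
    ("E", (7, 7)),
    ("TAOINS", (4, 4)),
    ("LR", (3, 3)),
    ("DFH", (2, 2)),
    ("BCGKMPUVWYJQZ", (0, 1)),
    ("X", (0, 3)),
]:
    for letter in letters:
        RANGES[letter] = bounds


def filter_match_length(text):
    """Return how many letters were read when every count first sits inside
    its range, or None if a letter overshoots (or the text runs out) first."""
    counts = dict.fromkeys(ascii_uppercase, 0)
    letters_read = 0
    for ch in text.upper():  # assumption: case is ignored
        if ch not in counts:
            continue  # skip spaces, punctuation, digits
        counts[ch] += 1
        letters_read += 1
        if counts[ch] > RANGES[ch][1]:
            return None  # above the top of the desired range: fail
        if all(RANGES[c][0] <= counts[c] <= RANGES[c][1] for c in counts):
            return letters_read  # all letters in range at the same time: pass
    return None  # ran out of text before the test terminated
```

With this sketch, the first test text passes after 43 letters (before any of its padding Zs are read) and the second passes after 57.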
Which texts to try?
Though any text published before August 1969 would potentially be a match, it would make sense to look at all manner of texts, and possibly even the first few lines of different chapters of books (though I’d be a little surprised if that was the case). All the same, the filter is easy enough to write (and should execute in a matter of microseconds) and to test, so the difficulty here lies mostly in getting hold of enough texts to try, rather than the compute time as such.
Oddly, I don’t really have a solid feel for how often the filter will find a match: my gut instinct is that roughly one in a million English text comparisons will pass, but that’s just a guesstimate based on each letter having its own little bell-curve distribution, all of which have to match at the same time.
So what do you think will match? “Catcher in the Rye” or “Moby Dick”? Place your bets! 😉
Interesting idea. A further variation would be to consider longer passages that have the same relative letter distribution.
For example, your test text “EEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZZZZZZZZZZZZZZZ” would pass. But the doubled version “EEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZZZZZZZZZZZZZZZ EEEEEEETTTTAAAAOOOOIIIINNNNSSSSLLLRRRDDFFHHZZZZZZZZZZZZZZZZZ” could also pass, since it has the same relative distribution of letters.
So, too, could passages of arbitrary length share the same relative letter frequencies.
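One way to sketch this "same relative distribution" variation in Python is to count a whole passage and look for an integer multiplier k that fits every range. This is only an illustration: scale_factor and BASE_RANGES are hypothetical names, and stretching the [0,1] and [0,3] ranges to [0,k] and [0,3k] is an assumption rather than anything stated in the post.

```python
from collections import Counter

# (letters, min, max) groups from the post's range table.
GROUPS = [("E", 7, 7), ("TAOINS", 4, 4), ("LR", 3, 3), ("DFH", 2, 2),
          ("BCGKMPUVWYJQZ", 0, 1), ("X", 0, 3)]
BASE_RANGES = {letter: (lo, hi) for letters, lo, hi in GROUPS for letter in letters}


def scale_factor(passage):
    """Return k if the whole passage fits the ranges multiplied by k, else None."""
    counts = Counter(ch for ch in passage.upper() if ch in BASE_RANGES)
    # E has an exact target of 7, so its count fixes the only possible k.
    k, remainder = divmod(counts["E"], 7)
    if k == 0 or remainder:
        return None
    for letter, (lo, hi) in BASE_RANGES.items():
        if not lo * k <= counts[letter] <= hi * k:
            return None
    return k
```

Because this version counts every letter in the passage rather than stopping at the first passing prefix, the Z padding in the test text has to be left off: the 43-letter core gives k = 1 and two copies of it give k = 2, but the padded string is rejected.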
My hunch is that there is enough variance in letter distributions in shorter passages, that the filter would find many matches given enough samples.
You gonna program this up? 🙂
The Mikado? Awful lot of J’s and Q’s on the first page, though.
He also should have looked at his own writing: his biggest mistake was the L, possibly because of the use of so many double L’s (kill, shall, thrilling, will, etc.), and he used a similar shaded box for 2 of the 3 cipher symbols, so they jump out at you. However, he did use L 5% (normal is 4%) of the time, same as he did R (normal 6%).
Where his decisions for H (Z and normal = 6%), D (Z at 3.3%, normal 4.3%) & F (~2% in both) come from is odd, so perhaps there is a source he used and counted himself. Interesting thought!
Let’s try with a Zodiac Watches Manual 🙂
Dave: note that I’m not suggesting the Zodiac put together a statistical analysis of a given text, but rather counted the first few lines that contained exactly 7 Es, etc.
That is, I suspect that he had read about homophonic ciphers, but not about how to construct them: and so to do this, counted a section of text from a (possibly favourite) book. Which would make the Z408 not an improvised cipher, but an improvised cipher construction.
As for trying it out, why not? 🙂
Nick & other decoders:
He was still using his di di di dit (Morse) code long after receiving an honorable discharge (medical reasons). The cause of the ‘medical’ discharge was that he and a couple of buddies rolled over their Jeep/vehicle while on a drunken spree.
Have any of you considered that he may have been rolling out sets of dice and then assigning alphabetical characters to make words: “PARA DICE” for instance? ZO DI AC. The circle with a centered cross could possibly be a picture of a rifle scope (translate to “I’ve got you in my sights”). BTW: He and his brother Preston were lousy shots, whether pistol or rifle.
So….maybe I have been a suspect all along?
Neither of his sons believe that he caused the head injuries and broken arm.
Somehow or other I have to alert my daughter to the dangers of my sons and their wives (Vietnamese and Taiwanese).
bd
Yes, Lee, I remember just who came along for the ride to Chinese gambling casinos and hotels. I also remember who retrieved her dead father — and just where she buried him. Apparently, Rob has no memory of our consultation with Susan Shackelford and her daughter.
So, it is quite apparent that Lee and Robert have forgotten who their mother was/is.
Anyway, their grandmother’s last married name was Pexa.
bd
Nick, your proposal is very clear and easy to set up. Unfortunately I do not have the time, but perhaps later this year. I would like to add that in the Z408 you can see some patterns in the design of the characters. The quote in the middle is perhaps a clue too, because the order of the words there is probably like “personal pronoun” – verb – ….
Nick, I went ahead and wrote up a quick filter.
http://pastebin.com/raw/Y82YvAPi
I confirmed that it matched on your two test strings. But it has thus far not matched on anything else. I ran it against some books from Project Gutenberg and the entire corpus of Zodiac’s correspondence. Checked all possible substrings.
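For anyone wanting to try the same kind of "every starting position" scan without reading the pastebin code, here is a minimal Python sketch. It is not the Java filter linked above: find_matches, match_length, RANGES and MAX_LEN are illustrative names, and stripping everything except A-Z before scanning is an assumption.

```python
# Allowed (min, max) count per letter, from the ranges in the post.
RANGES = {}
for letters, bounds in [("E", (7, 7)), ("TAOINS", (4, 4)), ("LR", (3, 3)),
                        ("DFH", (2, 2)), ("BCGKMPUVWYJQZ", (0, 1)), ("X", (0, 3))]:
    for letter in letters:
        RANGES[letter] = bounds

# No passing passage can be longer than the sum of the maxima (59 letters).
MAX_LEN = sum(hi for _, hi in RANGES.values())


def match_length(letters, start):
    """Length of the shortest passing run beginning at `start`, or None."""
    counts = dict.fromkeys(RANGES, 0)
    for n, ch in enumerate(letters[start:start + MAX_LEN], 1):
        counts[ch] += 1
        if counts[ch] > RANGES[ch][1]:
            return None  # a letter overshot its maximum
        if all(lo <= counts[c] <= hi for c, (lo, hi) in RANGES.items()):
            return n
    return None  # ran out of letters


def find_matches(text):
    """Try the filter from every letter position in `text`."""
    letters = [ch for ch in text.upper() if ch in RANGES]  # keep A-Z only
    matches = []
    for start in range(len(letters)):
        n = match_length(letters, start)
        if n is not None:
            matches.append("".join(letters[start:start + n]))
    return matches
```

Only the shortest passing run at each start is reported. That is enough to detect a match, since any longer passing run from the same start would have gone through the shorter one first.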
I included the Mikado and Moby Dick in the tests. No matches. Here’s the full list:
Clarissa, by Samuel Richardson
Leaves of Grass, by Walt Whitman
Pride and Prejudice, by Jane Austen
Les Miserables, by Victor Hugo
Sherlock Holmes, by Arthur Conan Doyle
Grimms’ Fairy Tales, by The Brothers Grimm
How to Analyze People on Sight, by Elsie Lincoln Benedict and Ralph Paine Benedict
Ulysses, by James Joyce
Metamorphosis, by Franz Kafka
Complete Adventures of Huckleberry Finn, by Mark Twain (Samuel Clemens)
A Tale of Two Cities, by Charles Dickens
The Count of Monte Cristo, by Alexandre Dumas
Varney the Vampire, by Thomas Preskett Prest
American Poetry, 1922, by Edna St. Vincent Millay and Robert Frost
War and Peace, by Leo Tolstoy
Moby Dick; or The Whale, by Herman Melville
History of the Warfare of Science with Theology in Christendom, by Andrew Dickson White
Anomalies and Curiosities of Medicine, by George M. Gould and Walter Lytle Pyle
The Complete Plays of Gilbert and Sullivan, by William Schwenk Gilbert and Arthur Sullivan
So you are right to suspect that an exact match to those letter ranges will be difficult to locate.
Try “The Most Dangerous Game” by Richard Connell, based on the wording of one of the killer’s letters.
Here is another experiment I tried: I have a “pick a random letter” algorithm that factors in the expected letter frequencies of the English language. For example, it will randomly pick E more often than other letters. It’s like a roulette wheel with 26 slices, one for each letter. The size of each slice depends on the frequency of its corresponding letter. It uses these frequencies (for A through Z):
{.08167, .01492, .02782, .04253, .12702, .02228, .02015, .06094, .06966, .00153, .00772, .04025, .02406, .06749, .07507, .01929, .00095, .05987, .06327, .09056, .02758, .00978, .02360, .00150, .01974, .000749}
So I generated millions of random 59-letter strings (because 59 ends up being the max size permitted by the filter). They look like this:
PYHNUAELLVESRYKEDHPTOAFGETUHCUEANCCOYVTDDYTNODOAYEOMEETBTFF
TEIDALPRMKRANENBEOLOSOSPOZBHTETUFESOENKTHVKOEENLIAHWYRRONCO
DHHDFOUTHAATAEDPULREIAOEEEEUEFWHSITPOCKOREEURTGNNAOUUNNEEOA
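The roulette-wheel idea can be sketched in Python with the standard library’s weighted sampling. This is only an illustration of the approach, not the generator actually used for the experiment: random_english_like and FREQUENCIES are hypothetical names, and the seed in the usage lines is arbitrary.

```python
import random
from string import ascii_uppercase

# Letter frequencies for A through Z, as quoted in the comment above.
FREQUENCIES = [
    .08167, .01492, .02782, .04253, .12702, .02228, .02015, .06094, .06966,
    .00153, .00772, .04025, .02406, .06749, .07507, .01929, .00095, .05987,
    .06327, .09056, .02758, .00978, .02360, .00150, .01974, .000749,
]


def random_english_like(length=59, rng=random):
    """Draw `length` letters independently, each weighted by its frequency."""
    return "".join(rng.choices(ascii_uppercase, weights=FREQUENCIES, k=length))


if __name__ == "__main__":
    rng = random.Random(1969)  # arbitrary seed, only for repeatable runs
    for _ in range(3):
        print(random_english_like(59, rng))
```

Each letter is drawn independently here, so the strings have English-like letter counts but none of the letter-to-letter structure of real English.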
I then run the filter to see if any of them match. When testing 170,000,000 strings, only 2 of them passed the filter:
HABWCELTTEEDINLFAGSAIAEUMYNEELOEITROIROTNFSODSHRNS
SILFTSYHEOIUOPESRDNOLFIEATARMENBOETRSANAEENDHCLTI
(You might want to check my work to ensure those strings meet your filter criteria).
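A quick way to check strings like these by hand is to tally each letter and list any that fall outside the ranges. The Python below is a small sketch along those lines; out_of_range and RANGES are illustrative names, not part of the original filter.

```python
from collections import Counter

# (letters, min, max) groups from the post's range table.
GROUPS = [("E", 7, 7), ("TAOINS", 4, 4), ("LR", 3, 3), ("DFH", 2, 2),
          ("BCGKMPUVWYJQZ", 0, 1), ("X", 0, 3)]
RANGES = {letter: (lo, hi) for letters, lo, hi in GROUPS for letter in letters}


def out_of_range(candidate):
    """Map each offending letter to its count; an empty dict means a pass."""
    counts = Counter(candidate.upper())
    return {letter: counts[letter]
            for letter, (lo, hi) in RANGES.items()
            if not lo <= counts[letter] <= hi}


if __name__ == "__main__":
    for s in ("HABWCELTTEEDINLFAGSAIAEUMYNEELOEITROIROTNFSODSHRNS",
              "SILFTSYHEOIUOPESRDNOLFIEATARMENBOETRSANAEENDHCLTI"):
        print(len(s), out_of_range(s))
```

Both strings come back with no offending letters under this check, at 50 and 49 letters respectively.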
This result suggests that the estimated probability of a given string of English meeting your filter’s criteria is about 1 in 85,000,000. Your gut feeling was on the right track, it seems!
So, it would be quite interesting to find out if some real extract of text really does meet your filter’s criteria. Especially if it is already known to have some significance to the killer.
Donald,
OK – I tested that short story and the filter doesn’t match on any of it. Here is the text I used:
https://archive.org/stream/TheMostDangerousGame_129/danger.txt
Dave: thanks very much indeed for coding that up, I’ll doublecheck your Java in the next couple of days. But it’s a pretty straightforward algorithm, so I’m genuinely not expecting to find a problem there. 🙂
Of course, the problem with using a stats-based data-set to derive stats-based results (as you did with your 170,000,000 chunk test) is that you’re only going to get back the stats you put in – whereas different types of English (e.g. poems, translations, non-fiction, dialogue, plays) will have different stats, each of which will broadly fit the results better or worse than the others. Even so, 1 in 85 million (even given the sample size of 2) is perhaps a good sign that this might indeed point to just a single text.
Perhaps a good starting point would be the Bible, with 59-letter chunks starting at the beginning of each sentence. Your stats would suggest that we wouldn’t get a single match but… who can tell?
Alternatively, “Juliette” by the Marquis de Sade has been suggested as a source for the Zodiac’s curious murderous belief in slaves and the afterlife:
“It is an article of faith on the island of Borneo that all those persons a man kills will be his slaves in the next world; and as a result, the better a man wishes to be served after his death, the more he kills during life.”
http://www.zodiackiller.com/mba/gzd/760.html
There may also be some text by (or used by?) Charles Manson that Zodiac picked up on, because (as the thread I linked to above makes clear) Zodiac seems to have in some way shared the slaves / paradise / death belief set that Manson had.
It was worth a try
I could use a clarification: are you only thinking of counting capitalized letters? Some thoughts I had may hold true either way, however.
Using only capitalized letters, E would seem to not be the most common letter you would find. However, I noticed as I was listening to music all weekend that weird letter and word usages happen in song/poetry. Lines and words repeat, possibly allowing for a skew in letter usage at the beginning of words, or in the capitalization you often find in poetry for emphasis, particularly the Beat poetry popular at the time.
If you are looking more at the words entire, H is one of the most skewed as “the” is the most common word in the English language. Poetry and song often use shorter and partial phrasing, and especially if rhyming, can create different letter distributions from normal.
Z’s somewhat flat distribution also suggests to me, from a linguistic standpoint, a mix of first person singular and first person plural, or possibly first person singular mixed with third person, as you would likely find in a musical, i.e. an individual sings the verse and the chorus is sung by the cast, changing the pronouns used. The elevation in S’s and dearth of D’s makes me lean present tense, but it’s really hard to say since there isn’t a large distribution to go off of, just those letters that seem really out of place percentage-wise.
I also don’t see Z as being a huge fan of the classics, I would look more at pop culture of the time- Hair the musical, trendy songs, popular Beat poems- if the desktop poem is written by him, it shows an interest in that form of poetry, and we also know from his interest in the Mikado he likes musicals.
Marie: I’m talking about case-equalized text (e.g. LONDON rather than London), not acrostic text (word-initial letters only).
Personally I would be trying some of the Masonic publications/journals from the era, particularly the ones used for educating members to achieve the various degrees. The Scottish Rite has 33 degrees, which is 57 degrees less than a right angle. Only a Freemason would understand the significance of a right angle. A radian is 57.3 degrees. The number 3 is a sacred number in Freemasonry, creating the triangle and pyramid, and in multiples can form a basic diamond. This is why senior Freemasons wear their diamond rings. In Freemasonry, five is a sacred number, inferior only in importance to three and seven. So here we have the numbers 5, 7, 3. The Zodiac loved his radians. I have long suspected that his emphasis was to suggest an association between the three sacred numbers of Freemasonry. It had nothing to do with mapping in my opinion. The map was most likely a red herring and far too obvious to have been the actual clue. It was also far too scrappy (just a road map with very poor scale and very little detail). The real clue was hidden within. There are many examples of Masonic symbolism to be found throughout the Zodiac’s correspondence, but they are not always obvious at first glance. And of course the Zodiac suggested or “courted” various well known Freemasons (or Freemason descendants) during his brief reign of terror, namely Melvin Belli and William “Peek-A-Boo” Pennington. In case anyone is wondering, no I am not hinting at rubbish conspiracy theories like Illuminati. I am talking about Freemasonry in its everyday sense.
OK I added a few more books to the filter test:
http://pastebin.com/raw/Jm5hq9A9
Includes the Bible, Juliette (and de Sade’s other works), and the top 25 most popular downloads from Project Gutenberg in the last 30 days. Didn’t run the filter on Manson’s writing, though. If you want to find some text for me, feel free to send it and I will run it through the filter.
Still found zero matches during this test, however.
The total number of string segments that were tested in the filter was 50,821,872.
Maybe I need to unpack the old 6GB Gutenberg DVD I downloaded long ago and let the filter chug through it. 🙂
Meanwhile, if you think of other texts to try out, send them to me and I’ll post the results.
@Christophe Maggi: Were you trying to refer to 24 hour shift/tour changes? For example: The US Post Office/Postal Service had wall-mounted trays for punch-cards for three eight-hour shifts. At least that was Rincon Annex, San Francisco, CA
bd
Can you please try Harold and the Purple Crayon, and The Old Man and the Boy?
Could he be self-referencing using one of his earlier codes?
It would require less brain power to simply substitute from his own hand-written codes and solutions.
https://en.wikipedia.org/wiki/Etaoin_shrdlu
Perhaps this was a key of its own, or any messages containing it could have been some kind of sick joke for the Zodiac: either telling the editors he sent the message to that they should not print it, or forcing them to print what was once used to indicate a mistake on a typesetting machine.
At this point it’s 2024. If the Zodiac is still alive, the minimum age for the 1970 letter (roughly 54 years ago) would be around 14, which would put them at 68 years old now (an unlikely edge case); if they were 46 then, they would be 100 now, and at that age or beyond they have likely died already.
Though it can’t be ruled out. The likelihood that the writer was someone with the free time, resources, etc. (someone retired, or perhaps the spouse of someone retired) is moderately high; otherwise perhaps it was someone who was self-employed or called off work sick around the times of the murders.