Heteroscedasticity – now there’s a word you don’t see very often (thanks to Rosco Paterson for kindly plonking it in my path). Which is a pity, because it’s a particularly useful concept that might help us crack several longstanding cipher mysteries.
The idea behind it is not too far from the old joke about the statistician with his feet in the oven and his head in the fridge, who – on average – felt very comfortable. A set of numbers is heteroscedastic if it simultaneously contains different (‘hetero-’) subgroups whose spread (‘-scedasticity’) differs, so that (for example) the overall average falls between the groups while describing neither of them. As a result, looking to that average for enlightenment as to the nature of those separate subgroups is probably not going to do you much good.
Perhaps unsurprisingly, it turns out that a lot of statistical properties implicitly rely on the assumption that the data being analyzed does not have this property. That is, for data with multiple modes or states, the consequent heteroscedasticity is likely to mess up your statistical reasoning: you’ll still get plausible-looking results, but there’s a high chance they’ll be of no practical use. So for cipher systems in general, any hint of multimodality should be a heteroscedastic alarm bell, a warning that your statistical toolbox may be as much use as a wet fish for tightening a bolt.
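To make that concrete, here’s a tiny Python sketch – entirely my own toy example, with made-up parameters – of a two-state mixture whose pooled statistics describe neither state:

```python
import random
import statistics

random.seed(42)

# Two hypothetical 'states' with very different centres and spreads --
# a crude stand-in for a text that hops between ciphering modes.
state_a = [random.gauss(10, 1) for _ in range(500)]
state_b = [random.gauss(50, 12) for _ in range(500)]
pooled = state_a + state_b

# The pooled mean sits between the two states, describing neither,
# and the pooled spread is dominated by the gap between the states
# rather than by the spread within either one.
print(round(statistics.mean(pooled), 1))    # ~30: between the states
print(round(statistics.stdev(state_a), 1))  # ~1
print(round(statistics.stdev(state_b), 1))  # ~12
print(round(statistics.stdev(pooled), 1))   # much larger than either state's
```

Any analysis that quietly assumes the pooled mean and spread are meaningful here is going to go wrong in exactly the oven-and-fridge way.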
Plenty of Voynich Manuscript (‘VMs’) researchers will be sagely nodding their heads at this point, because they know all too well that the plethora of statistical analyses performed so far on it has failed to yield much of consequence. Could this be because its ‘Voynichese’ text heteroscedastically ‘hops’ between states? Cipher Mysteries regulars will know I’ve long suspected there’s some kind of state machine at play, but I’ve yet to see any full-on analysis of the VMs with this in mind.
Historically, the first proper ciphering state machine was Alberti’s 1465 cipher disk. He placed one alphabet on a stator (a static disk) and another on a rotor (a rotating disk), advancing the latter according to some system pre-agreed between encipherer and decipherer – e.g. after every couple of words, or after every vowel, etc.
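Alberti’s real disks used scrambled alphabets, but the underlying state-machine idea can be sketched in a few lines of Python. Note that everything below – the plain A–Z alphabets, the ‘advance after every vowel’ rule, the starting offset – is my own toy choice, not a reconstruction of his actual system:

```python
# Toy disk cipher as a state machine: the rotor's offset is the state,
# and a pre-agreed rule (here, "advance after every vowel") drives the
# state transitions.

STATOR = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encipher(plaintext, start_offset=3):
    offset = start_offset          # current rotor position = machine state
    out = []
    for ch in plaintext.upper():
        if ch not in STATOR:
            continue               # skip spaces/punctuation for simplicity
        idx = STATOR.index(ch)
        out.append(STATOR[(idx + offset) % 26])  # read off the rotor
        if ch in "AEIOU":          # the pre-agreed transition rule
            offset = (offset + 1) % 26
    return "".join(out)
```

The crucial point is that each ciphertext letter depends not just on the plaintext letter but on the machine’s current state – which is exactly what wrecks single-alphabet frequency analysis.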
Even if you don’t happen to buy into my Averlino hypothesis (but don’t worry if you don’t, it’s not mandatory here), 1465 isn’t hugely far from the Voynich Manuscript’s vellum radiocarbon dating. It could well be that state machine cryptography was in the air: perhaps Alberti was building on an earlier, more experimental cipher he had heard of, but with an overtly Florentine, Brunelleschian clockwork gadget twist.
As an aside, there are plenty of intellectual historians who have suggested that the roots of Alberti’s cipher disk lie (for example) in Ramon Llull’s circular diagrams and conceptual machines: in a way, one might argue that all Alberti did was collide Llull’s stuff with the more hands-on Quattrocento Florentine machine-building tradition, and say “Ta-da!” 🙂
All the same, we do know that the Voynich Manuscript’s cipher is not an Albertian polyalphabetic cipher: but if it is multimodal, how should we look for evidence of it?
A few years ago, when my friend Glen Claston was laboriously making his own transcription of the VMs, he noticed that certain groups of symbols and even words seemed to phase in and out, as if some higher-level structure underlay the text. Was he glimpsing raw heteroscedasticity, arising from some kind of state machine clustering? For now this is just his cryptological instinct, not a rigorous proof: and it is entirely possible he was influenced by the structure of Leonell Strong’s claimed decryption (which introduced a new cipher alphabet every few lines). Despite all that, I’m happy to take his observation at face value: Voynichese may well be built around a higher-level internal state structure that readily confounds our statistical cryptanalyses.
So, the big question here is whether it is possible to design tests that explicitly detect multimodality ‘blind’. The problem is that even though this is done a lot in econometrics (a Nobel Prize for Economics was even awarded for work to do with heteroscedasticity), economic time series are surely quite a different kettle of monkeys to ciphertexts. Perhaps there’s a whole cryptanalytical literature on detecting heteroscedasticity in ciphertexts – if you happen to know of it, please leave a comment here!
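For what it’s worth, here’s the kind of naive first stab I have in mind – sliding a window along the text and scoring how sharply the symbol-frequency profile changes between adjacent windows. The window size and the chi-squared-style distance below are arbitrary choices of mine, nothing more:

```python
from collections import Counter

def window_profiles(text, size=200):
    """Symbol-frequency profile for each consecutive window of the text."""
    windows = [text[i:i + size] for i in range(0, len(text) - size + 1, size)]
    return [Counter(w) for w in windows]

def chi2_distance(c1, c2):
    """Chi-squared-style distance between two frequency profiles."""
    total = 0.0
    for sym in set(c1) | set(c2):
        a, b = c1[sym], c2[sym]
        total += (a - b) ** 2 / (a + b)   # a + b > 0 for every sym in the union
    return total

def change_scores(text, size=200):
    """Score how much each window's profile differs from the next one's."""
    profiles = window_profiles(text, size)
    return [chi2_distance(profiles[i], profiles[i + 1])
            for i in range(len(profiles) - 1)]
```

A run of spikes in the change scores would at least flag candidate state boundaries worth a closer look; a proper test would also need to estimate how much adjacent windows of a genuinely unimodal text differ by chance.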
I don’t know what the answer to all this is: it’s something I’ve been thinking about for a while, without really being able to resolve to my own satisfaction. Make of it what you will!
At the same time, there’s also a spooky echo with the Zodiac Killer’s Z340 cipher here. I recently wrote some code to test for the presence of homophone cycles in Z340, and from the results I got I strongly suspect that its top half employs quite a different cipher to the bottom – the homophone cycles my code suggested for the two halves were extremely different.
Hence it could well be that most statistical analyses of Z340 done to date have failed to produce useful results because of the confoundingly heteroscedastic shadow cast by merging (for example) two distinct halves into a single ciphertext. How could we definitively test whether Z340 is formed of two halves? Something else to think about! 🙂
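One simple way to start putting that to the test: a permutation test on the two halves’ symbol frequencies. The sketch below is my own – the total-variation distance and the trial count are arbitrary choices, and a homophone-cycle-aware statistic would be far stronger – but it shows the shape of the test: shuffle the ciphertext many times, and ask how often a random split looks as different as the real one:

```python
import random
from collections import Counter

def divergence(a, b):
    """Total variation distance between the symbol distributions of two texts."""
    ca, cb = Counter(a), Counter(b)
    syms = set(ca) | set(cb)
    return sum(abs(ca[s] / len(a) - cb[s] / len(b)) for s in syms) / 2

def split_test(text, trials=1000, seed=0):
    """Permutation test: is the top/bottom split more different than chance?"""
    half = len(text) // 2
    observed = divergence(text[:half], text[half:])
    rng = random.Random(seed)
    chars = list(text)
    hits = 0
    for _ in range(trials):
        rng.shuffle(chars)                  # destroy any positional structure
        if divergence(chars[:half], chars[half:]) >= observed:
            hits += 1
    return observed, hits / trials          # small p-value => halves differ
```

If shuffled splits rarely match the observed divergence (a small p-value), the halves really are statistically distinct – though that alone wouldn’t tell you whether the cause is a change of cipher or merely a change of subject matter.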