According to this recent Wired article, Rajesh Rao, a computer scientist from the University of Washington, has run a Markov chain finder on the 1500-odd fragments of (the as-yet-undeciphered) Indus script – and has ‘discovered’ that it is “moderately ordered, just like spoken languages”.
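As I understand Rao and colleagues’ published work, “moderately ordered” is a statement about conditional entropy: how predictable the next sign is, given the current one. A perfectly rigid sequence scores 0 bits, while a uniformly random one scores log2 of the number of distinct signs; languages (and, they argue, the Indus texts) land somewhere in between. Here is a minimal Python sketch of that measurement, assuming a plain first-order (bigram) Markov model with unsmoothed maximum-likelihood estimates. The function name and toy sequences are hypothetical, and this is not Rao’s actual code or his full method.

```python
import math
import random
from collections import Counter


def conditional_entropy(sequences):
    """H(next sign | current sign) in bits, from raw bigram counts.

    Assumption: unsmoothed maximum-likelihood estimates over adjacent
    pairs within each sequence (pairs never cross sequence boundaries).
    """
    pair_counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            pair_counts[(current, following)] += 1

    total = sum(pair_counts.values())
    current_counts = Counter()
    for (current, _following), n in pair_counts.items():
        current_counts[current] += n

    h = 0.0
    for (current, _following), n in pair_counts.items():
        p_pair = n / total                    # P(current, following)
        p_cond = n / current_counts[current]  # P(following | current)
        h -= p_pair * math.log2(p_cond)
    return h


# Toy data: one fully ordered sequence, one with no order at all.
rigid = [list("ABCD") * 50]
rng = random.Random(0)
shuffled = [[rng.choice("ABCD") for _ in range(10_000)]]

print(conditional_entropy(rigid))     # 0.0 bits: the next sign is always certain
print(conditional_entropy(shuffled))  # close to log2(4) = 2 bits
```

Pairs are counted only within each inscription, since adjacency between signs on separate objects carries no information.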

Well, ain’t that something.

In a depressingly familiar echo of the ‘hoax’ debate over the Voynich Manuscript, the most important consequence of Rao’s result is that it argues against Steve Farmer’s (2004) case that the Indus fragments were merely “political and religious symbols”, i.e. not a language at all, but just odd visual propaganda of some sort.

Language is a tricky, evolving, misunderstood, dynamic artefact that typically only has meaning within a very specific local context. The failure of linguists to “crack” the Indus fragments (all of which are very short) is no failure at all – we are massively disadvantaged by the passing millennia, and cannot easily trace the structure within the flow of ideas (the intellectual historian’s perennial hammer).

Having said that, what I read as Farmer’s basic idea – that researchers have for too long looked for a definitive script grammar as an indicator of advanced literacy – is an excellent point. And so the notion that Indus script analysts should perhaps instead be looking for some kind of arbitrary / non-formalized explanation (a confused model, rather than a complex one) is sensible. My opinion is that Farmer is overplaying his skeptical hand, and that the script is very probably communication (as opposed to mere decoration) – but is it written in something we would recognize as a language? Probably not, I would say.

Incidentally, Indus script uses roughly 300-400 symbols (depending on how you count them), with the four most frequent symbols making up about 21% of all sign occurrences: inscriptions (many on potsherds, also known as ostraca) are all short, with an average length of only 4.6 symbols. All of which makes the script completely unlike known languages – but all the same, what is it?
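For anyone wanting to reproduce figures like these, the three numbers (distinct signs, share of the most frequent signs, mean inscription length) fall straight out of a transcribed corpus. A minimal Python sketch, assuming each inscription has already been transcribed as a list of sign labels; the function name and the sign labels in the toy corpus are invented for illustration and are not real Indus sign numbers. The distinct-sign count depends entirely on how variants were merged during transcription, which is one reason estimates range from 300 to 400.

```python
from collections import Counter


def corpus_stats(inscriptions, top_k=4):
    """Basic frequency statistics for a corpus of sign sequences."""
    counts = Counter(sign for text in inscriptions for sign in text)
    total = sum(counts.values())  # total sign occurrences (tokens)
    top_total = sum(n for _sign, n in counts.most_common(top_k))
    return {
        "distinct_signs": len(counts),
        "top_k_share": top_total / total,
        "mean_length": total / len(inscriptions),
    }


# Hypothetical toy corpus: three short "inscriptions".
toy = [["jar", "fish", "fish"], ["jar", "man"], ["fish", "jar", "bull", "jar"]]
print(corpus_stats(toy, top_k=2))
# {'distinct_signs': 4, 'top_k_share': 0.7777777777777778, 'mean_length': 3.0}
```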

Perhaps Rajesh Rao’s Markov models will reveal some kind of pointers towards its hidden structure, towards the truth – but as to Rao’s suggestion that they may well yield a “grammar”… I suspect not.
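To make the “structure versus grammar” distinction concrete: a first-order Markov model only knows which sign tends to follow which. As I understand it, Rao and colleagues have used models of this kind to suggest likely readings for damaged or missing signs. The minimal Python sketch below illustrates that idea, assuming add-one (Laplace) smoothing and scoring each candidate x for the gap in “prev, ?, next” by P(x | prev) × P(next | x); the function names and toy sign labels are hypothetical, not Rao’s code.

```python
from collections import Counter, defaultdict


def train_bigrams(sequences):
    """Count sign-to-sign transitions and collect the sign inventory."""
    counts = defaultdict(Counter)
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts, vocab


def transition_prob(counts, vocab, current, following):
    """P(following | current) with add-one smoothing, so unseen pairs keep a small probability."""
    row = counts.get(current, Counter())
    return (row[following] + 1) / (sum(row.values()) + len(vocab))


def fill_gap(counts, vocab, prev_sign, next_sign):
    """Rank candidate signs x for the damaged pattern prev_sign, x, next_sign."""
    scores = {
        x: transition_prob(counts, vocab, prev_sign, x)
        * transition_prob(counts, vocab, x, next_sign)
        for x in vocab
    }
    return sorted(scores, key=scores.get, reverse=True)


toy = [
    ["jar", "fish", "man"],
    ["jar", "fish", "man"],
    ["jar", "bull", "man"],
    ["bull", "fish"],
]
counts, vocab = train_bigrams(toy)
print(fill_gap(counts, vocab, "jar", "man"))  # ['fish', 'bull', 'man', 'jar']
```

Everything such a model knows is local adjacency between pairs of signs; it has no notion of word classes or nested structure, which is the gap between a Markov model and a grammar in the linguist’s sense.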

PS: Farmer cites Gabriel Landini & Rene Zandbergen’s paper (funny, that), though he points out that Zipf’s Law is an ineffective tool for distinguishing language-based texts from non-language-based ones. Just so you know…
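Zipf’s Law says that the r-th most frequent token has a frequency roughly proportional to 1/r^a with a close to 1, i.e. a straight line of slope about -a on a log-log plot of frequency against rank. The weakness Farmer points to is easy to demonstrate: text made of random keystrokes (an observation usually credited to George Miller) also gives a steep, roughly straight line. A minimal Python sketch, assuming an ordinary least-squares fit on log rank versus log frequency (a crude estimator, not whatever Landini & Zandbergen or Farmer actually used) and ignoring rare tokens, whose long flat tail would otherwise drag the slope towards zero; all function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter


def zipf_slope(tokens, min_count=5):
    """Least-squares slope of log(frequency) against log(rank).

    Tokens seen fewer than min_count times are dropped (an assumption):
    the long flat tail of one-off tokens otherwise dominates the fit.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    freqs = [f for f in freqs if f >= min_count]
    if len(freqs) < 2:
        raise ValueError("need at least two token types above min_count")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def random_typing(n_chars, alphabet="abcde ", seed=0):
    """'Monkey at a typewriter' text: uniform random keystrokes split on spaces."""
    rng = random.Random(seed)
    text = "".join(rng.choice(alphabet) for _ in range(n_chars))
    return text.split()


# An exactly Zipfian toy vocabulary: word k occurs 60 // k times (60, 30, 20, ...).
ideal = [f"w{k}" for k in range(1, 7) for _ in range(60 // k)]
print(zipf_slope(ideal))                   # -1.0, up to floating-point rounding
print(zipf_slope(random_typing(200_000)))  # random keystrokes, no language involved
```

Both the genuinely Zipfian toy vocabulary and the random keystrokes give negative, roughly linear rank-frequency plots, which is why a Zipf-like slope on its own cannot separate language from non-language.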

3 thoughts on “Indus script & Markov models…”

  1. Dennis on April 29, 2009 at 6:36 am said:

    Hi Nick! A few half-baked comments from one who’s spent little time on the Indus script.

    Do the glyph statistics you quote really make it unlike known languages? The count of 300-400 symbols suggests some logographic or symbolic system. What, for instance, would you see with the use of Chinese characters in a highly specialized usage, say for weather data or naval signaling? (In fact, you ought to be able to get good stats for naval codes!)

    Another thought that occurs to me is of some limited logographic systems I’ve seen used by African and Native American peoples. These are relatively small sets of symbols for limited communication. In fact, hobo symbols are similar. I discuss such things on my site The Symbols of Mankind (yes, I know my graphics are pathetic).

    I can see how Indus script might be something like that. Also, what would you find in the early history of Sumerian cuneiform, as it was just getting started?

    Finally, I’ve always been skeptical of the value of Zipf’s Law. It is indeed true that some non-language things follow it. Also, engineers used to say, “Anything makes a straight line on log-log paper.” In other words, everything follows a power law to some extent, and that’s what Zipf’s Law is.

    A while back, I read a fascinating book called The Social Atom by Mark Buchanan. It’s the most intriguing thing I’ve read on human behavior in a long time! Somewhere in there he gave an explanation of why Zipf’s Law is so pervasive. Remind me to look it up.


  2. The report by scientists in Science magazine is an important contribution to language studies. It provides an analysis of the structural patterns that are characteristic of languages.

    A very important characteristic of languages is their semantic structure, that is, the underlying meanings of their spoken words. It is ‘meaning’ that provides structure even for short sequences such as those in Indus inscriptions, which average about five symbols.

    A major weakness of script studies so far is the arbitrary distinction made between so-called ‘pictorial motifs’ and ‘signs’. As with Egyptian hieroglyphs, it is possible that the entire corpus of Indus script is composed of glyphs, such as the rim of a narrow-necked jar, a rimless pot, a fish, a svastika, an antelope, an elephant, a tiger looking back, a crocodile, a ligatured animal body with three heads (one-horned heifer, short-horned bull and antelope), and a person seated in penance.

    Unless all the glyphs are decoded together as a logical cluster, taking into account the media used for inscriptions (such as terracotta bangles, copper plates, metallic weapons, tablets and seals), the decoding will not be complete.

    The error made by Sproat et al. lies in assuming that a script has to be syllabic or alphabetic, and in not evaluating the possibility that the glyphs represent words through spoken rebus (the use of similar-sounding words to convey substantive messages).

    This website presents two pure tin ingots bearing inscriptions and proves them to be Rosetta stones representing tin metal. Who else but metallurgists could have had the competence to inscribe on metallic weapons and on copper plates? This website also underscores the fact that, in historical periods, early punch-marked coins from mints used the same corpus of glyphs, pointing to a cultural continuum in ancient India. The conclusion drawn is that the glyphs are encoded within one semantic category, the repertoire of mints and mine workers, pointing to the link between two great inventions: the invention of writing and the invention of metal alloying.

    Any further scientific studies will have to evaluate these conclusions within the context of the continuum of language evolution, as a cultural marker of an extensive civilization.

    See 8 albums of Sarasvati hieroglyphs and decoding at


  3. Šuruppag on November 15, 2013 at 1:05 am said:

    Rao published a solution to the Indus Valley script in the ’90s, which he died believing to be correct. Whether or not you believe his translation, his comparative method was ingenious and excellent.

