How complicated is your language?

January 26, 2008 By: erik Category: Geeky, Musings

Like living abroad, learning a second language results in a lot of introspection about your own language, the rules that govern it (or don’t), and how and why those rules would have come to be. Also, as a member of a bilingual household, I find it interesting which words and phrases are typically said in one language or the other. Some phrases are just easier and shorter in one language than another.

I recently asked one of my blogrollmates, who earns her living translating books from French to English, which language requires more words and pages to convey the same information. Her response was that a French document will have about 30% more words than its English counterpart. I suspect that the same is true for Spanish, even though most of the extra words will be small function words (de, la, a, se, lo, te, me, nos) that we don’t have in English. German, on the other hand, might have far fewer words but many more letters, thanks to its long compound words.

I think it would be interesting to see some real data on this topic. Surely all the languages could be ranked by terseness or expressiveness or succinctness or whatever you want to call it. Perhaps such data could be gleaned from analyzing translated literature at Project Gutenberg or something.
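For anyone who wants to try this, here is a minimal sketch of the kind of measurement I have in mind: count words and letters in an original text and in its translation, then compare. The function names (`text_stats`, `relative_length`) and the two sample sentences are made up purely for illustration. They are not real data, and a serious attempt would need full, matching translations of the same work.

```python
import re
import unicodedata


def text_stats(text):
    """Return (word_count, letter_count) for a piece of text.

    A "word" here is just an unbroken run of letters, which is crude:
    contractions such as "don't" are counted as two words.
    """
    # Normalize so an accented letter is one character, not letter + accent.
    text = unicodedata.normalize("NFC", text)
    # [^\W\d_] matches any Unicode letter (word characters minus digits and "_").
    words = re.findall(r"[^\W\d_]+", text)
    return len(words), sum(len(word) for word in words)


def relative_length(original, translation):
    """Return (word_ratio, letter_ratio) of the translation to the original."""
    orig_words, orig_letters = text_stats(original)
    trans_words, trans_letters = text_stats(translation)
    return trans_words / orig_words, trans_letters / orig_letters


# Illustrative sentence pair only; one sentence proves nothing about a language.
english = "I will give it to you tomorrow"
spanish = "Te lo daré mañana"

word_ratio, letter_ratio = relative_length(english, spanish)
print(f"words: {word_ratio:.2f}x, letters: {letter_ratio:.2f}x")
```

Reporting the two ratios separately matters for the German case above, where the word count could go down while the letter count goes up.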

The other day I was installing Microsoft Word 2008. Having recently been pondering this stuff, one particular part of the installation caught my eye: the selection of which “proofing tools” (spell checker, thesaurus, etc.) I wanted to install. Check out the enormous differences in the amount of space these dictionaries take up. Surely there must be some correlation between the number of mebibytes needed to verify spelling and grammar rules for a language and the language’s general complexity. Obviously there will be differences in how much time and effort Microsoft has spent in getting each language right, but I think it works nicely as a general measure of spelling and grammatical complexity.

Proofing tool sizes for various languages

It turns out that German is the most complex European language and Portuguese is the simplest.

…at least to the extent that this is a decent measure.

  • Interesting topic. I’ve always considered English an economical language but it’s fun to see how it stacks up compared to other languages.

  • Hmmmmm.

  • I’m with Jane. In theory your idea sounds like it would have at least some general validity, but I think it rests on a shaky premise (“Obviously there will be differences in how much time and effort Microsoft has spent…”). Also, knowing something about both languages, I have difficulty with the idea that Italian is 50% more complex than Spanish.

    Generally speaking, linguists consider all languages (with very rare exceptions at either end of the bell curve) roughly equivalent in terms of ‘richness’, variety, expressiveness, etc. The English lexicon is larger than that of many other languages because it is a very ‘promiscuous’ language, borrowing readily from all of the many languages it comes into contact with. But the main result of this is simply that we are relatively rich in synonyms. My English-Slovene dictionary is twice as thick as my Slovene-English volume from the same publisher, but that doesn’t mean that English is twice as complicated, just that we have more ways of saying the same thing. If we trust Microsoft (and your theory) on this, we have to conclude that German is 10 times more ‘complicated’ than Japanese. And linguistically speaking, that’s not gonna wash.

    More to the point concerning relative difficulty among languages are cultural considerations that are very difficult to describe, let alone quantify — have some native speaker of Arabic explain to you why ‘tree trunk’ and ‘motorist’ (obviously) come from the same etymological root and you’ll see what I mean.

    See also Sapir-Whorf Hypothesis — though largely discredited (or just out of fashion), still germane.

  • Wow, sgazzetti! Whenever I dangle these “look at me, I’m bilingual!” posts out there, you chomp down on them with rigorous factual linguistic information. I love it.

    I suspect that my screenshot really has very little to say about the actual languages and a lot to say about Microsoft’s (and their corporate clients’) desire for proper proofing tools.

    I just spent an hour reading that entire New Yorker article on the Pirahã. Fascinating stuff. It reminded me of that Mark Twain quotation, which I have on a t-shirt, “All generalizations are false, including this one.”

    Again, superb comment.

  • Italian more complex than Japanese? Those two file sizes are the ones that stand out as least representative (to me) of the reality of those languages. Paola tried to learn Japanese once. Three writing systems, adjectives which have to agree with the verb tense, numerous number systems, depending on what it is you’re counting…

  • Heather

    Hmmm, I liked that... Simon, I think the reason Japanese comes out lowest as far as proofing goes is that it’s a tonal language, which would be slightly hard to convey on paper, in my opinion...

    I always thought some things were just better said in Spanish, so much so that I’ll still say those certain things in Spanish just because there are no words in the English language that can explain them well enough...