Department of Linguistics
Max Planck Institute for Evolutionary Anthropology
Quantitative approaches to lexical comparison
Lexical material ("words") is one important source of information to establish genealogical relations between languages. We investigate quantitative methods to assist linguists in this kind of historical-comparative research.
The comparison of words (i.e. strings of characters) is strongly reminiscent of the comparison of strings of DNA. The major difference is that strings of DNA are normally much longer than words in human language, so in principle a DNA string contains more information for properly assessing similarities. In contrast, each character of a linguistic word is much more informative than a 'letter' of DNA: in DNA there are only four letters (A, C, G, T), while a human language has between 15 and about 100 different letters (phonemes).
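The difference in per-character information can be made concrete with a rough calculation. The sketch below assumes that every symbol of an alphabet is equally likely, which is a simplification (real phoneme and nucleotide frequencies are uneven), so the figures are upper bounds rather than measured values. The function name `max_bits_per_symbol` is introduced here purely for illustration.

```python
import math


def max_bits_per_symbol(alphabet_size: int) -> float:
    """Upper bound on information per symbol, assuming a uniform distribution."""
    return math.log2(alphabet_size)


# Alphabet sizes taken from the text: 4 DNA letters, 15 to about 100 phonemes.
for label, size in [
    ("DNA", 4),
    ("small phoneme inventory", 15),
    ("large phoneme inventory", 100),
]:
    print(f"{label}: at most {max_bits_per_symbol(size):.2f} bits per symbol")
```

Under this assumption a DNA letter carries at most 2 bits, while a phoneme carries somewhere between roughly 3.9 and 6.6 bits, i.e. about two to three times as much.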
We expect that these two differences roughly balance out, so that both kinds of data are about equally informative. Consequently, we are adapting approaches from DNA comparison for the comparison of words in human language. As for the data, we are using the various wordlists collected in our department to investigate and test different quantitative methods of lexical comparison.
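As an illustration of what adapting a DNA comparison approach to words can look like, the following is a minimal sketch of global sequence alignment (the Needleman-Wunsch algorithm, a standard technique from bioinformatics) applied to two orthographic words. It is not the method used in this project: the function name `align`, the scoring values (match +1, mismatch -1, gap -1), and the example words are all assumptions chosen for illustration.

```python
def align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align two strings; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Score for pairing a[i-1] with b[j-1] (1-based matrix indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical example pair: English "night" and German "nacht".
total, top, bottom = align("night", "nacht")
print(total)
print(top)
print(bottom)
```

A realistic application to language data would replace the uniform match/mismatch scores with costs that reflect how plausible a given sound or grapheme correspondence is, in the same way that DNA and protein alignment use substitution matrices.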
Selected recent publications produced within this project
Cysouw, Michael & Hagen Jung. 2007. Cognate identification and alignment using practical orthographies. Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, 109-116.
- Michael Cysouw
- Hagen Jung