Project Members

Paul Heggarty
Scott Sadowsky
(formerly: Warren Maguire)
(formerly: April McMahon)

Language Relatedness and Divergence: Quantitative and Phylogenetic Approaches

This research theme served to support the related theme on Cross-Disciplinary Prehistory. It did so by focusing on the single most basic key to the linguistic record of population (pre)history, namely language relatedness, and developing new quantitative approaches to address both of its main facets:

To help evaluate whether given language lineages do or do not stem from a common origin…
And if they do, to measure as finely as possible just how closely they are related, or in other words, how far they have diverged from their common ancestor.

Once meaningful measures of language difference and divergence are available, they can feed into a growing line of work that takes recent techniques of phylogenetic, probabilistic and statistical analysis, drawn originally from the biological sciences, and applies them to data on language divergence, to help analyse and represent how languages relate to each other. These are extremely powerful tools with undoubted potential for clarifying language (pre)histories, even if applying them to language ‘evolution’ is by no means problem‑free.

Often the most appropriate analyses are those not restricted to modelling language relationships as a ‘family tree’ with binary branches, but which can also visualise networks. Networks are often a more realistic representation of how language varieties actually relate to each other, and indeed of the real-world processes that shaped those relationships in the first place (see Heggarty et al. 2010). These language data, and the new quantitative tools for analysing them, can open up valuable new perspectives on the linguistic signals that survive from past relationships between human populations, which can then be compared and combined with those of other disciplines.
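As a rough illustration of why a tree is not always an adequate model, the sketch below checks how far a set of pairwise distances departs from "tree-likeness". It uses the standard four-point condition: distances that fit a tree exactly make the two largest of the three pairwise sums for any four varieties equal. The variety names, the distance values and the function names are all hypothetical, and this is not a method the text above attributes to the project.

```python
def dist(d, x, y):
    """Look up a symmetric distance stored under an unordered pair."""
    return d[frozenset((x, y))]


def quartet_delta(d, a, b, c, e):
    """Return 0.0 for a perfectly tree-like quartet, up to 1.0 otherwise.

    Computes the three pairwise sums for the quartet and compares the
    gap between the two largest with the overall spread.
    """
    sums = sorted([
        dist(d, a, b) + dist(d, c, e),
        dist(d, a, c) + dist(d, b, e),
        dist(d, a, e) + dist(d, b, c),
    ])
    smallest, middle, largest = sums
    if largest == smallest:
        return 0.0
    return (largest - middle) / (largest - smallest)


# Hypothetical distances that fit the tree ((A,B),(C,D)) exactly.
tree_like = {
    frozenset(("A", "B")): 2, frozenset(("C", "D")): 2,
    frozenset(("A", "C")): 3, frozenset(("A", "D")): 3,
    frozenset(("B", "C")): 3, frozenset(("B", "D")): 3,
}

# Same data, but B and C are closer than the tree allows,
# as might happen after contact between neighbouring varieties.
with_contact = dict(tree_like)
with_contact[frozenset(("B", "C"))] = 2

delta_tree = quartet_delta(tree_like, "A", "B", "C", "D")
delta_contact = quartet_delta(with_contact, "A", "B", "C", "D")
```

A score of zero means the four varieties can be drawn on a tree without distortion. A score above zero signals conflicting signal of the kind that a network can display but a strictly branching tree cannot.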

However sophisticated these ‘number‑crunching’ tools may be, their results can only ever be as good as the numbers we feed into them in the first place. This research therefore centred on this critical prior stage: how best to ‘encode’ and put numbers on real language data, with all its complexities. Language is an inherently non‑numerical phenomenon, so realistically all we can aspire to is the most meaningful possible approximation to it in figures. A key concern is to ensure that our measures are appropriately weighted against each other, in terms of their respective real linguistic significance. Quite how to assess that is a core question for this research, as it aims to establish basic methodological principles for language quantification, as advanced in Heggarty (2006), Heggarty (2010), and a chapter in preparation for the forthcoming Oxford Handbook of Diachronic and Historical Linguistics.

This research theme has developed its own new techniques for measuring distances between language varieties. Some of these do look to the traditional data source of lexical semantics, although they are purposely devised to be very different indeed from traditional ‘lexicostatistics’, with its many known weaknesses (see Heggarty 2010). The novel techniques developed here aim to help assess whether given language families are or are not related to each other in the first place (e.g. Heggarty 2010, Heggarty 2011).

Principally, though, this theme looks to phonetics, where divergence can be measured to an especially fine-grained level: between languages closely related to each other within the same family, and indeed down to the dialect and accent level. For any one family, the raw data collected are recordings of specific sets of common words, cognate (i.e. directly related) across all language varieties within that family. These are first transcribed phonetically, and the sounds within any one word are then matched up against those in the corresponding pronunciations in all other languages in that family. This matching is achieved through node forms that encapsulate the basic knowledge of the common ancestral language from which that family derived. As an example, the individual sounds in French étoile, Spanish estrella and Romanian steauă (all meaning ‘star’) are matched up against each other through their respective relationships of derivation from the sounds within the original Latin stella(m) from which each of the modern words derives. This matching is highly automated by the phonetic analysis programme used, but is always followed by expert linguist revision. From these transcriptions and matchings as input data, a purposely developed programme produces measures of how far the languages have diverged from each other in phonetics, which in turn provides a perspective on how the family in question (in this example, Romance) developed through (pre)history.
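The idea of matching sounds through an ancestral node form can be sketched in a few lines. The example below is only a minimal illustration under loud assumptions: the slot inventory, the simplified reflexes given for each language and the all-or-nothing comparison are hypothetical stand-ins, not the project's transcriptions or its phonetic weighting, neither of which is described in detail here. Sounds with no ancestral counterpart (such as the initial vowel of Spanish estrella) are simply ignored.

```python
from itertools import combinations

# Ancestral "node form" slots, loosely based on Latin stella(m).
SLOTS = ["s", "t", "e", "ll", "a"]

# Hypothetical, heavily simplified reflexes of each slot per language.
# An empty string means the ancestral sound has been lost.
REFLEXES = {
    "French":   {"s": "",  "t": "t", "e": "wa", "ll": "l", "a": ""},
    "Spanish":  {"s": "s", "t": "t", "e": "e",  "ll": "ʎ", "a": "a"},
    "Romanian": {"s": "s", "t": "t", "e": "ea", "ll": "w", "a": "ə"},
}


def slot_distance(sound_a, sound_b):
    """Crude placeholder: 0 if the two reflexes are identical, else 1.

    A realistic measure would grade the difference by phonetic features.
    """
    return 0.0 if sound_a == sound_b else 1.0


def divergence(lang_a, lang_b, reflexes=REFLEXES, slots=SLOTS):
    """Mean per-slot distance between two languages for one cognate."""
    total = sum(
        slot_distance(reflexes[lang_a][s], reflexes[lang_b][s]) for s in slots
    )
    return total / len(slots)


def distance_table(reflexes=REFLEXES):
    """Pairwise divergence for every pair of languages."""
    return {
        (a, b): divergence(a, b, reflexes)
        for a, b in combinations(sorted(reflexes), 2)
    }


table = distance_table()
```

Because every modern sound is indexed by the ancestral slot it derives from, any two languages can be compared slot by slot without aligning them to each other directly. Averaging such scores over a full word list would give one distance per language pair, which is the kind of input that tree and network analyses take.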

This approach has been explored so far particularly for varieties of English worldwide and through history, and more widely across dialects of Germanic (e.g. Heggarty et al. 2010, Maguire et al. 2010). In ongoing work, analysis is now underway of similar databases already collected for further studies on the Slavic and Romance families, and particularly a further specific project on the Sounds of the Andean Languages.

The recording and transcription databases also feed directly into the related research theme 'Sound Comparisons': New Tools and Resources for Exploring Language Family Diversity on the Web.

Publications within this Research Theme

Heggarty, P. forthcoming.
Commentary on: Chen, Sokal & Ruhlen (1995), Worldwide analysis of genetic and linguistic relationships of human populations.
Human Biology (Special issue in honor of Robert R. Sokal).

Heggarty, P. 2011. Enterrando el esqueleto quechumara.
In W. Adelaar, P. Valenzuela, & R. Zariquiey (eds) Estudios en lenguas andinas y amazónicas. En homenaje a Rodolfo Cerrón-Palomino, 147–179. Lima: Fondo Editorial de la PUCP.

Heggarty, P. 2010.
Beyond lexicostatistics: how to get more out of ‘word list’ comparisons. Diachronica 27(2): p.301–324.

Heggarty, P., Maguire, W., & McMahon, A.M.S. 2010.
Splits or waves? Trees or webs? How divergence measures and network analysis can unravel language histories.
Philosophical Transactions of the Royal Society B: Biological Sciences (365) Special issue on: Cultural and Linguistic Diversity (eds J. Steele, P. Jordan, & E. Cochrane): p.3829–3843.

Maguire, W., McMahon, A.M.S., Heggarty, P., & Dediu, D. 2010.
The past, present and future of English dialects: Quantifying convergence, divergence and dynamic equilibrium.
Language Variation and Change 22(1): p.69–104.

Nerbonne, J., Heggarty, P., Van Hout, R., & Robey, D. 2009.
Panel discussion on computing and the humanities.
J. Nerbonne, C. Gooskens, S. Kürschner, & R. van Bezooijen (eds).
International Journal of Humanities and Arts Computing 2: p.19–37.

Heggarty, P., Maguire, W., & McMahon, A. 2008.
Accents of English from Around the World.

McMahon, A.M.S., Heggarty, P., McMahon, R., & Maguire, W. 2007.
The sound patterns of Englishes: representing phonetic similarity.
English Language and Linguistics 11(01): p.113.

Heggarty, P. 2006.
Interdisciplinary indiscipline? Can phylogenetic methods meaningfully be applied to language data — and to dating language?
In P. Forster & C. Renfrew (eds) Phylogenetic Methods and the Prehistory of Languages, 183–194. Cambridge: McDonald Institute for Archaeological Research.

Heggarty, P. 2005a.
Enigmas en el origen de las lenguas andinas: aplicando nuevas técnicas a las incógnitas por resolver.
Revista Andina 40: p.9–57.

Heggarty, P. 2005b.
Response to commentaries on Heggarty (2005). Revista Andina 40: p.70–80.

Heggarty, P., McMahon, A., & McMahon, R. 2005.
From phonetic similarity to dialect classification: a principled approach.
In N. Delbecque, D. Geeraerts, & J. van der Auwera (eds) Perspectives on Variation: Sociolinguistic, Historical, Comparative, 43–91. Amsterdam: Mouton de Gruyter.

McMahon, A.M.S., Heggarty, P., McMahon, R., & Slaska, N. 2005.
Swadesh sublists and the benefits of borrowing: an Andean case study.
Transactions of the Philological Society 103(2): p.147–170.

Heggarty, P. 2000. Quantifying change over time in phonetics.
In C. Renfrew, A. M. S. McMahon, & R. L. Trask (eds) Time Depth in Historical Linguistics, 531–562. Cambridge: McDonald Institute for Archaeological Research.

Max Planck Institute for Evolutionary Anthropology