Towards a Cross-Disciplinary Prehistory: Converging Perspectives from Language, Archaeology and Genes
with David Beresford-Jones
I am a comparative linguist, with a foot in both historical linguistics and language typology/universals. I look to both of these fields especially as rich and complementary mines of information (or at least inference) on human prehistory. I would sorely like to see it better understood outside our field precisely how, and how much, the languages we speak can tell us about our origins, just as surely as can our genes, or the ‘material culture’ that our ancestors left behind for archaeologists. The different data and methods of these disciplines, however, can each uncover only a part of the full story, and the sub‑plots in fact seem to contradict one another as often as they concur.
But there was only one past, so our common ultimate goal must be to converge our separate, partial perspectives on a more holistic understanding, coherent across all of our disciplines, of what really went on in human prehistory. Language data have an invaluable contribution to make — but even for a linguist, to interpret what they really mean for our past calls also for the insights of our fellow disciplines. I have therefore worked very closely in recent years with both archaeologists and geneticists, co‑operation that I am ideally placed to continue now that I have (re‑)joined MPI-EVA.
With archaeologists, I am currently (to end 2010) completing a set of chapters from a linguistic perspective for the forthcoming Cambridge World Prehistory, co‑authored with Colin Renfrew in Cambridge. From January 2011 I will be joined at MPI-EVA by David Beresford-Jones, to co‑write a series of papers on specific issues where archaeology and linguistics can potentially learn most from one another.
In genetics, of course, MPI-EVA is already home to many colleagues with whom I aim to cooperate closely throughout the next few years, as evidenced for example by a recent joint paper of mine with Barbieri et al. (2010). In September 2011 I plan to host a cross-disciplinary conference at MPI-EVA on the lessons for prehistory that are now being increasingly revealed also in the DNA, both ancient and modern, of indigenous populations of the Andes.
Quantitative and Phylogenetic Approaches to Language Relatedness and Divergence
The ultimate goal, that of better understanding our origins, explains why I do comparative linguistics. As for precisely how I go about mining language data to inform us on this, my own focus is on the most fundamental issue for prehistory, that of language relatedness. In particular I work to develop new quantitative approaches to both facets of the relatedness question: to help evaluate whether given language lineages do or do not stem from a common origin; and if they do, to measure as finely as possible how closely they are related.
I have considerable experience in the growing trend which takes the latest techniques for phylogenetic, probabilistic and statistical analysis, drawn originally from the biological sciences, and applies them now to data on language divergence. These are extremely powerful tools with undoubted potential for clarifying language (pre‑)histories, even if their application to language ‘evolution’ is not entirely problem‑free. I am particularly drawn to the type of phylogenetic analysis that is not restricted to analysing language relationships only in terms of a ‘family tree’ with binary branches, but can instead also visualise networks. These are often a more realistic representation of how language varieties actually relate to each other, and indeed of the underlying processes in the real world that shaped those relationships in the first place (see for example Heggarty et al. 2010).
However sophisticated these ‘number‑crunching’ tools may be, though, their results can only ever be as good as the numbers we feed into them in the first place. My Ph.D. and much of my research since have centred on this critical prior stage, of how to ‘encode’ and put numbers on real language data, with all its complexities. Language is an inherently non‑numerical phenomenon, so to be realistic all we can aspire to is the most meaningful approximation to it in figures. A key concern of mine is to ensure that our measures are appropriately weighted against each other, in terms of their respective real linguistic significance. Quite how to assess that, however, is precisely the question with which I grapple in my work on basic methodological principles for language quantification.
In programming my own techniques for measuring distances between language varieties I have looked especially to phonetics, where divergence can in principle be measured to an especially fine-grained level: between languages closely related to each other within the same family, and indeed down to the dialect and accent level. I have explored this so far particularly for varieties of English worldwide and through history, and more widely across dialects of Germanic (e.g. Heggarty et al. 2010, Maguire et al. 2010).
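As a purely illustrative sketch of what putting numbers on phonetic divergence can involve, the Python below compares transcriptions of a single cognate across three varieties. It is not the method used in the publications cited above: it swaps in a plain weighted edit distance, and the variety labels, the toy transcriptions, the vowel inventory and the substitution weights are all hypothetical assumptions chosen only to show how weighting enters the calculation.

```python
from itertools import combinations

# Hypothetical toy data: rough segment-by-segment transcriptions of one
# cognate in three unnamed varieties. Not taken from any real database.
TRANSCRIPTIONS = {
    "variety_A": ["w", "ɔ", "t", "ə"],
    "variety_B": ["w", "ɑ", "ɾ", "ɚ"],
    "variety_C": ["v", "a", "s", "ɐ"],
}

# Assumed, deliberately crude vowel inventory for the toy weighting below.
VOWELS = set("aɑɔəɚɐeiou")


def substitution_cost(a, b):
    """Assumed weighting: identical segments cost 0, a swap within the
    same class (vowel/vowel or consonant/consonant) costs 0.5, and a
    swap across classes costs 1."""
    if a == b:
        return 0.0
    same_class = (a in VOWELS) == (b in VOWELS)
    return 0.5 if same_class else 1.0


def weighted_edit_distance(s, t):
    """Edit distance by dynamic programming, with insertions and
    deletions costing 1 and substitutions weighted as above."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = float(i)
    for j in range(1, cols):
        d[0][j] = float(j)
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # delete a segment of s
                d[i][j - 1] + 1.0,  # insert a segment of t
                d[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
    return d[-1][-1]


def normalised_distance(s, t):
    """Scale by the longer transcription so that word length does not
    dominate the comparison."""
    longest = max(len(s), len(t))
    return weighted_edit_distance(s, t) / longest if longest else 0.0


def distance_matrix(transcriptions):
    """Pairwise distances between all varieties, keyed by sorted name pair."""
    return {
        (a, b): normalised_distance(transcriptions[a], transcriptions[b])
        for a, b in combinations(sorted(transcriptions), 2)
    }


if __name__ == "__main__":
    for pair, dist in distance_matrix(TRANSCRIPTIONS).items():
        print(pair, round(dist, 3))
```

A realistic measure would average over many cognates and would ground its weights in phonetic features rather than in a two-way vowel/consonant split; the pairwise figures produced this way are the kind of input that tree or network analyses then take.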
I have also worked on new approaches to language quantification in lexical semantics (very different to traditional lexicostatistics), with a view to helping assess whether given language families are or are not related to each other in the first place (e.g. Heggarty 2010).
Exploring Language Family Diversity on the Web: Online Sound-File Databases
Finally, with so much of my fieldwork time invested in collecting data from across Europe and the Andes, I am only too keen to make this mass of comparative data on regional accents, dialects and languages as available and as relevant as possible to the people who actually speak them. My research into how these language families developed through (pre‑)history focuses particularly on measuring how far they have diverged in phonetics, for which the raw input data are my recordings and then transcriptions of sets of common cognate words. Thanks to dedicated ‘dissemination’ funding I was able to turn these data into online resources intended not only for linguists but especially for the speaker communities themselves.
To this end I have devised ‘hover to hear’ websites, a user-friendly design to allow users to hear and compare instantaneously online the precise differences in pronunciation from one region to the next, across an entire language family at a time. The first such website was created for the main language families of the Andes, hosted on my existing Quechua website at www.quechua.org.uk/sounds. This is intended particularly to support understanding and uptake of the unified spelling system, hitherto frustrated by lack of awareness of the wide regional variation that it serves to cover. A second project then established the www.soundcomparisons.com site, on accents of English from around the world, later extended to the entire Germanic family, at www.languagesandpeoples.com/Germanic. The main sites are now being migrated to the server at MPI-EVA.
Thanks to the dedicated programming support of the MPI, the underlying structure will now be streamlined, improved and extended to new functions, such as displaying the instantaneous sound file links on Google Maps, as well as in tables. The underlying structure will also be converted to a real-time database lookup system, so that my new databases for the Romance and Slavic families can be progressively expanded online. By 2012 the whole structure will be available as a template so that similar web databases can be easily and swiftly added for any other language family: candidates to be targeted in the long term include Turkic, Arabic, Bantu, Arawak, Celtic, and Indo-European itself.