Potentials of Language Documentation:
Methods, Analyses, and Utilization

Potentials of Language Documentation (November 3 - 4, 2011)
A workshop funded by the Volkswagen Foundation

Program and participants

Panel 1: Methods

Conveners/chairs: Frank Seifart, Peter Wittenburg, Daan Broeder

Central questions:

  • How do computational methods impact the annotation and analysis of language documentation data, and other linguistic data?
  • How do computational methods developed for large corpora of well-known languages apply to the usually relatively small language documentation corpora of usually less well-known languages?
  • What are the prospects and limitations of across-corpora cross-linguistic computationally aided research?


  • e-Grammars and endangered languages corpora
    • Sebastian Drude (DoBeS)
  • typologists' perspective on quantitative analyses with R using language documentation corpora
    • Balthasar Bickel (U Zürich/DoBeS)
  • further computational issues in connection with DoBeS corpora
    • Jost Gippert (DoBeS)
  • general corpus linguistics perspective, the challenge of small corpora
    • Anke Lüdeling (HU Berlin)
  • statistical processing and learning methods for automatic text analysis and annotation of small corpora, i.e. a computer science perspective on quantitative methods for DoBeS corpora
    • Gerhard Heyer (U Leipzig)
  • automatic audio/video recognition
    • Daniel Schneider (NetMedia department at Fraunhofer IAIS, Bonn)
    • Oliver Schreer (Fraunhofer-Institute for Telecommunications, Berlin)
  • linguists' perspective on innovative quantitative methods for typological research based on quantitative analyses of texts
    • Michael Cysouw (LMU München)


Panel 2: Analyses

Conveners/chairs: Anna Margetts, Geoffrey Haig, Nikolaus Himmelmann

Central question:

  • What impact has language documentation had on analyses and theorizing in linguistics and related disciplines and how can it make further impacts?


The DoBeS archive, as a typological database, contains by definition data from endangered, and hence generally small and often geographically isolated, language communities. It thus counteracts the often-bemoaned bias in linguistic typology and other disciplines towards “large” languages, i.e. those that are embodied in a standardized written form, promulgated through education, and used for decontextualized communication in industrialized societies. Some researchers (e.g. Trudgill) have speculated on whether such languages display typological features distinct from those of “small” languages. More importantly perhaps, the data in the DoBeS archive are usually multimodal (including both audio and video recordings) and thus open up, for these languages, research possibilities that to date have been largely confined to “large” (typically national or at least regional) languages.

What we are specifically interested in here are:

  • new (or at least considerably modified) research methodologies and questions that are supported by such data (besides multimodality, e.g. the possibilities opened up by systematically exploiting metadata)
  • specific claims and hypotheses that crucially involve documentation data
  • the impact of making analyses verifiable through direct access to raw and primary data; competing analyses made possible by such access


  • Typology of referential hierarchies (as an example of classical typological topics)
    • Stefan Schnell (CAU Kiel/DoBeS)
    • Jane Simpson (U Sydney)
  • Language acquisition, gesture
    • Sabine Stoll (U Zürich/DoBeS)
    • Marianne Gullberg (Lund U, Sweden)
  • Linguistic Anthropology & Sociolinguistics (e.g. who is speaking what language to whom, …)
    • Peter Trudgill (U Agder)
  • Historical and contact linguistics (e.g. what can be extracted from text corpora that is interesting for historical and contact linguistics)
    • Marian Klamer (U Leiden, NL/EUROBABEL)


Panel 3: Utilization

Conveners/chairs: Dagmar Jung, Paul Trilsbeek

Central questions:

  • How can language documentation data be utilized in a broader context?
  • How must these digital data be stored, represented, and made accessible by the archives?
  • What kinds of uses will evolve in the context of social media?


Data on endangered languages are not only valuable for linguists, but also represent a repository of cultural and linguistic knowledge that can and will be used for language maintenance efforts.


  • Data curation and preservation
    • Nick Thieberger (PARADISEC/U Melbourne, Australia)
    • Gary Holton (ANLC, Alaska, USA)
  • Online presentation and accessibility of endangered languages data
    • Hans-Jörg Bibiko (MPI-EVA, Leipzig)
    • Gabriele Schwiertz (DoBeS/U Köln)
  • Creating educational materials from language documentation data
    • Ulrike Mosel (DoBeS/CAU Kiel)
  • Language planning
    • Julia Sallabank (SOAS, London)