Workshop description

In the past 10 years, intensive documentation activities, i.e. compilations of large multimedia corpora of spoken endangered languages, many of them within the DobeS framework, have contributed to documenting important linguistic and cultural aspects of dozens of languages. The rich experience gained through these initiatives has provided us with a far better understanding of the enormous potentials of language documentation.

These potentials are threefold:

Given that modern DobeS language documentations are cast in a sufficiently standardized, well-structured electronic format, computational methods can efficiently enhance the annotations and improve the analyses of language documentation data in many ways. The combination of state-of-the-art computational methods and electronic language documentation corpora has the potential to significantly change the way linguistic data in general are handled and analyzed.
The additional data available through language documentation provide a much better understanding of the range of diversity of human languages. New claims about human language in general no longer need to be based on just a few examples from well-known languages. Language documentation data thus have the potential to fundamentally change important aspects of how analyses are pursued in various subfields of linguistics, as well as in related disciplines such as anthropology, since these data provide such investigations with a much richer empirical basis.
The electronic format of language documentation data opens up unprecedented options for use, not only in research but also in language maintenance efforts and in raising awareness of language diversity and endangerment. Such data have a historical importance comparable to that of gene banks in documenting the outcomes of long evolutionary processes, and this is especially important when a documented endangered language has become extinct. Web 2.0 technology has the potential to allow not only one-way access but also rich interaction with the data, while at the same time safeguarding the integrity of the original data.

These three promising areas have recently begun to be explored to various degrees. Based on these experiences, we can now critically discuss and make more explicit the threefold potentials of language documentation in order to, firstly, foster further interactions of language documentation with computational methodologies; secondly, encourage further analyses, especially interdisciplinary research, using language documentation data; and thirdly, discuss the various ways of utilizing language documentation data for practical applications and other uses.

By clarifying the potentials of language documentation, this workshop will also contribute to answering the question of how language documentations, as products, can be defined beyond being a collection of methods and technologies.