Workshop on Cross-Linguistic Data Formats 2023 – Graphs and Text
Ten years ago, in early 2013, the Cross-Linguistic Linked Data project (CLLD) was kicked off at the MPI EVA. It soon became one of the main drivers of an initiative towards standardization of cross-linguistic data, culminating in the CLDF specification in 2018, which grew out of a series of workshops on “Language Comparison with Linguistic Databases” (see here). Looking back at this history, we want to explore what role CLDF can play in the future of the field and whether targeted workshops are a useful way to govern the standard.
This workshop brings together researchers using cross-linguistic data, publishers of such data and tool builders, i.e. representatives of the community from which the CLDF standard grew and for which it is intended.
The main goals during the two workshop days are
an update on what’s happening regarding cross-linguistic, data-intensive research, concentrating specifically on graphs and texts as two new major goals of standardization that we intend to tackle soon, and
a shared understanding of the role CLDF (or standardization in general) can play in research (ideally resulting in a “community of practice”, driving the future of standardization efforts).
Cross-Linguistic Data-Intensive Research
Several large-scale data collection projects have come to fruition over the last decade (List et al. 2022; Skirgård et al. 2023), providing ever more input for research that takes into account the world’s linguistic diversity.
Inferring phylogenetic trees from cognate-coded lexical data (Sagart et al. 2019; Greenhill et al. 2023; Heggarty et al. 2023) can probably be regarded as the de facto standard application of such data, but research in psychology using colexification networks (Jackson et al. 2019; Brochhagen et al. 2023) or the quest for language universals (Dediu 2023) are just two more examples of research that routinely uses cross-linguistic data.
During this workshop, we hope to learn about more research questions that require cross-linguistic data to be answered.
The Role of Standards
Standardization may sometimes be perceived as counterproductive in research, because it seems essentially at odds with “cutting-edge” methodology or individual researchers’ intuition and freedom of thought (Bauman 2011). But clearly, standing on the shoulders of giants becomes easier when solid steps lead there.
During the workshop, we hope to identify steps that – in retrospect – led in the right direction and understand which current research paths are well-trodden enough to become candidates for further standardization.
Next Steps Towards Expanding CLDF
Finally, the workshop will serve as an experiment in figuring out how to govern a standard like CLDF. In the best case, the next version of CLDF will be shaped by requirements gathered, lessons learned, and opportunities identified during the workshop.
References
Bauman, Syd. 2011. “Interchange Vs. Interoperability.” In Proceedings of Balisage: The Markup Conference 2011. Mulberry Technologies. https://doi.org/10.4242/balisagevol7.bauman01.
Brochhagen, Thomas, Gemma Boleda, Eleonora Gualdoni, and Yang Xu. 2023. “From Language Development to Language Evolution: A Unified View of Human Lexical Creativity.” Science 381 (6656): 431–36. https://doi.org/10.1126/science.ade7981.
Dediu, Dan. 2023. “Ultraviolet Light Affects the Color Vocabulary: Evidence from 834 Languages.” Frontiers in Psychology 14. https://doi.org/10.3389/fpsyg.2023.1143283.
Greenhill, Simon J., Hannah J. Haynie, Robert M. Ross, Angela Chira, Johann-Mattis List, Lyle Campbell, Carlos A. Botero, and Russell D. Gray. 2023. “A Recent Northern Origin for the Uto-Aztecan Family.” Language 0 (0).
Heggarty, Paul, Cormac Anderson, Matthew Scarborough, Benedict King, Remco Bouckaert, Lechosław Jocz, Martin Joachim Kümmel, et al. 2023. “Language Trees with Sampled Ancestors Support a Hybrid Model for the Origin of Indo-European Languages.” Science 381 (6656). https://doi.org/10.1126/science.abg0818.
Jackson, Joshua Conrad, Joseph Watts, Teague R. Henry, Johann-Mattis List, Peter J. Mucha, Robert Forkel, Simon J. Greenhill, Russell D. Gray, and Kristen Lindquist. 2019. “Emotion Semantics Show Both Cultural Variation and Universal Structure.” Science 366 (6472): 1517–22.
List, Johann-Mattis, Robert Forkel, Simon J. Greenhill, Christoph Rzymski, Johannes Englisch, and Russell D. Gray. 2022. “Lexibank, a Public Repository of Standardized Wordlists with Computed Phonological and Lexical Features.” Scientific Data 9 (316): 1–31.
Sagart, Laurent, Guillaume Jacques, Yunfan Lai, Robin Ryder, Valentin Thouzeau, Simon J. Greenhill, and Johann-Mattis List. 2019. “Dated Language Phylogenies Shed Light on the Ancestry of Sino-Tibetan.” Proceedings of the National Academy of Sciences of the United States of America 116: 10317–22.
Skirgård, Hedvig, Hannah J. Haynie, Damián E. Blasi, Harald Hammarström, Jeremy Collins, Jay J. Latarche, Jakob Lesage, et al. 2023. “Grambank Reveals the Importance of Genealogical Constraints on Linguistic Diversity and Highlights the Impact of Language Loss.” Science Advances 9 (16). https://doi.org/10.1126/sciadv.adg6175.
Invited participants
(listed in alphabetical order by participant's last name)
- Sascha Alexeyenko
- Laura Becker
- Christian Bentz
- Katja Bocklage
- Thomas Brochhagen
- Anna Di Natale
- Promis Dodzi Kpoglu
- Jeff Good
- John Mansfield
- Barbara Meisterernst
- Jessica Nieder
- Sebastian Nordhoff
- Matthias Pache
- Michele Pullini
- Arne Rubehn
Program
Thursday, December 14
08:40 - 09:00 | everyone | registration: picking up name tags, lunch & reception tickets |
09:00 - 09:30 | Johann-Mattis List & Robert Forkel | Introduction (slides) |
09:30 - 10:00 | Sascha Alexeyenko | CLLD apps as a tool for the construction of datasets (slides) |
10:00 - 10:30 | Jeff Good | Extending CLDF to multilingual data (slides) |
10:30 - 11:00 | COFFEE BREAK | |
11:00 - 11:30 | John Mansfield | Areal colexification and partial colexification in northern Australia (slides) |
11:30 - 12:00 | Thomas Brochhagen | Challenges and insights from cross-linguistic word-meaning associations: A roadmap for the study of loose colexification (slides) |
12:00 - 12:30 | Annika Tjuka & Johann-Mattis List | Representing semantic networks in Concepticon (slides) |
12:30 - 14:00 | LUNCH BREAK | |
14:00 - 14:30 | Christian Bentz | Collecting character sequences for paleolithic signs and written languages |
14:30 - 15:00 | Sebastian Nordhoff | Generating CLDF from heterogeneous input in the Open Text Collections project: Input from FLEx, ELAN, tex |
15:00 - 15:30 | Barbara Meisterernst | The morpho syntax of Archaic Chinese verbs: Loss of morphology as trigger for the emergence of analytic structures (slides) |
15:30 - 16:00 | COFFEE BREAK | |
16:00 - 17:30 | Practice Session 1 | Text formats in CLDF |
17:30 - 20:00 | WORKSHOP RECEPTION | |
Friday, December 15
09:00 - 10:30 | Practice Session 2 | CLICS4 and networks in CLDF |
10:30 - 11:00 | COFFEE BREAK | |
11:00 - 12:30 | everyone | Discussion |
12:30 - 13:30 | LUNCH BREAK | |
13:30 - 14:00 | everyone | Wrapping up |
You can download our book of abstracts here.
Registration
The deadline for registration was November 30, 2023.
Organizers
Contact
Questions? Please send any queries regarding this workshop to us here.