Workshop on Cross-Linguistic Data Formats 2023: Graphs and Text

Ten years ago, in early 2013, the Cross-Linguistic Linked Data project (CLLD) was kicked off at the MPI EVA. It soon became one of the main drivers of an initiative towards the standardization of cross-linguistic data, culminating in the CLDF specification in 2018, which grew out of a series of workshops on “Language Comparison with Linguistic Databases” (see here). Looking back at this history, we want to explore what role CLDF can play in the future of the field and whether targeted workshops are a useful way to govern the standard.

This workshop brings together researchers using cross-linguistic data, publishers of such data and tool builders, i.e. representatives of the community from which the CLDF standard grew and for which it is intended.

The main goals during the two workshop days are

  • an update on what’s happening regarding cross-linguistic, data-intensive research, concentrating specifically on graphs and texts as two new major goals of standardization that we intend to tackle soon, and

  • a shared understanding of the role CLDF (or standardization in general) can play in research (ideally resulting in a “community of practice”, driving the future of standardization efforts).

Cross-Linguistic Data-Intensive Research

Several large-scale data collection projects have come to fruition over the last decade (List et al. 2022; Skirgård et al. 2023), providing ever more input for research that takes into account the world’s linguistic diversity.

Inferring phylogenetic trees from cognate-coded lexical data (Sagart et al. 2019; Greenhill et al. 2023; Heggarty et al. 2023) can probably be regarded as the de facto standard application of such data, but research in psychology using colexification networks (Jackson et al. 2019; Brochhagen et al. 2023) and the quest for language universals (Dediu 2023) are just two more examples of research that routinely uses cross-linguistic data.

During this workshop, we hope to learn about more research questions that require cross-linguistic data to be answered.

The Role of Standards

Standardization may sometimes be perceived as counterproductive in research, because it seems essentially at odds with “cutting-edge” methodology or individual researchers’ intuition and freedom of thought (Bauman 2011). But clearly, standing on the shoulders of giants becomes easier when solid steps lead there.

During the workshop, we hope to identify steps that, in retrospect, led in the right direction, and to understand which current research paths are well-trodden enough to become candidates for further standardization.

Next Steps Towards Expanding CLDF

Finally, the workshop will serve as an experiment in figuring out how to govern a standard like CLDF. In the best case, the next version of CLDF will be shaped by requirements gathered, lessons learned, and opportunities identified during the workshop.


Invited participants

(listed in alphabetical order by participant's last name)

Thursday, December 14

08:40 - 09:00 | everyone | registration: picking up name tags, lunch & reception tickets
09:00 - 09:30 | Johann-Mattis List & Robert Forkel | Introduction (slides)
09:30 - 10:00 | Sascha Alexeyenko | CLLD apps as a tool for the construction of datasets (slides)
10:00 - 10:30 | Jeff Good | Extending CLDF to multilingual data (slides)
10:30 - 11:00 | COFFEE BREAK
11:00 - 11:30 | John Mansfield | Areal colexification and partial colexification in northern Australia (slides)
11:30 - 12:00 | Thomas Brochhagen | Challenges and insights from cross-linguistic word-meaning associations: A roadmap for the study of loose colexification (slides)
12:00 - 12:30 | Annika Tjuka & Johann-Mattis List | Representing semantic networks in Concepticon (slides)
12:30 - 14:00 | LUNCH BREAK
14:00 - 14:30 | Christian Bentz | Collecting character sequences for paleolithic signs and written languages
14:30 - 15:00 | Sebastian Nordhoff | Generating CLDF from heterogeneous input in the Open Text Collections project: Input from FLEx, ELAN, tex
15:00 - 15:30 | Barbara Meisterernst | The morpho-syntax of Archaic Chinese verbs: Loss of morphology as trigger for the emergence of analytic structures (slides)
15:30 - 16:00 | COFFEE BREAK
16:00 - 17:30 | Practice Session 1 | Text formats in CLDF

Friday, December 15

09:00 - 10:30 | Practice Session 2 | CLICS4 and networks in CLDF
10:30 - 11:00 | COFFEE BREAK
11:00 - 12:30 | everyone | Discussion
12:30 - 13:30 | LUNCH BREAK
13:30 - 14:00 | Wrapping up

You can download our book of abstracts here.


