Participants

Project leader

Frank Seifart

Project members

Hans-Jörg Bibiko
Balthasar Bickel
Swintha Danielsen
Roland Meyer
Sebastian Nordhoff
Brigitte Pakendorf
Jan Strunk
Alena Witzlack-Makarevich
Taras Zakharko

Research assistants

Helen Geyer
Lisa Steinbach
Evgeniya Zhivotova

The relative frequencies of nouns, pronouns, and verbs cross-linguistically

(Volkswagen Foundation DoBeS grant 86 292)

This project investigated the relative frequencies of core parts of speech, such as nouns, verbs, and pronouns, in spoken language corpora of seven languages that represent a wide range of areal and typological diversity. We focused on two research questions:

Why do languages vary so drastically in the relative frequencies of noun, pronoun, and verb tokens employed in discourse? Our pilot study for this project suggested that in some languages (such as Chintang) nouns and pronouns taken together are roughly as frequent as verbs, while in others (such as Sri Lanka Malay) nouns and pronouns taken together are roughly twice as frequent as verbs. What typological or other differences between languages can explain these differences in the use of parts of speech? One hypothesis we tested was that argument indexing on verbs may make the overt realization of arguments as nouns or pronouns unnecessary, and may thus explain the low frequencies of nouns and pronouns in some languages.
Why do the relative frequencies of nouns, pronouns, and verbs vary within texts? Our pilot study showed that, consistently across languages, nouns are used particularly frequently at the beginning of narrative texts, reflecting the introduction of new discourse participants, as expected. Furthermore, the frequency of noun use alternated in a characteristic, sinusoidal pattern as narrative texts unfolded, with regular peaks of heavy noun use roughly every 10-15 clauses. These peaks may reflect universal cognitive constraints on the activation of discourse participants: once a participant’s activation has decayed, ultimately because of short-term memory constraints, the participant must be re-introduced by a full lexical noun.
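The two quantities behind these questions can be illustrated with a minimal sketch. It is not the project's actual pipeline: the tag labels `NOUN`, `PRON`, and `VERB`, the function names, and the window size are all assumptions made for illustration. The first function computes the ratio of nouns and pronouns taken together to verbs for a tagged text; the second computes the share of nouns among all tokens in a sliding window of clauses, which is one simple way to make peaks of noun use visible.

```python
from collections import Counter


def np_to_verb_ratio(tags):
    """Return (nouns + pronouns) / verbs for a flat list of POS tags.

    The tag labels "NOUN", "PRON", and "VERB" are hypothetical; a real
    corpus would use its own tagset.
    """
    counts = Counter(tags)
    verbs = counts["VERB"]
    if verbs == 0:
        raise ValueError("no verb tokens in sample")
    return (counts["NOUN"] + counts["PRON"]) / verbs


def noun_share_by_window(clauses, window=5):
    """Return the share of noun tokens in each sliding window of clauses.

    `clauses` is a list of clauses, each a list of POS tags. The window
    size is an illustrative choice, not a value taken from the project.
    """
    shares = []
    for start in range(len(clauses) - window + 1):
        # Flatten the tags of all clauses that fall into this window.
        tokens = [tag for clause in clauses[start:start + window] for tag in clause]
        shares.append(tokens.count("NOUN") / len(tokens) if tokens else 0.0)
    return shares
```

A ratio near 1 would correspond to the pattern reported above for Chintang, and a ratio near 2 to the pattern reported for Sri Lanka Malay. Plotting the windowed noun share against clause position is one way to inspect whether peaks recur at regular intervals.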

We also investigated the influence of further factors on the relative frequencies of nouns, pronouns, and verbs, such as the degree of speakers’ and listeners’ mutual acquaintance (known/familiar vs. unknown) and text genres. In this context we empirically tested the assumed universality of ‘nouniness’ of formal genres.

The newly available data compiled in the DoBeS framework allowed us to develop and then appropriately address these research questions for the first time, as they require data from diverse languages that are annotated for parts of speech by experts, time-aligned, and described with detailed metadata with respect to speakers’ social status, mutual acquaintance, etc. These data allowed us to capture subtle language usage patterns and explore their relation to typological differences between languages, narrative strategies, and other linguistic and non-linguistic factors. This project thus further developed documentary linguistics, connecting it with areas such as corpus linguistics, morphological typology, syntactic theory, discourse studies, and cognitive linguistics. In order to connect our findings with research on well-known languages such as English, we additionally carried out analyses on published corpora of English.

The methods applied include computational techniques for quantitative analysis of textual data of the type that has been produced by DoBeS projects, with as little additional manual annotation of data as possible. This permitted us to analyze the huge amount of data necessary to detect and appropriately describe the subtle patterns under investigation. The work also involved developing solutions for a number of technological and computational issues for cross-corpora studies, as additional outcomes of this project.

Languages

Language name	Language family	Region	Number of speakers	Language Expert
Baure	Arawakan	Amazonia	84	Swintha Danielsen
Chintang	Tibeto-Burman	Himalaya	~ 1,500	Balthasar Bickel
Bora	Boran	Amazonia	~ 1,500	Frank Seifart
N|uu	Southern Khoisan	South Africa	6	Alena Witzlack-Makarevich
Sri Lanka Malay	Austronesian	Sri Lanka	~ 45,000	Sebastian Nordhoff
Ėven	Tungusic	Siberia	~ 2,500	Brigitte Pakendorf
Sakha (Yakut)	Turkic	Siberia	~ 360,000	Brigitte Pakendorf

Publications

Seifart, Frank, Roland Meyer, Taras Zakharko, Balthasar Bickel, Swintha Danielsen, Sebastian Nordhoff, and Alena Witzlack-Makarevich. 2010. Cross-linguistic variation in the noun-to-verb ratio: Exploring automatic tagging and quantitative corpus analysis. Paper presented at the DoBeS Workshop “Advances in Documentary Linguistics”, Nijmegen, 14-15 October 2010.

Seifart, Frank. 2011. Cross-linguistic variation in the noun-to-verb ratio: The role of verb morphology and narrative strategies. Poster presented at the Association for Linguistic Typology 9th Biennial Conference, The University of Hong Kong, July 21-24, 2011.

Events

The relative frequencies of nouns, pronouns, and verbs in discourse: An international workshop. Leipzig, August 12-13, 2013.

Related project: Referentiality Project at Universität Erfurt
Funding agency: Volkswagen Foundation
Funding scheme: DoBeS (Dokumentation bedrohter Sprachen)

Max Planck Institute for Evolutionary Anthropology
