Even though verbs may be more complex than nouns, nouns thus appear to require more planning, probably due to the new information they usually represent.

This finding points to strong universals in how humans process language and manage referential information when communicating linguistically. Every bit of spoken language is produced at a particular speed. However, this speed is not constant: speakers speed up and slow down.

Variation in speech rate is influenced by a complex combination of factors, including the frequency and predictability of words, their information status, and their position within an utterance. Here, we use speech rate as an index of word-planning effort and focus on the time window during which speakers prepare the production of words from the two major lexical classes, nouns and verbs. We show that, when naturalistic speech is sampled from languages all over the world, there is a robust cross-linguistic tendency for slower speech before nouns compared with verbs, both in terms of slower articulation and more pauses.

We attribute this slowdown effect to the increased amount of planning that nouns require compared with verbs. Unlike verbs, nouns are typically used only when they introduce new or unexpected information; otherwise, speakers can usually refer with pronouns instead. These conditions on noun use appear to outweigh potential advantages stemming from differences in internal complexity between nouns and verbs. Our findings suggest that, beneath the staggering diversity of grammatical structures and cultural settings, there are robust universals of language processing that are intimately tied to how speakers manage referential information when they communicate with one another.

Human language in its most widespread form (i.e., spoken language) is linear. This was recognized by the founding father of modern linguistics, Ferdinand de Saussure, as one of the two fundamental principles of the linguistic sign, the other one being its arbitrary nature.

An unresolved question is which aspects of local variation in speech rate are universal, which vary across languages and cultures, and which vary across individuals. For example, marking the end of utterances by slowing down speech is common, but its implementation is language-specific.

Good candidates for truly universal temporal features are the relatively fast pronunciation of frequent, and thus predictable, words and of repeated mentions of the same word. This speedup is argued to result from automated articulation and has been suggested to contribute to efficient communication by spreading information more evenly across the speech signal.

An aspect of speech rate that has received less attention is the local speech rate during the planning, rather than the actual pronunciation, of words. Speed variation before the articulatory onset of a word can provide direct evidence for cognitive processes.

Here, we investigate speech rate in word-planning windows in naturalistic speech from nine languages to assess differences between the two major word classes usually found in languages: nouns and verbs. To our knowledge, the relative speedup or slowdown of speech preceding nouns versus verbs has never been systematically studied. Related research on response times in picture-naming experiments suggests that nouns require less planning time than verbs.

This is attributed to increased processing costs of verbs because of their relative grammatical and semantic complexity and their links with other elements in the clause, for example, subjects and objects.

A factor that has been neglected in this research is how referential information is managed in connected, interactive speech. In running speech, the choice between referring expressions (e.g., pronouns vs. full noun phrases) is constrained by the information status of referents.

What emerges as a cross-linguistically stable pattern, however, is that the use of nouns typically signals the newness of a referent. Verbs are fundamentally different in this regard: Even if the same actions or states are referred to repeatedly, a verb is typically still necessary to form a complete sentence.

While the generic nature of certain verbs (e.g., do) allows them to be used as pro-verbs in some languages, this is subject to special syntactic constraints. Similarly, verbs can sometimes be gapped in some languages (e.g., John drank wine and Mary beer), but this is again subject to special syntactic constraints.

In general, the use of verbs is thus the default option, regardless of the information status of the actions or states referred to, while the use of nouns is a marked option that is felicitous only in contexts of information novelty, disambiguation needs, or topic and perspective shifts.

Given these additional constraints on the use of nouns, their use should correlate with a higher planning cost, slowing down speech before the noun. Here, we aim to settle not only the direction of the effect that an upcoming noun versus verb has on speech rate, but also whether this effect is universal. For this, we use time-aligned corpora of naturalistic speech from multimedia language documentations.

These seven corpora were compiled during on-site fieldwork over the past 25 y and were transcribed, translated, and annotated with word class tags by experts on the languages in collaboration with native speakers. They document naturalistic speech of various genres, including narratives, descriptive texts, and conversations, that were recorded in their original, interactive settings, such as the recording of a Bora myth illustrated in Fig.

While the genres covered by the corpora are diverse, all data are comparable in that they document speech which is spontaneously produced, not read out or memorized, even if texts stem from local oral traditions. We additionally used relevant sections of published corpora of spoken Dutch and English, which likewise document naturalistic spoken language annotated for word class.

Figure: Location of the nine languages and size of the corpora studied here. For detailed information, see SI Appendix, Table S1.

Figure: Bora example illustrating slow articulation and presence of a pause before a noun compared with fast articulation and no pause before a verb.

Procedures for time-aligning transcriptions and for determining position, pause length, and context window size are described in Materials and Methods.
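To make the two window-based measures more concrete, here is a minimal Python sketch. It is an illustration only, not the procedure actually used in the study, which is described in Materials and Methods. It assumes a hypothetical data layout in which a stretch of speech is a list of time-aligned phone intervals, with silences labeled `<pause>`; the names `Interval`, `planning_window_measures`, `window`, and `min_pause` are all illustrative. For a target noun or verb starting at `onset`, it returns the articulation rate (phones per second of non-pause speech) in the preceding window and whether a pause immediately precedes the word. The default window of 0.6 s merely mirrors the roughly 600 ms planning estimate cited in the next paragraph, and the default pause threshold of 0 s is a placeholder.

```python
from dataclasses import dataclass

PAUSE = "<pause>"  # hypothetical label for silent intervals in the alignment


@dataclass
class Interval:
    """One time-aligned interval: a phone or a pause (times in seconds)."""
    start: float
    end: float
    label: str


def planning_window_measures(intervals, onset, window=0.6, min_pause=0.0):
    """Return (articulation_rate, pause_before) for the window [onset - window, onset).

    articulation_rate: phones per second of non-pause speech inside the window;
    phones cut by the window edge are counted fractionally.
    pause_before: True if a pause lasting at least `min_pause` seconds ends at `onset`.
    """
    win_start = onset - window
    speech_time = 0.0
    phones = 0.0
    for iv in intervals:
        # Portion of this interval that falls inside the planning window.
        overlap = min(iv.end, onset) - max(iv.start, win_start)
        if overlap <= 0 or iv.label == PAUSE:
            continue  # outside the window, or silence (excluded from articulation)
        speech_time += overlap
        phones += overlap / (iv.end - iv.start)  # fraction of the phone inside the window
    rate = phones / speech_time if speech_time > 0 else float("nan")
    pause_before = any(
        iv.label == PAUSE
        and abs(iv.end - onset) < 1e-9  # pause ends right at the word onset
        and (iv.end - iv.start) >= min_pause
        for iv in intervals
    )
    return rate, pause_before


# Usage: a made-up context in which a pause precedes the target word at 0.5 s.
context = [Interval(0.0, 0.1, "a"), Interval(0.1, 0.2, "b"),
           Interval(0.2, 0.5, PAUSE), Interval(0.5, 0.6, "k")]
rate, pause_before = planning_window_measures(context, onset=0.5)
```

Pause time is kept out of the articulation-rate denominator so that the two measures stay separate: a planning slowdown can surface as slower articulation, as a pause, or as both.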

The context window size was set following picture- and action-naming studies showing that planning a single content word takes around 600 ms. We analyzed both measures (the articulation rate in the window preceding the target word and the presence of a pause before it) with generalized linear mixed-effects models, using word class (noun vs. verb) as the main predictor. The models furthermore included random effects to account for idiosyncrasies of individual speakers, recording sessions, and individual word forms. Including word forms accounts for the expected speedup associated with frequent and predictable items, since frequency and predictability are properties of individual word forms (Materials and Methods).
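One generic way to write such a model is sketched below for the binary pause measure, purely as an illustration: the logit link, the normally distributed random intercepts, and their additive (crossed) structure are assumptions here, and the actual specifications are those given in Materials and Methods. Let $y_i = 1$ if a pause precedes target word $i$, let $\mathrm{noun}_i = 1$ for nouns and $0$ for verbs, and let $s[i]$, $r[i]$, and $f[i]$ index the speaker, recording session, and word form of token $i$:

```latex
\operatorname{logit} \Pr(y_i = 1) = \beta_0 + \beta_1\,\mathrm{noun}_i + u_{s[i]} + v_{r[i]} + w_{f[i]},
\qquad
u_s \sim \mathcal{N}(0, \sigma_s^2),\quad
v_r \sim \mathcal{N}(0, \sigma_r^2),\quad
w_f \sim \mathcal{N}(0, \sigma_f^2)
```

Under this sketch, $\beta_1 > 0$ would mean that pauses are more likely before nouns than before verbs, while the word-form intercepts $w_f$ absorb item-specific effects such as frequency and predictability. The articulation-rate measure would use the same linear predictor with a continuous outcome in place of the logit of a probability.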



