Evaluating the Interplay of Information Status and Information Content in a Multilingual Parallel Corpus

Julius Steuer, Toshiki Nakai, Andrew Thomas Dyer, Luigi Talamo, Annemarie Verkerk


Abstract
The uniform information density (UID) hypothesis postulates that linguistic units are distributed in a text in such a way that the variance around an average information density is minimized. The relationship between information density and information status (IS) has so far been underexplored. In this ongoing work, we project IS annotations from the English section of the CIEP+ corpus (Verkerk and Talamo, 2024) onto the parallel sections in other languages. We then use the projected annotations to evaluate the relationship between IS and information content in a typologically diverse sample of languages. Our preliminary findings indicate that information status has an effect on information density, with the direction of the effect depending on language and part of speech.
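The two quantities the abstract relates can be illustrated with a minimal sketch. It assumes that each unit's probability in context is already available from some language model and that each unit carries an IS label. Information content is taken as surprisal, -log2 p, and the UID criterion as the variance of surprisal around its mean. The function names and the labels `"given"` and `"new"` are hypothetical illustrations and do not describe the authors' actual pipeline.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def surprisal(prob: float) -> float:
    """Information content of a unit in bits: -log2 p(unit | context)."""
    return -math.log2(prob)


def uid_variance(probs: list[float]) -> float:
    """Population variance of per-unit surprisal around its mean.

    Under the UID hypothesis, lower values indicate a more uniform
    distribution of information across the sequence.
    """
    return pvariance([surprisal(p) for p in probs])


def mean_surprisal_by_status(tokens: list[tuple[float, str]]) -> dict[str, float]:
    """Average surprisal per information-status label.

    `tokens` is a list of (probability, label) pairs; the labels used
    here ("given", "new") are hypothetical placeholders for IS classes.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for prob, label in tokens:
        groups[label].append(surprisal(prob))
    return {label: mean(values) for label, values in groups.items()}


if __name__ == "__main__":
    # Both sequences carry 8 bits in total, but only the first is uniform.
    print(uid_variance([0.25, 0.25, 0.25, 0.25]))
    print(uid_variance([0.5, 0.125, 0.5, 0.125]))
    print(mean_surprisal_by_status(
        [(0.5, "given"), (0.25, "given"), (0.125, "new"), (0.0625, "new")]
    ))
```

In this toy input the "new" units are less predictable than the "given" ones. An actual analysis would compare such group statistics across languages and parts of speech, where the abstract reports that the direction of the effect varies.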
Anthology ID:
2026.sigtyp-main.3
Volume:
Proceedings of the 8th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Ekaterina Vylomova, Andrei Shcherbakov, Priya Rani
Venues:
SIGTYP | WS
Publisher:
Association for Computational Linguistics
Pages:
18–25
URL:
https://aclanthology.org/2026.sigtyp-main.3/
Cite (ACL):
Julius Steuer, Toshiki Nakai, Andrew Thomas Dyer, Luigi Talamo, and Annemarie Verkerk. 2026. Evaluating the Interplay of Information Status and Information Content in a Multilingual Parallel Corpus. In Proceedings of the 8th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 18–25, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Evaluating the Interplay of Information Status and Information Content in a Multilingual Parallel Corpus (Steuer et al., SIGTYP 2026)
PDF:
https://aclanthology.org/2026.sigtyp-main.3.pdf