Unzipping the Causality of Zipf’s Law and Other Lexical Trade-offs

Amanda Doucette, Timothy J. O’Donnell, Morgan Sonderegger


Abstract
There are strong constraints on the structure of a possible lexicon. For example, the negative correlation between word frequency and length known as Zipf’s law, and a negative correlation between word length and phonotactic complexity appear to hold across languages. While lexical trade-offs like these have been examined individually, it is unclear how they interact as a system. In this paper, we propose causal discovery as a method for identifying lexical biases and their interactions in a set of variables. We represent the lexicon as a causal model, and apply the Fast Causal Inference algorithm (Spirtes et al., 1995) to identify both causal relationships between measured variables and the existence of possible unmeasured confounding variables. We apply this method to lexical data including measures of word length, frequency, phonotactic complexity, and morphological irregularity for 25 languages and find evidence of universal associations involving word length with a high likelihood of involving an unmeasured confounder, suggesting that additional variables need to be measured to determine how they are related. We also find evidence of variation across languages in relationships between the remaining variables, and suggest that given a larger dataset, causal discovery algorithms can be a useful tool in assessing the universality of lexical biases.
Anthology ID:
2025.cmcl-1.11
Volume:
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, USA
Editors:
Tatsuki Kuribayashi, Giulia Rambelli, Ece Takmaz, Philipp Wicke, Jixing Li, Byung-Doh Oh
Venues:
CMCL | WS
Publisher:
Association for Computational Linguistics
Pages:
66–76
URL:
https://aclanthology.org/2025.cmcl-1.11/
Cite (ACL):
Amanda Doucette, Timothy J. O’Donnell, and Morgan Sonderegger. 2025. Unzipping the Causality of Zipf’s Law and Other Lexical Trade-offs. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 66–76, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Unzipping the Causality of Zipf’s Law and Other Lexical Trade-offs (Doucette et al., CMCL 2025)
PDF:
https://aclanthology.org/2025.cmcl-1.11.pdf