2025
Unzipping the Causality of Zipf’s Law and Other Lexical Trade-offs
Amanda Doucette | Timothy J. O’Donnell | Morgan Sonderegger
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
There are strong constraints on the structure of a possible lexicon. For example, the negative correlation between word frequency and length known as Zipf’s law, and a negative correlation between word length and phonotactic complexity appear to hold across languages. While lexical trade-offs like these have been examined individually, it is unclear how they interact as a system. In this paper, we propose causal discovery as a method for identifying lexical biases and their interactions in a set of variables. We represent the lexicon as a causal model, and apply the Fast Causal Discovery algorithm (Spirtes et al., 1995) to identify both causal relationships between measured variables and the existence of possible unmeasured confounding variables. We apply this method to lexical data including measures of word length, frequency, phonotactic complexity, and morphological irregularity for 25 languages. We find evidence of universal associations involving word length that are highly likely to involve an unmeasured confounder, suggesting that additional variables need to be measured to determine how they are related. We also find evidence of variation across languages in relationships between the remaining variables, and suggest that given a larger dataset, causal discovery algorithms can be a useful tool in assessing the universality of lexical biases.
2024
Correlation Does Not Imply Compensation: Complexity and Irregularity in the Lexicon
Amanda Doucette | Ryan Cotterell | Morgan Sonderegger | Timothy J. O’Donnell
Proceedings of the Society for Computation in Linguistics 2024
2022
Multidimensional acoustic variation in vowels across English dialects
James Tanner | Morgan Sonderegger | Jane Stuart-Smith
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Vowels are typically characterized in terms of their static position in formant space, though vowels have also long been known to undergo dynamic formant change over their timecourse. Recent studies have demonstrated that this change is highly informative for distinguishing vowels within a system, as well as providing additional resolution in characterizing differences between dialects. It remains unclear, however, how both static and dynamic representations capture the main dimensions of vowel variation across a large number of dialects. This study examines the role of static, dynamic, and duration information for 5 vowels across 21 British and North American English dialects, and observes that vowels exhibit highly structured variation across dialects, with dialects displaying similar patterns within a given vowel, broadly corresponding to a spectrum between traditional ‘monophthong’ and ‘diphthong’ characterizations. These findings highlight the importance of dynamic and duration information in capturing how vowels can systematically vary across a large number of dialects, and provide the first large-scale description of formant dynamics across many dialects of a single language.
2013
Gender Inference of Twitter Users in Non-English Contexts
Morgane Ciot | Morgan Sonderegger | Derek Ruths
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
2010
Combining Data and Mathematical Models of Language Change
Morgan Sonderegger | Partha Niyogi
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics