Amanda Doucette
2025
Unzipping the Causality of Zipf’s Law and Other Lexical Trade-offs
Amanda Doucette | Timothy J. O’Donnell | Morgan Sonderegger
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
There are strong constraints on the structure of a possible lexicon. For example, the negative correlation between word frequency and length known as Zipf’s law and the negative correlation between word length and phonotactic complexity both appear to hold across languages. While lexical trade-offs like these have been examined individually, it is unclear how they interact as a system. In this paper, we propose causal discovery as a method for identifying lexical biases and their interactions in a set of variables. We represent the lexicon as a causal model and apply the Fast Causal Inference (FCI) algorithm (Spirtes et al., 1995) to identify both causal relationships between measured variables and the existence of possible unmeasured confounding variables. We apply this method to lexical data, including measures of word length, frequency, phonotactic complexity, and morphological irregularity, for 25 languages. We find evidence of universal associations involving word length that most likely involve an unmeasured confounder, suggesting that additional variables need to be measured to determine how they are related. We also find evidence of cross-linguistic variation in the relationships between the remaining variables, and suggest that, given a larger dataset, causal discovery algorithms can be a useful tool for assessing the universality of lexical biases.
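Constraint-based causal discovery algorithms like the one described above decide which edges to keep by running conditional-independence tests between variables. A minimal, hypothetical sketch of the Fisher-z partial-correlation test commonly used for continuous data (the variable names and toy data below are illustrative, not the paper’s data or code):

```python
import numpy as np
from scipy.stats import norm

def fisher_z_test(data, i, j, cond=(), alpha=0.05):
    """Test whether column i is independent of column j given the columns
    in `cond`, via the Fisher z-transform of the partial correlation.
    This is the standard CI test for continuous data in constraint-based
    causal discovery; a minimal sketch, not the paper's implementation."""
    n = data.shape[0]
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)  # precision matrix of the selected columns
    # Partial correlation of i and j given cond, from the precision matrix.
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z = 0.5 * np.log((1 + r) / (1 - r))  # Fisher z-transform
    stat = np.sqrt(n - len(cond) - 3) * abs(z)
    p = 2 * (1 - norm.cdf(stat))
    return bool(p > alpha)  # True -> treat as conditionally independent

# Toy data: a latent "effort" variable drives both word length and frequency.
rng = np.random.default_rng(0)
effort = rng.normal(size=2000)
length = effort + rng.normal(scale=0.5, size=2000)
freq = -effort + rng.normal(scale=0.5, size=2000)
data = np.column_stack([length, freq, effort])

print(fisher_z_test(data, 0, 1))        # False: strongly dependent marginally
print(fisher_z_test(data, 0, 1, (2,)))  # typically True once effort is conditioned on
```

This is why unmeasured confounders matter for such analyses: if `effort` were not in the dataset, length and frequency would look directly related, and only algorithms designed for latent confounding can flag that possibility.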
2024
Correlation Does Not Imply Compensation: Complexity and Irregularity in the Lexicon
Amanda Doucette | Ryan Cotterell | Morgan Sonderegger | Timothy J. O’Donnell
Proceedings of the Society for Computation in Linguistics 2024
2017
Inherent Biases of Recurrent Neural Networks for Phonological Assimilation and Dissimilation
Amanda Doucette
Proceedings of the 7th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2017)
A recurrent neural network model of phonological pattern learning is proposed. The model is a relatively simple neural network with one recurrent layer, and it displays learning biases that mimic those observed in human learners. Single-feature patterns are learned faster than two-feature patterns, and vowel-only or consonant-only patterns are learned faster than patterns involving both vowels and consonants, mirroring the results of laboratory learning experiments. In non-recurrent models, capturing these biases requires alpha features or some other explicit representation of repeated features, but with a recurrent neural network, these elaborations are unnecessary.
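The architecture described above can be sketched as a single Elman-style recurrent layer reading phonological feature vectors one segment at a time and predicting the next segment’s features. The dimensions, weights, and toy input below are illustrative assumptions, not the paper’s architecture or training setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: a small phonological feature inventory and hidden layer.
n_feats, n_hidden = 6, 16
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_feats))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden (recurrence)
W_hy = rng.normal(scale=0.1, size=(n_feats, n_hidden))   # hidden -> output

def forward(segments):
    """Run the recurrent layer over a sequence of feature vectors,
    returning predicted feature probabilities for the next segment
    at each time step (forward pass only; no training)."""
    h = np.zeros(n_hidden)
    preds = []
    for x in segments:
        h = np.tanh(W_xh @ x + W_hh @ h)              # recurrent state update
        preds.append(1 / (1 + np.exp(-(W_hy @ h))))   # sigmoid feature probabilities
    return np.array(preds)

seq = rng.integers(0, 2, size=(5, n_feats)).astype(float)  # toy 5-segment word
print(forward(seq).shape)  # (5, 6): one feature-probability vector per segment
```

Because the hidden state carries the previous segment forward, the network can detect a repeated feature without any explicit alpha-feature encoding, which is the point the abstract makes about non-recurrent models.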