Morphological Irregularity Correlates with Frequency

Shijie Wu, Ryan Cotterell, Timothy O’Donnell


Abstract
We present a study of morphological irregularity. Following recent work, we define an information-theoretic measure of irregularity based on the predictability of forms in a language. Using a neural transduction model, we estimate this quantity for the forms in 28 languages. We first present several validatory and exploratory analyses of irregularity. We then show that our analyses provide evidence for a correlation between irregularity and frequency: higher-frequency items are more likely to be irregular, and irregular items are more likely to be highly frequent. To our knowledge, this result is the first of its breadth and confirms longstanding proposals from the linguistics literature. The correlation is more robust when aggregated at the level of whole paradigms, providing support for models of linguistic structure in which inflected forms are unified by abstract underlying stems or lexemes.
Anthology ID:
P19-1505
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5117–5126
URL:
https://aclanthology.org/P19-1505
DOI:
10.18653/v1/P19-1505
PDF:
https://aclanthology.org/P19-1505.pdf
Video:
https://vimeo.com/385273333
Code:
shijie-wu/neural-transducer