Rethinking Phonotactic Complexity

Tiago Pimentel, Brian Roark, Ryan Cotterell


Abstract
In this work, we propose the use of phone-level language models to estimate phonotactic complexity, measured in bits per phoneme, which makes cross-linguistic comparison straightforward. We compare the entropy across languages using this simple measure, gaining insight into how complex different languages’ phonotactics are. Finally, we show a very strong negative correlation between phonotactic complexity and the average length of words (Spearman rho = -0.744) when analysing a collection of 106 languages with 1016 basic concepts each.
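To make the "bits per phoneme" measure concrete, here is a minimal sketch that scores held-out words with a phone-level bigram model and add-one smoothing. This is an illustrative stand-in only: the abstract does not specify the model architecture, and the function name `bits_per_phoneme`, the boundary symbols, the choice to count the end-of-word symbol in the average, and the use of the held-out inventory when smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def bits_per_phoneme(train_words, test_words):
    """Average cross-entropy, in bits per symbol, of an add-one-smoothed
    phone-level bigram model. Each word is a list of phoneme strings.

    Assumptions (not taken from the paper): a bigram model, add-one
    smoothing, and an end-of-word symbol that is counted in the average.
    """
    bos, eos = "<s>", "</s>"

    # Symbol inventory the model can predict. Including held-out phonemes
    # is a simplification that avoids needing an unknown-symbol token.
    vocab = {eos}
    for word in list(train_words) + list(test_words):
        vocab.update(word)
    vocab_size = len(vocab)

    # Count bigrams and their left contexts on the training words.
    bigram_counts = Counter()
    context_counts = Counter()
    for word in train_words:
        seq = [bos] + list(word) + [eos]
        for prev, cur in zip(seq, seq[1:]):
            bigram_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    # Accumulate surprisal (negative log2 probability) on held-out words.
    total_bits = 0.0
    n_symbols = 0
    for word in test_words:
        seq = [bos] + list(word) + [eos]
        for prev, cur in zip(seq, seq[1:]):
            prob = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
            total_bits -= math.log2(prob)
            n_symbols += 1
    return total_bits / n_symbols


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    train = [["k", "a", "t"], ["t", "a", "k"], ["k", "a"]]
    held_out = [["t", "a"], ["k", "a", "t"]]
    print(f"{bits_per_phoneme(train, held_out):.3f} bits per phoneme")
```

Because the score is an average over symbols, it does not grow with word length by construction, which is what lets languages with different typical word lengths be compared on one scale.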
Anthology ID:
W19-3628
Volume:
Proceedings of the 2019 Workshop on Widening NLP
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Amittai Axelrod, Diyi Yang, Rossana Cunha, Samira Shaikh, Zeerak Waseem
Venue:
WiNLP
Publisher:
Association for Computational Linguistics
Pages:
88–90
URL:
https://aclanthology.org/W19-3628/
Cite (ACL):
Tiago Pimentel, Brian Roark, and Ryan Cotterell. 2019. Rethinking Phonotactic Complexity. In Proceedings of the 2019 Workshop on Widening NLP, pages 88–90, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Rethinking Phonotactic Complexity (Pimentel et al., WiNLP 2019)