Phone Inventories and Recognition for Every Language

Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black, Shinji Watanabe


Abstract
Identifying phone inventories is a crucial component of language documentation and the preservation of endangered languages. However, even the largest collection of phone inventories covers only about 2000 languages, roughly one quarter of the world's languages. A majority of the remaining languages are endangered. In this work, we attempt to solve this problem by estimating the phone inventory for any language listed in Glottolog, which contains phylogenetic information on 8000 languages. In particular, we propose one probabilistic model and one non-probabilistic model, both of which use phylogenetic trees (“language family trees”) to measure the distance between languages. We show that our best model outperforms baseline models by 6.5 F1. Furthermore, we demonstrate that, with the proposed inventories, the phone recognition model can be customized for every language in the set, which improves the phone error rate (PER) in phone recognition by 25%.
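The abstract does not spell out either model, so the following is only a minimal sketch of the general idea: measure distance on a language family tree, then borrow phones from the closest languages whose inventories are known. The majority vote over nearest neighbors is a stand-in technique chosen for illustration, not the authors' method. The language names, tree paths, inventories, and the function names `tree_distance`, `estimate_inventory`, and `f1` are all hypothetical toy values, not data from Glottolog or the paper.

```python
from collections import Counter


def tree_distance(path_a, path_b):
    """Count the edges between two tree nodes, given their root-to-node paths."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return len(path_a) + len(path_b) - 2 * shared


def estimate_inventory(target_path, known, k=2):
    """Keep phones found in more than half of the k nearest known languages.

    `known` maps a language name to (root-to-language path, set of phones).
    This voting rule is an assumption made for illustration only.
    """
    nearest = sorted(
        known, key=lambda name: tree_distance(target_path, known[name][0])
    )[:k]
    votes = Counter(phone for name in nearest for phone in known[name][1])
    return {phone for phone, n in votes.items() if 2 * n > len(nearest)}


def f1(predicted, gold):
    """Set-based F1 between a predicted and a reference inventory."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data: three languages with known inventories in one family.
known = {
    "lang_a": (["fam1", "branch1", "lang_a"], {"p", "t", "k", "a", "i"}),
    "lang_b": (["fam1", "branch1", "lang_b"], {"p", "t", "a", "i", "u"}),
    "lang_c": (["fam1", "branch2", "lang_c"], {"b", "d", "a", "o"}),
}

# A language with no known inventory, placed in the same branch as lang_a and lang_b.
target = ["fam1", "branch1", "lang_x"]

estimated = estimate_inventory(target, known, k=2)
score = f1(estimated, known["lang_a"][1])
print(sorted(estimated), round(score, 3))
```

In this toy setup the two sister languages outvote the more distant `lang_c`, so only phones shared by both sisters are kept. A probabilistic variant would presumably weight each neighbor by its distance instead of using a hard vote, but that detail is not given in the abstract.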
Anthology ID:
2022.lrec-1.114
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1061–1067
URL:
https://aclanthology.org/2022.lrec-1.114
Cite (ACL):
Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black, and Shinji Watanabe. 2022. Phone Inventories and Recognition for Every Language. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1061–1067, Marseille, France. European Language Resources Association.
Cite (Informal):
Phone Inventories and Recognition for Every Language (Li et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.114.pdf