Using Graph-Based Methods to Augment Online Dictionaries of Endangered Languages

Khalid Alnajjar, Mika Hämäläinen, Niko Tapio Partanen, Jack Rueter


Abstract
Many endangered Uralic languages have multilingual machine-readable dictionaries saved in an XML format. However, the dictionaries cover translations very inconsistently between language pairs. For instance, the Livonian dictionary has some translations to Finnish, Latvian and Estonian, while the Komi-Zyrian dictionary has some translations to Finnish, English and Russian. We utilize graph-based approaches to augment such dictionaries by predicting new translations to existing and new languages based on different dictionaries for endangered languages and Wiktionaries. Our study focuses on the lexical resources for Komi-Zyrian (kpv), Erzya (myv) and Livonian (liv). We evaluate our approach with human judges fluent in the three endangered languages in question. Based on the evaluation, the method predicted good or acceptable translations 77% of the time. Furthermore, we train a neural prediction model to predict the quality of the automatically predicted translations with 81% accuracy. The resulting extensions to the dictionaries are made available on the online dictionary platform used by the speakers of these languages.
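The abstract does not spell out the graph algorithm, so the following is only a minimal sketch of one common way such a prediction could work, not the authors' method: dictionary entries become nodes of the form (language, word), known translations become undirected edges, and a new translation is proposed when a target-language word is reachable through a shared pivot word. The function names, the scoring by pivot count, and the placeholder words (`w_liv`, `w_fin`, and so on) are all assumptions made for illustration; they are not real dictionary data.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected translation graph from ((lang, word), (lang, word)) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def predict_translations(graph, source, target_lang):
    """Propose target_lang words two hops away from source (via one pivot word).

    Each candidate is scored by the number of distinct pivots that support it.
    Words already linked directly to source are skipped, since they are known.
    """
    scores = defaultdict(int)
    known = graph[source]
    for pivot in known:
        for candidate in graph[pivot]:
            if candidate[0] != target_lang:
                continue
            if candidate == source or candidate in known:
                continue
            scores[candidate] += 1
    # Highest support first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data with placeholder words (hypothetical, not taken from any dictionary).
pairs = [
    (("liv", "w_liv"), ("fin", "w_fin")),
    (("liv", "w_liv"), ("est", "w_est")),
    (("fin", "w_fin"), ("eng", "w_eng")),
    (("est", "w_est"), ("eng", "w_eng")),
    (("fin", "w_fin"), ("eng", "w_eng2")),
]
graph = build_graph(pairs)
predictions = predict_translations(graph, ("liv", "w_liv"), "eng")
```

In this toy graph the Livonian entry has no English translation, but `w_eng` is supported by both the Finnish and the Estonian pivot and therefore ranks above `w_eng2`, which only one pivot supports. A support count of this kind is one simple signal; the quality-prediction model mentioned in the abstract would presumably draw on richer features.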
Anthology ID:
2022.computel-1.18
Volume:
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Sarah Moeller, Antonios Anastasopoulos, Antti Arppe, Aditi Chaudhary, Atticus Harrigan, Josh Holden, Jordan Lachler, Alexis Palmer, Shruti Rijhwani, Lane Schwartz
Venue:
ComputEL
Publisher:
Association for Computational Linguistics
Pages:
139–148
URL:
https://aclanthology.org/2022.computel-1.18
DOI:
10.18653/v1/2022.computel-1.18
Cite (ACL):
Khalid Alnajjar, Mika Hämäläinen, Niko Tapio Partanen, and Jack Rueter. 2022. Using Graph-Based Methods to Augment Online Dictionaries of Endangered Languages. In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 139–148, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Using Graph-Based Methods to Augment Online Dictionaries of Endangered Languages (Alnajjar et al., ComputEL 2022)
PDF:
https://aclanthology.org/2022.computel-1.18.pdf