Nachyn-Enkhedorzhu Oorzhak
2024
Enhancing Tuvan Language Resources through the FLORES Dataset
Ali Kuzhuget | Airana Mongush | Nachyn-Enkhedorzhu Oorzhak
Proceedings of the Ninth Conference on Machine Translation
FLORES is a benchmark dataset designed for evaluating machine translation systems, particularly for low-resource languages. This paper, prepared as part of the Open Language Data Initiative (OLDI) shared task, presents our contribution to expanding the FLORES dataset with high-quality translations from Russian to Tuvan, an endangered Turkic language. Our approach drew on the linguistic expertise of native speakers to ensure both accuracy and cultural relevance in the translations. This project represents a significant step forward in supporting Tuvan as a low-resource language in natural language processing (NLP) and machine translation (MT).