The Kyrgyz Seed Dataset Submission to the WMT25 Open Language Data Initiative Shared Task

Murat Jumashev, Alina Tillabaeva, Aida Kasieva, Turgunbek Omurkanov, Akylai Musaeva, Meerim Emil Kyzy, Gulaiym Chagataeva, Jonathan Washington


Abstract
We present a Kyrgyz language seed dataset as part of our contribution to the WMT25 Open Language Data Initiative (OLDI) shared task. This paper details the process of collecting and curating English–Kyrgyz translations, highlighting the main challenges encountered in translating into a morphologically rich, low-resource language. We demonstrate the quality of the dataset through fine-tuning experiments, showing consistent improvements in machine translation performance across multiple models. Comparisons with bilingual and multilingual neural machine translation (MNMT) Kyrgyz–English baselines reveal that, for some models, fine-tuning on our dataset yields performance surpassing pretrained baselines in both the English–Kyrgyz and Kyrgyz–English translation directions. These results validate the dataset’s utility and suggest that it can serve as a valuable resource for the Kyrgyz MT community and for work on other related low-resource languages.
Anthology ID:
2025.wmt-1.84
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
1088–1102
URL:
https://aclanthology.org/2025.wmt-1.84/
Cite (ACL):
Murat Jumashev, Alina Tillabaeva, Aida Kasieva, Turgunbek Omurkanov, Akylai Musaeva, Meerim Emil Kyzy, Gulaiym Chagataeva, and Jonathan Washington. 2025. The Kyrgyz Seed Dataset Submission to the WMT25 Open Language Data Initiative Shared Task. In Proceedings of the Tenth Conference on Machine Translation, pages 1088–1102, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
The Kyrgyz Seed Dataset Submission to the WMT25 Open Language Data Initiative Shared Task (Jumashev et al., WMT 2025)
PDF:
https://aclanthology.org/2025.wmt-1.84.pdf