FLORES+ Translation and Machine Translation Evaluation for the Erzya Language

Isai Gordeev, Sergey Kuldin, David Dale


Abstract
This paper introduces a translation of the FLORES+ dataset into the endangered Erzya language, with the goal of evaluating machine translation between this language and any of the other 200 languages already included in FLORES+. This translation was carried out as part of the Open Language Data shared task at WMT24. We also present a benchmark of existing translation models based on this dataset and a new translation model that achieves state-of-the-art translation quality into Erzya from Russian and English.
Anthology ID:
2024.wmt-1.49
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
614–623
URL:
https://aclanthology.org/2024.wmt-1.49
Cite (ACL):
Isai Gordeev, Sergey Kuldin, and David Dale. 2024. FLORES+ Translation and Machine Translation Evaluation for the Erzya Language. In Proceedings of the Ninth Conference on Machine Translation, pages 614–623, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
FLORES+ Translation and Machine Translation Evaluation for the Erzya Language (Gordeev et al., WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.49.pdf