Expanding FLORES+ Benchmark for More Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation

Felermino Dario Mario Ali, Henrique Lopes Cardoso, Rui Sousa-Silva


Abstract
As part of the Open Language Data Initiative shared tasks, we have expanded the FLORES+ evaluation set to include Emakhuwa, a low-resource language widely spoken in Mozambique. We translated the dev and devtest sets from Portuguese into Emakhuwa, and we detail the translation process and quality assurance measures used. Our methodology involved various quality checks, including post-editing and adequacy assessments. The resulting datasets consist of multiple reference sentences for each source. We present baseline results from training a Neural Machine Translation system and fine-tuning existing multilingual translation models. Our findings suggest that spelling inconsistencies remain a challenge in Emakhuwa. Additionally, the baseline models underperformed on this evaluation set, underscoring the necessity for further research to enhance machine translation quality for Emakhuwa.The data is publicly available at https://huggingface.co/datasets/LIACC/Emakhuwa-FLORES
Anthology ID:
2024.wmt-1.45
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
579–592
Language:
URL:
https://aclanthology.org/2024.wmt-1.45
DOI:
Bibkey:
Cite (ACL):
Felermino Dario Mario Ali, Henrique Lopes Cardoso, and Rui Sousa-Silva. 2024. Expanding FLORES+ Benchmark for More Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation. In Proceedings of the Ninth Conference on Machine Translation, pages 579–592, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Expanding FLORES+ Benchmark for More Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation (Ali et al., WMT 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.wmt-1.45.pdf