Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain

David Ponce, Harritxu Gete, Thierry Etchegoyhen


Abstract
We describe Vicomtech’s participation in the WMT 2024 Shared Task on translation into low-resource languages of Spain. We addressed all three languages of the task, namely Aragonese, Aranese and Asturian, in both constrained and open settings. Our work mainly centred on exploiting different types of corpora via data filtering, selection and combination methods, along with synthetic data generated with translation models based on rules, neural sequence-to-sequence or large language models. We improved or matched the best baselines in all three language pairs and present complementary results on additional test sets.
Anthology ID:
2024.wmt-1.91
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
934–942
URL:
https://aclanthology.org/2024.wmt-1.91
Cite (ACL):
David Ponce, Harritxu Gete, and Thierry Etchegoyhen. 2024. Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain. In Proceedings of the Ninth Conference on Machine Translation, pages 934–942, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain (Ponce et al., WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.91.pdf