@inproceedings{bar-etal-2024-robustness,
  title     = {Robustness of Fine-Tuned {LLM}s for Machine Translation with Varying Noise Levels: Insights for {A}sturian, {A}ragonese and {A}ranese},
  author    = {B{\"a}r, Martin and
               Forcada Rodr{\'\i}guez, Elisa and
               Garcia-Abadillo, Maria},
  editor    = {Haddow, Barry and
               Kocmi, Tom and
               Koehn, Philipp and
               Monz, Christof},
  booktitle = {Proceedings of the Ninth Conference on Machine Translation},
  month     = nov,
  year      = {2024},
  address   = {Miami, Florida, USA},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2024.wmt-1.89},
  pages     = {918--924},
  abstract  = {We present the LCT-LAP proposal for the shared task on Translation into Low-Resource Languages of Spain at WMT24 within the constrained submission category. Our work harnesses encoder-decoder models pretrained on higher-resource Iberian languages to facilitate MT model training for Asturian, Aranese and Aragonese. Furthermore, we explore the robustness of these models when fine-tuned on datasets with varying levels of alignment noise. We fine-tuned a Spanish-Galician model using Asturian data filtered by BLEU score thresholds of 5, 15, 30 and 60, identifying BLEU 15 as the most effective. This threshold was then applied to the Aranese and Aragonese datasets. Our findings indicate that filtering the corpora reduces computational costs and improves performance compared to using nearly raw data or data filtered with language identification. However, it still falls short of the performance achieved by the rule-based system Apertium in Aranese and Aragonese.},
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="bar-etal-2024-robustness">
<titleInfo>
<title>Robustness of Fine-Tuned LLMs for Machine Translation with Varying Noise Levels: Insights for Asturian, Aragonese and Aranese</title>
</titleInfo>
<name type="personal">
<namePart type="given">Martin</namePart>
<namePart type="family">Bär</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Elisa</namePart>
<namePart type="family">Forcada Rodríguez</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Maria</namePart>
<namePart type="family">Garcia-Abadillo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the Ninth Conference on Machine Translation</title>
</titleInfo>
<name type="personal">
<namePart type="given">Barry</namePart>
<namePart type="family">Haddow</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tom</namePart>
<namePart type="family">Kocmi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Philipp</namePart>
<namePart type="family">Koehn</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Christof</namePart>
<namePart type="family">Monz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Miami, Florida, USA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>We present the LCT-LAP proposal for the shared task on Translation into Low-Resource Languages of Spain at WMT24 within the constrained submission category. Our work harnesses encoder-decoder models pretrained on higher-resource Iberian languages to facilitate MT model training for Asturian, Aranese and Aragonese. Furthermore, we explore the robustness of these models when fine-tuned on datasets with varying levels of alignment noise. We fine-tuned a Spanish-Galician model using Asturian data filtered by BLEU score thresholds of 5, 15, 30 and 60, identifying BLEU 15 as the most effective. This threshold was then applied to the Aranese and Aragonese datasets. Our findings indicate that filtering the corpora reduces computational costs and improves performance compared to using nearly raw data or data filtered with language identification. However, it still falls short of the performance achieved by the rule-based system Apertium in Aranese and Aragonese.</abstract>
<identifier type="citekey">bar-etal-2024-robustness</identifier>
<location>
<url>https://aclanthology.org/2024.wmt-1.89</url>
</location>
<part>
<date>2024-11</date>
<extent unit="page">
<start>918</start>
<end>924</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Robustness of Fine-Tuned LLMs for Machine Translation with Varying Noise Levels: Insights for Asturian, Aragonese and Aranese
%A Bär, Martin
%A Forcada Rodríguez, Elisa
%A Garcia-Abadillo, Maria
%Y Haddow, Barry
%Y Kocmi, Tom
%Y Koehn, Philipp
%Y Monz, Christof
%S Proceedings of the Ninth Conference on Machine Translation
%D 2024
%8 November
%I Association for Computational Linguistics
%C Miami, Florida, USA
%F bar-etal-2024-robustness
%X We present the LCT-LAP proposal for the shared task on Translation into Low-Resource Languages of Spain at WMT24 within the constrained submission category. Our work harnesses encoder-decoder models pretrained on higher-resource Iberian languages to facilitate MT model training for Asturian, Aranese and Aragonese. Furthermore, we explore the robustness of these models when fine-tuned on datasets with varying levels of alignment noise. We fine-tuned a Spanish-Galician model using Asturian data filtered by BLEU score thresholds of 5, 15, 30 and 60, identifying BLEU 15 as the most effective. This threshold was then applied to the Aranese and Aragonese datasets. Our findings indicate that filtering the corpora reduces computational costs and improves performance compared to using nearly raw data or data filtered with language identification. However, it still falls short of the performance achieved by the rule-based system Apertium in Aranese and Aragonese.
%U https://aclanthology.org/2024.wmt-1.89
%P 918-924
Markdown (Informal)
[Robustness of Fine-Tuned LLMs for Machine Translation with Varying Noise Levels: Insights for Asturian, Aragonese and Aranese](https://aclanthology.org/2024.wmt-1.89) (Bär et al., WMT 2024)
ACL