Open-source LLMs vs. NMT Systems: Translating Spatial Language in EN-PT-br Subtitles

Rafael Fernandes, Marcos Lopes


Abstract
This research investigates the challenges of translating spatial language with open-source LLMs compared with traditional NMT systems. Focusing on spatial prepositions such as ACROSS, INTO, ONTO, and THROUGH, which are particularly challenging for the EN-PT-br pair, the study evaluates translations using the BLEU, METEOR, BERTScore, COMET, and TER metrics, along with manual error analysis. The findings reveal that moderate-sized LLMs, such as LLaMa-3-8B and Mixtral-8x7B, achieve accuracy comparable to NMT systems such as DeepL. However, the LLMs frequently produce mistranslation errors, including interlanguage/code-switching and anglicisms, while the NMT systems demonstrate better fluency. Both kinds of system struggle with spatial-related errors, including those involving syntactic projections and polysemy. The study concludes that significant hurdles remain in accurately translating spatial language, and suggests that future research should focus on enhancing training datasets, refining models, and developing more sophisticated evaluation metrics.
Anthology ID:
2024.amta-presentations.11
Volume:
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 2: Presentations)
Month:
September
Year:
2024
Address:
Chicago, USA
Editors:
Marianna Martindale, Janice Campbell, Konstantin Savenkov, Shivali Goel
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
152–153
URL:
https://aclanthology.org/2024.amta-presentations.11
Cite (ACL):
Rafael Fernandes and Marcos Lopes. 2024. Open-source LLMs vs. NMT Systems: Translating Spatial Language in EN-PT-br Subtitles. In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 2: Presentations), pages 152–153, Chicago, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Open-source LLMs vs. NMT Systems: Translating Spatial Language in EN-PT-br Subtitles (Fernandes & Lopes, AMTA 2024)
PDF:
https://aclanthology.org/2024.amta-presentations.11.pdf