Marcos Lopes
2024
Open-source LLMs vs. NMT Systems: Translating Spatial Language in EN-PT-br Subtitles
Rafael Fernandes, Marcos Lopes
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 2: Presentations)
This research investigates the challenges of translating spatial language with open-source large language models (LLMs) versus traditional neural machine translation (NMT) systems. The study focuses on spatial prepositions such as ACROSS, INTO, ONTO, and THROUGH, which are particularly challenging for the EN-PT-br pair. It evaluates translations with the BLEU, METEOR, BERTScore, COMET, and TER metrics, along with manual error analysis. The findings reveal that moderate-sized LLMs, such as LLaMa-3-8B and Mixtral-8x7B, achieve accuracy comparable to NMT systems like DeepL. However, LLMs frequently exhibit mistranslation errors, including interlanguage/code-switching and anglicisms, while NMT systems demonstrate better fluency. Both LLMs and NMT systems are prone to spatial errors, including syntactic projections and polysemy. The study concludes that significant hurdles remain in accurately translating spatial language. It suggests that future research should focus on enhancing training datasets, refining models, and developing more sophisticated evaluation metrics.
Spatial Information Challenges in English to Portuguese Machine Translation
Rafael Fernandes, Rodrigo Souza, Marcos Lopes, Paulo Santos, Thomas Finbow
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1