Fine-tuning NMT Models and LLMs for Specialised EN-ES Translation Using Aligned Corpora, Glossaries, and Synthetic Data: MULTITAN at WMT25 Terminology Shared Task

Lichao Zhu; Maria Zimina-Poirot; Cristian Valdez; Stephane Patin

doi:10.18653/v1/2025.wmt-1.108

Abstract

This paper describes our participation in the WMT25 Terminology Shared Task, specifically Track 1 (Spanish to English), which focuses on translation within the Information Technology (IT) domain. The shared task challenges participants to improve machine translation systems by effectively incorporating terminology constraints, so that specialised technical terms are translated accurately and consistently. We experimented with several approaches to handling terminology and lexical constraints in both NMT systems and LLMs, using a small amount of training data and a glossary. Experimental results show that the systems behave differently with and without a glossary. The NMT system seems rather limited in adapting to a specialised lexicon and to embedding resizing, in contrast to the LLMs, which prefer structured instructions. Through this participation, our objective is to improve terminology accuracy and overall translation quality, to highlight the potential of specialised terminology-aware translation models for technical domains, and to explore possibilities for fine-tuning LLMs and NMT models with domain and lexical constraints.

Anthology ID: 2025.wmt-1.108
Volume: Proceedings of the Tenth Conference on Machine Translation
Month: November
Year: 2025
Address: Suzhou, China
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue: WMT
Publisher: Association for Computational Linguistics
Pages: 1284–1291
URL: https://aclanthology.org/2025.wmt-1.108/
DOI: 10.18653/v1/2025.wmt-1.108
Cite (ACL): Lichao Zhu, Maria Zimina-Poirot, Cristian Valdez, and Stephane Patin. 2025. Fine-tuning NMT Models and LLMs for Specialised EN-ES Translation Using Aligned Corpora, Glossaries, and Synthetic Data: MULTITAN at WMT25 Terminology Shared Task. In Proceedings of the Tenth Conference on Machine Translation, pages 1284–1291, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Fine-tuning NMT Models and LLMs for Specialised EN-ES Translation Using Aligned Corpora, Glossaries, and Synthetic Data: MULTITAN at WMT25 Terminology Shared Task (Zhu et al., WMT 2025)
PDF: https://aclanthology.org/2025.wmt-1.108.pdf
