Adding soft terminology constraints to pre-trained generic MT models by means of continued training

Tommi Nieminen


Abstract
This article describes an efficient method of adding terminology support to existing machine translation models. Training of the pre-trained models is continued on parallel data in which source-language strings identified as terms have been annotated with the lemmas of the corresponding target-language terms. Evaluation using standard test sets and methods confirms that continued training from generic base models can produce term models that are competitive with models specifically trained as term models.
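As a rough illustration of the annotation scheme the abstract describes, the sketch below wraps each source token that matches a terminology entry with the lemma of its target-language term. The abstract does not specify the actual markup format; the inline `<term>`/`<trans>` tags, the function name, and the English–Finnish term pair are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of source-side term annotation for continued training.
# Assumption: terms are marked inline with <term> ... <trans> lemma </trans>;
# the paper's actual annotation format may differ.

def annotate_terms(source: str, term_dict: dict[str, str]) -> str:
    """Annotate source tokens found in term_dict with their target lemmas."""
    annotated = []
    for token in source.split():
        lemma = term_dict.get(token.lower())
        if lemma is not None:
            # Attach the target-term lemma as a soft constraint for the model.
            annotated.append(f"<term> {token} <trans> {lemma} </trans>")
        else:
            annotated.append(token)
    return " ".join(annotated)

# Example: a single EN->FI term pair, with the target term in lemma form.
print(annotate_terms(
    "The patient received an anticoagulant",
    {"anticoagulant": "antikoagulantti"},
))
```

Sentences annotated this way would be mixed into the continued-training data, so the model learns to copy and inflect the supplied lemma rather than being forced to emit it verbatim (hence "soft" constraints).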
Anthology ID: 2024.kemt-1.3
Volume: Proceedings of the First International Workshop on Knowledge-Enhanced Machine Translation
Month: June
Year: 2024
Address: Sheffield, United Kingdom
Editors: Arda Tezcan, Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis
Venues: KEMT | WS
Publisher: European Association for Machine Translation (EAMT)
Pages: 21–33
URL: https://aclanthology.org/2024.kemt-1.3
Cite (ACL):
Tommi Nieminen. 2024. Adding soft terminology constraints to pre-trained generic MT models by means of continued training. In Proceedings of the First International Workshop on Knowledge-Enhanced Machine Translation, pages 21–33, Sheffield, United Kingdom. European Association for Machine Translation (EAMT).
Cite (Informal):
Adding soft terminology constraints to pre-trained generic MT models by means of continued training (Nieminen, KEMT-WS 2024)
PDF: https://aclanthology.org/2024.kemt-1.3.pdf