Exploring Inline Lexicon Injection for Cross-Domain Transfer in Neural Machine Translation

Jesujoba O. Alabi, Rachel Bawden


Abstract
Domain transfer remains a challenge in machine translation (MT), particularly concerning rare or unseen words. Amongst the strategies proposed to address the issue, one of the simplest and most promising in terms of generalisation capacity is coupling the MT system with external resources such as bilingual lexicons and appending inline annotations within source sentences. This method has been shown to work well for controlled language settings, but its usability for general language (and ambiguous) MT is less certain. In this article we explore this question further, testing the strategy in a multi-domain transfer setting for German-to-English MT, using the mT5 language model fine-tuned on parallel data. We analyse the MT outputs and design evaluation strategies to understand the behaviour of such models. Our analysis using distractor annotations suggests that although improvements are not systematic according to automatic metrics, the model does learn to select appropriate translation candidates and ignore irrelevant ones, thereby exhibiting more than a systematic copying behaviour. However, we also find that the method is less successful in a higher-resource setting with a larger lexicon, suggesting that it is not a magic solution, especially when the baseline model is already exposed to a wide range of vocabulary.
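The inline annotation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `<lex>` / `</lex>` markers, the `|` separator, the whitespace tokenisation, and the toy lexicon are all assumptions made here for illustration, and the function name `annotate_source` is hypothetical. The sketch only shows how candidate translations from a bilingual lexicon might be appended after matching source words before the sentence is passed to an MT model.

```python
def annotate_source(sentence, lexicon, open_tag="<lex>", close_tag="</lex>"):
    """Append lexicon candidates inline after each matching source token.

    The tag strings and separator are illustrative assumptions, not the
    annotation format used in the paper.
    """
    out = []
    for token in sentence.split():
        # Look up a lowercased form without surrounding punctuation.
        key = token.strip(".,;:!?").lower()
        candidates = lexicon.get(key)
        if candidates:
            # Keep the original token and list every candidate translation,
            # leaving it to the model to select the appropriate one.
            out.append(f"{token} {open_tag} {' | '.join(candidates)} {close_tag}")
        else:
            out.append(token)
    return " ".join(out)


# Toy German-to-English lexicon (hypothetical entries).
toy_lexicon = {"bank": ["bank", "bench"], "fluss": ["river"]}
annotated = annotate_source("Die Bank am Fluss ist alt.", toy_lexicon)
print(annotated)
# Die Bank <lex> bank | bench </lex> am Fluss <lex> river </lex> ist alt.
```

An ambiguous entry such as "Bank" carries several candidates, which corresponds to the situation the abstract describes, where the model has to select an appropriate candidate and ignore irrelevant ones rather than copy systematically.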
Anthology ID:
2024.kemt-1.2
Volume:
Proceedings of the First International Workshop on Knowledge-Enhanced Machine Translation
Month:
June
Year:
2024
Address:
Sheffield, United Kingdom
Editors:
Arda Tezcan, Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis
Venues:
KEMT | WS
Publisher:
European Association for Machine Translation (EAMT)
Pages:
7–20
URL:
https://aclanthology.org/2024.kemt-1.2
Cite (ACL):
Jesujoba O. Alabi and Rachel Bawden. 2024. Exploring Inline Lexicon Injection for Cross-Domain Transfer in Neural Machine Translation. In Proceedings of the First International Workshop on Knowledge-Enhanced Machine Translation, pages 7–20, Sheffield, United Kingdom. European Association for Machine Translation (EAMT).
Cite (Informal):
Exploring Inline Lexicon Injection for Cross-Domain Transfer in Neural Machine Translation (Alabi & Bawden, KEMT-WS 2024)
PDF:
https://aclanthology.org/2024.kemt-1.2.pdf