Towards Precise Lexicon Integration in Neural Machine Translation

Ogün Öz, Maria Sukhareva


Abstract
Terminological consistency is an essential requirement for industrial translation. High-quality, hand-crafted terminologies contain entries in their nominal forms. Integrating such a terminology into machine translation is not trivial: the MT system must be able to disambiguate homographs on the source side and choose the correct wordform on the target side. In this work, we propose a simple but effective method for homograph disambiguation and a method of wordform selection based on multi-choice lexical constraints. We also propose a metric to measure the terminological consistency of a translation. Our results show a significant improvement over the current state of the art (SOTA) in terminological consistency, with no loss in BLEU score. All the code used in this work will be published as open source.
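The abstract does not spell out how a multi-choice constraint or the consistency metric is defined. As a rough illustration only, the sketch below assumes that a terminology entry lists several admissible target wordforms, that a constraint counts as satisfied when any one of them appears in the translation, and that consistency is the share of applicable entries that are satisfied. The names `TermEntry`, `satisfied`, and `term_consistency`, the whitespace tokenization, and the restriction to single-token terms are all hypothetical simplifications and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TermEntry:
    """Hypothetical entry: a source term and its admissible target wordforms."""
    source: str
    target_forms: Tuple[str, ...]


def satisfied(entry: TermEntry, hypothesis: str) -> bool:
    """True if any admissible target wordform occurs as a token in the hypothesis."""
    tokens = hypothesis.lower().split()
    return any(form.lower() in tokens for form in entry.target_forms)


def term_consistency(entries: List[TermEntry], source: str, hypothesis: str) -> float:
    """Share of entries whose source term occurs in `source` and whose
    multi-choice constraint is met in `hypothesis` (assumed definition)."""
    source_tokens = source.lower().split()
    active = [e for e in entries if e.source.lower() in source_tokens]
    if not active:
        return 1.0  # no terminology applies, so nothing can be violated
    return sum(satisfied(e, hypothesis) for e in active) / len(active)


entries = [
    TermEntry("valve", ("Ventil", "Ventile", "Ventils", "Ventilen")),
    TermEntry("pump", ("Pumpe", "Pumpen")),
]
score = term_consistency(
    entries,
    "open the valve and the pump",
    "Öffnen Sie das Ventil und den Motor",
)
```

Here the first entry is met by the wordform "Ventil" while the second is missed, so `score` is 0.5. A realistic setup would also need multi-word terms, proper tokenization, and the source-side homograph disambiguation that the abstract mentions, none of which this sketch attempts.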
Anthology ID:
2021.ranlp-1.122
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
1084–1095
URL:
https://aclanthology.org/2021.ranlp-1.122
Cite (ACL):
Ogün Öz and Maria Sukhareva. 2021. Towards Precise Lexicon Integration in Neural Machine Translation. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1084–1095, Held Online. INCOMA Ltd.
Cite (Informal):
Towards Precise Lexicon Integration in Neural Machine Translation (Öz & Sukhareva, RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-1.122.pdf
Code:
term-integration-mt/term-integration-mt