Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting

Nikolay Bogoychev, Pinzhen Chen


Abstract
Terminology correctness is important in downstream applications of machine translation, and a prevalent way to ensure it is to inject terminology constraints into a translation system. In our submission to the WMT 2023 terminology translation task, we adopt a translate-then-refine approach that is domain-independent and requires minimal manual effort. We first train a terminology-aware model by annotating random source words with pseudo-terminology translations obtained from word alignment. We then explore two post-processing methods. First, we use an alignment process to detect whether a terminology constraint has been violated, and if so, we re-decode with the violating word negatively constrained. Alternatively, we leverage a large language model to refine a hypothesis by providing it with the terminology constraints. Results show that our terminology-aware model learns to incorporate terminology effectively, and that the large language model refinement step can further improve terminology recall.
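The refinement loop the abstract describes — check a draft translation against the terminology constraints, then prompt an LLM with the violated terms — can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the function names and prompt template are invented, and the surface-match check stands in for the alignment-based violation detection described in the paper.

```python
def violated_terms(hypothesis: str, constraints: dict) -> dict:
    """Return the source->target constraints whose target term does not
    appear in the hypothesis (a simple surface-match stand-in for the
    alignment-based violation check described in the abstract)."""
    return {src: tgt for src, tgt in constraints.items()
            if tgt.lower() not in hypothesis.lower()}

def build_refine_prompt(source: str, hypothesis: str, constraints: dict) -> str:
    """Assemble a hypothetical LLM refinement prompt listing the
    required terminology translations."""
    terms = "\n".join(f"- translate '{s}' as '{t}'"
                      for s, t in constraints.items())
    return ("Improve this translation so that it uses the required terminology.\n"
            f"Source: {source}\n"
            f"Draft: {hypothesis}\n"
            f"Required terms:\n{terms}\n"
            "Improved translation:")

# Toy example: the draft omits the required target term "terminology",
# so it would be flagged and sent to the LLM for refinement.
constraints = {"Terminologie": "terminology"}
draft = "Term correctness matters in machine translation."
print(violated_terms(draft, constraints))
print(build_refine_prompt("Terminologie ist wichtig.", draft, constraints))
```

In the paper's actual pipeline, the violation check uses word alignment rather than substring matching, and the alternative to LLM refinement is re-decoding the NMT model with the violating word as a negative constraint.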
Anthology ID:
2023.wmt-1.80
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
890–896
URL:
https://aclanthology.org/2023.wmt-1.80
DOI:
10.18653/v1/2023.wmt-1.80
Cite (ACL):
Nikolay Bogoychev and Pinzhen Chen. 2023. Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting. In Proceedings of the Eighth Conference on Machine Translation, pages 890–896, Singapore. Association for Computational Linguistics.
Cite (Informal):
Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting (Bogoychev & Chen, WMT 2023)
PDF:
https://aclanthology.org/2023.wmt-1.80.pdf