Allison Shindell
2023
VARCO-MT: NCSOFT’s WMT’23 Terminology Shared Task Submission
Geon Woo Park | Junghwa Lee | Meiying Ren | Allison Shindell | Yeonsoo Lee
Proceedings of the Eighth Conference on Machine Translation
Inconsistent terminology translation undermines the translation quality of even the best-performing neural machine translation (NMT) models, especially in narrow domains such as literature, medicine, and video game jargon. Dictionaries that pair terms with their translations are often used to improve consistency, but they are difficult to construct and incorporate. We accompany our submissions to the WMT’23 Terminology Shared Task with a description of our experimental setup and procedure, in which we propose a framework for terminology-aware machine translation. The framework comprises an automatic terminology extraction process that constructs terminology-aware machine translation data in low-supervision settings, and two model architectures with terminology constraints. On the Chinese-to-English WMT’23 Terminology Shared Task test data, our two models outperform the baseline models by 21.51%p and 19.36%p (percentage points) in terminology recall, respectively.
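To make the reported metric concrete, the following is a minimal sketch of how terminology recall could be computed. It assumes a simplified definition: the share of required target-side terms that appear, as case-insensitive substrings, in the corresponding system output. The function name `terminology_recall`, the matching rule, and the example sentences are illustrative assumptions and are not taken from the paper or from the shared task's official scorer, which may differ.

```python
from typing import List


def terminology_recall(hypotheses: List[str], term_lists: List[List[str]]) -> float:
    """Return the fraction of required target terms found in the hypotheses.

    Assumption: a term counts as matched if it occurs as a case-insensitive
    substring of the hypothesis for the same sentence. An official scorer
    may use lemmatization or alignment instead.
    """
    matched = 0
    total = 0
    for hypothesis, terms in zip(hypotheses, term_lists):
        hypothesis_lower = hypothesis.lower()
        for term in terms:
            total += 1
            if term.lower() in hypothesis_lower:
                matched += 1
    # Avoid division by zero when no terms are required.
    return matched / total if total else 0.0


# Hypothetical system outputs and the target terms required for each one.
hypotheses = [
    "The knight equipped the Dragon Blade.",
    "She cast a healing spell.",
]
term_lists = [
    ["Dragon Blade"],
    ["healing spell", "mana"],
]

recall = terminology_recall(hypotheses, term_lists)
print(f"terminology recall: {recall:.2%}")
```

Under this definition, a gain stated in "%p" is the absolute difference between two recall values expressed in percentage points. For example, moving from 50% to 70% recall is a gain of 20%p, not a relative improvement of 20%.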