VARCO-MT: NCSOFT’s WMT’23 Terminology Shared Task Submission

Geon Woo Park, Junghwa Lee, Meiying Ren, Allison Shindell, Yeonsoo Lee


Abstract
A lack of consistency in terminology translation undermines the translation quality of even the best-performing neural machine translation (NMT) models, especially in narrow domains such as literature, medicine, and video game jargon. Dictionaries containing terms and their translations are often used to improve consistency, but they are difficult to construct and incorporate. We accompany our submissions to the WMT’23 Terminology Shared Task with a description of our experimental setup and procedure, in which we propose a framework for terminology-aware machine translation. Our framework comprises an automatic terminology extraction process that constructs terminology-aware machine translation data in low-supervision settings, and two model architectures with terminology constraints. Our models outperform baseline models by 21.51%p and 19.36%p in terminology recall, respectively, on the Chinese-to-English WMT’23 Terminology Shared Task test data.
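To make the reported metric concrete, the following is a minimal sketch of how terminology recall and a percentage-point (%p) gain could be computed. It assumes recall is the share of required target terms that appear verbatim (case-insensitively) in the system output; the shared task's official scorer may match terms differently. The function names and the example sentences are hypothetical and not taken from the paper.

```python
def terminology_recall(hypotheses, term_lists):
    """Share of required target terms found in the matching hypothesis.

    Assumption: a term counts as found if it occurs as a case-insensitive
    substring of the hypothesis. This is an illustrative definition only.
    """
    total = 0
    found = 0
    for hyp, terms in zip(hypotheses, term_lists):
        hyp_lower = hyp.lower()
        for term in terms:
            total += 1
            if term.lower() in hyp_lower:
                found += 1
    return found / total if total else 0.0


def percentage_point_gain(system_recall, baseline_recall):
    """Absolute difference between two recall values, in percentage points."""
    return (system_recall - baseline_recall) * 100


# Hypothetical toy data: two translations and the terms each must contain.
hyps = ["The knight drew the Sword of Dawn.", "She entered the guild hall."]
terms = [["Sword of Dawn"], ["guild hall", "raid boss"]]
recall = terminology_recall(hyps, terms)  # 2 of the 3 required terms appear
```

A gain stated in %p is an absolute difference between two recall percentages, not a relative improvement over the baseline.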
Anthology ID:
2023.wmt-1.84
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
919–925
URL:
https://aclanthology.org/2023.wmt-1.84
DOI:
10.18653/v1/2023.wmt-1.84
Cite (ACL):
Geon Woo Park, Junghwa Lee, Meiying Ren, Allison Shindell, and Yeonsoo Lee. 2023. VARCO-MT: NCSOFT’s WMT’23 Terminology Shared Task Submission. In Proceedings of the Eighth Conference on Machine Translation, pages 919–925, Singapore. Association for Computational Linguistics.
Cite (Informal):
VARCO-MT: NCSOFT’s WMT’23 Terminology Shared Task Submission (Park et al., WMT 2023)
PDF:
https://aclanthology.org/2023.wmt-1.84.pdf