Lingua Custodia’s Participation at the WMT 2023 Terminology Shared Task

Jingshu Liu, Mariam Nakhlé, Gaëtan Caillout, Raheel Qadar


Abstract
This paper presents Lingua Custodia’s submission to the WMT23 Terminology shared task. Translating technical terms precisely is pivotal to the final quality of machine translation output. Our goal is to make the machine translation system follow a given terminology constraint. Inspired by recent work on terminology control, we propose to annotate the training data by leveraging a synthetic dictionary extracted in a fully unsupervised way from the given parallel corpora. The model trained on this data can then be used to translate text with a given terminology in a flexible manner. In addition, we introduce a careful re-sampling step over the annotated data so that the model sees each terminology type often enough. In this task we consider all three language directions: Chinese to English, English to Czech and German to English. Automatic evaluation metrics on the submitted systems show the effectiveness of the proposed method.
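The annotation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag tokens `<term>`, `<trans>` and `</term>`, the function name `annotate`, and the toy dictionary are all assumptions made for illustration, and the dictionary extraction and re-sampling steps are left out. A source term is tagged only when its dictionary translation actually appears in the reference target, so the model would learn to copy the supplied translation.

```python
import re
from typing import Dict


def annotate(source: str, target: str, dictionary: Dict[str, str]) -> str:
    """Inline dictionary translations into the source side of a sentence pair.

    Hypothetical tag scheme (an assumption, not taken from the paper):
        <term> source_term <trans> target_term </term>
    """
    # Keep only entries whose translation is present in the reference target.
    usable = {
        src.lower(): tgt
        for src, tgt in dictionary.items()
        if tgt.lower() in target.lower()
    }
    if not usable:
        return source

    # Single pass over the source, longest terms first, so that a multi-word
    # term is preferred over a shorter term it contains and nothing is
    # annotated twice.
    alternation = "|".join(
        re.escape(src) for src in sorted(usable, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

    def tag(match):
        surface = match.group(0)
        return f"<term> {surface} <trans> {usable[surface.lower()]} </term>"

    return pattern.sub(tag, source)


# Toy German-to-English pair; the dictionary entries are invented examples.
toy_dictionary = {"Zinssatz": "interest rate", "Anleihe": "bond"}
print(annotate(
    "Der Zinssatz bleibt unverändert.",
    "The interest rate remains unchanged.",
    toy_dictionary,
))
```

At inference time, the same tag scheme would be applied to the input using the user-supplied terminology instead of the synthetic dictionary. Word-boundary matching as used here suits space-delimited languages; Chinese source text would need a segmentation-aware matcher.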
Anthology ID:
2023.wmt-1.81
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
897–901
URL:
https://aclanthology.org/2023.wmt-1.81
DOI:
10.18653/v1/2023.wmt-1.81
Cite (ACL):
Jingshu Liu, Mariam Nakhlé, Gaëtan Caillout, and Raheel Qadar. 2023. Lingua Custodia’s Participation at the WMT 2023 Terminology Shared Task. In Proceedings of the Eighth Conference on Machine Translation, pages 897–901, Singapore. Association for Computational Linguistics.
Cite (Informal):
Lingua Custodia’s Participation at the WMT 2023 Terminology Shared Task (Liu et al., WMT 2023)
PDF:
https://aclanthology.org/2023.wmt-1.81.pdf