Gaëtan Caillout


pdf bib
Lingua Custodia’s Participation at the WMT 2023 Terminology Shared Task
Jingshu Liu | Mariam Nakhlé | Gaëtan Caillout | Raheel Qadar
Proceedings of the Eighth Conference on Machine Translation

This paper presents Lingua Custodia’s submission to the WMT23 shared task on Terminology shared task. Ensuring precise translation of technical terms plays a pivotal role in gauging the final quality of machine translation results. Our goal is to follow the terminology constraint while applying the machine translation system. Inspired by the recent work of terminology control, we propose to annotate the machine learning training data by leveraging a synthetic dictionary extracted in a fully non supervised way from the give parallel corpora. The model learned with this training data can then be then used to translate text with a given terminology in a flexible manner. In addition, we introduce a careful annotated data re-sampling step in order to guide the model to see different terminology types enough times. In this task we consider all the three language directions: Chinese to English, English to Czech and German to English. Our automatic evaluation metrics with the submitted systems show the effectiveness of the proposed method.