Dynamic Terminology Integration for COVID-19 and Other Emerging Domains

Toms Bergmanis, Mārcis Pinnis

Abstract
Most language domains require careful use of terminology to ensure that information is conveyed clearly and adequately. While the correct use of terminology for some languages and domains can be achieved by adapting general-purpose MT systems on large volumes of in-domain parallel data, such quantities of domain-specific data are seldom available for less-resourced languages and niche domains. Furthermore, as COVID-19 recently showed, no domain-specific parallel data is readily available for emerging domains. Yet the gravity of the pandemic created high demand for reliable translation of critical information about the pandemic and infection prevention. This work is part of the WMT 2021 Shared Task: Machine Translation using Terminologies, and describes Tilde MT systems capable of dynamic terminology integration at translation time. Our systems achieve up to 94% COVID-19 term use accuracy on the test set of the EN-FR language pair without access to any form of in-domain information during system training.
Anthology ID: 2021.wmt-1.81
Volume: Proceedings of the Sixth Conference on Machine Translation
Month: November
Year: 2021
Address: Online
Venues: EMNLP | WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 821–827
URL: https://aclanthology.org/2021.wmt-1.81
Cite (ACL): Toms Bergmanis and Mārcis Pinnis. 2021. Dynamic Terminology Integration for COVID-19 and Other Emerging Domains. In Proceedings of the Sixth Conference on Machine Translation, pages 821–827, Online. Association for Computational Linguistics.
Cite (Informal): Dynamic Terminology Integration for COVID-19 and Other Emerging Domains (Bergmanis & Pinnis, WMT 2021)
PDF: https://aclanthology.org/2021.wmt-1.81.pdf