Domain Informed Neural Machine Translation: Developing Translation Services for Healthcare Enterprise

Sahil Manchanda, Galina Grunin


Abstract
Neural Machine Translation (NMT) is a deep-learning-based approach that has recently achieved outstanding results in the translation community. The performance of NMT systems, however, depends on the availability of large in-domain parallel corpora. Business enterprises in domains such as legal and healthcare require specialized vocabulary, but translation systems trained for general purposes do not cater to these needs. Data in these domains is either hard to acquire or very small compared with public data sets. This report describes in detail how an open-source library was used to implement a machine translation system and to customize it successfully for the needs of a particular client in the healthcare domain. It covers the chronological development of every component of the system, namely: extraction of data from in-domain healthcare documents, a pre-processing pipeline for the data, data alignment and augmentation, training, and a fully automated and robust deployment pipeline. This work proposes an efficient way to continuously deploy newly trained deep learning models. The deployed translation models are optimized for both inference time and cost.
Anthology ID:
2020.eamt-1.27
Volume:
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
Month:
November
Year:
2020
Address:
Lisboa, Portugal
Editors:
André Martins, Helena Moniz, Sara Fumega, Bruno Martins, Fernando Batista, Luisa Coheur, Carla Parra, Isabel Trancoso, Marco Turchi, Arianna Bisazza, Joss Moorkens, Ana Guerberof, Mary Nurminen, Lena Marg, Mikel L. Forcada
Venue:
EAMT
Publisher:
European Association for Machine Translation
Pages:
255–261
URL:
https://aclanthology.org/2020.eamt-1.27
Cite (ACL):
Sahil Manchanda and Galina Grunin. 2020. Domain Informed Neural Machine Translation: Developing Translation Services for Healthcare Enterprise. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 255–261, Lisboa, Portugal. European Association for Machine Translation.
Cite (Informal):
Domain Informed Neural Machine Translation: Developing Translation Services for Healthcare Enterprise (Manchanda & Grunin, EAMT 2020)
PDF:
https://aclanthology.org/2020.eamt-1.27.pdf