Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task

Casimiro Pio Carrino, Bardia Rafieian, Marta R. Costa-jussà, José A. R. Fonollosa


Abstract
In this work, we describe the TALP-UPC systems submitted to the WMT19 Biomedical Translation Task. Our proposed strategy is independent of the NMT model and relies on a single ingredient: a biomedical terminology list. We first extracted this terminology list by labelling biomedical words in our training dataset using the BabelNet API. Then, we designed a data preparation strategy to insert the term information at the token level. Finally, we trained the Transformer model on this term-informed data. Our best submitted system ranked 2nd and 3rd for the Spanish-English and English-Spanish translation directions, respectively.
Anthology ID:
W19-5418
Volume:
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
151–155
URL:
https://aclanthology.org/W19-5418/
DOI:
10.18653/v1/W19-5418
Cite (ACL):
Casimiro Pio Carrino, Bardia Rafieian, Marta R. Costa-jussà, and José A. R. Fonollosa. 2019. Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 151–155, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task (Carrino et al., WMT 2019)
PDF:
https://aclanthology.org/W19-5418.pdf