Pretrained Language Models and Backtranslation for English-Basque Biomedical Neural Machine Translation

Inigo Jauregi Unanue, Massimo Piccardi


Abstract
This paper describes the machine translation systems submitted by the University of Technology Sydney Natural Language Processing (UTS_NLP) team to the WMT20 English-Basque biomedical translation tasks. Because of the limited parallel corpora available, we propose training a BERT-fused NMT model that leverages pretrained language models. Furthermore, we augment the training corpus by backtranslating monolingual data. Our experiments show that NMT models in low-resource scenarios can benefit from combining these two training techniques, with improvements of up to 6.16 BLEU percentage points in the case of biomedical abstract translations.
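The backtranslation step mentioned in the abstract can be illustrated with a short sketch: target-side (Basque) monolingual text is translated into the source language (English) with a reverse-direction model, and the synthetic English is then paired with the genuine Basque to create extra training data for the English-to-Basque system. The sketch below is illustrative only and is not the authors' pipeline; it assumes the Hugging Face transformers library and the "Helsinki-NLP/opus-mt-eu-en" Marian checkpoint, neither of which is named in the paper, and the `backtranslate` helper and the example sentence are hypothetical.

```python
# Minimal backtranslation sketch (NOT the authors' exact pipeline).
# Assumption: the "Helsinki-NLP/opus-mt-eu-en" Marian checkpoint is available.
from transformers import MarianMTModel, MarianTokenizer

def backtranslate(basque_sentences, model_name="Helsinki-NLP/opus-mt-eu-en"):
    """Translate monolingual Basque into English with a reverse (eu->en) model,
    producing synthetic (English, Basque) pairs for training an en->eu system."""
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(basque_sentences, return_tensors="pt",
                      padding=True, truncation=True)
    generated = model.generate(**batch)
    synthetic_english = tokenizer.batch_decode(generated, skip_special_tokens=True)
    # Pair the synthetic English source with the genuine Basque target.
    return list(zip(synthetic_english, basque_sentences))

# Hypothetical monolingual Basque input ("The patient tolerated the treatment well.")
pairs = backtranslate(["Pazienteak tratamendua ondo jasan zuen."])
```

The key design point is that only the source side of each synthetic pair is machine-generated; the target side remains human-written Basque, so the decoder is still trained on clean target-language text.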
Anthology ID:
2020.wmt-1.89
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Editors:
Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
826–832
URL:
https://aclanthology.org/2020.wmt-1.89
Cite (ACL):
Inigo Jauregi Unanue and Massimo Piccardi. 2020. Pretrained Language Models and Backtranslation for English-Basque Biomedical Neural Machine Translation. In Proceedings of the Fifth Conference on Machine Translation, pages 826–832, Online. Association for Computational Linguistics.
Cite (Informal):
Pretrained Language Models and Backtranslation for English-Basque Biomedical Neural Machine Translation (Jauregi Unanue & Piccardi, WMT 2020)
PDF:
https://aclanthology.org/2020.wmt-1.89.pdf
Video:
https://slideslive.com/38939562