FJWU participation for the WMT20 Biomedical Translation Task

Sumbal Naz, Sadaf Abdul Rauf, Noor-e- Hira, Sami Ul Haq


Abstract
This paper describes the systems submitted by the FJWU-NRPU team to the WMT20 Biomedical shared translation task. Our submission explores the effect of adding in-domain corpora extracted from various out-of-domain sources. We built French-to-English systems using in-domain corpora through fine-tuning and selective data training. We further explored BERT-based models, focusing on the effect of domain-adaptive subword units.
Anthology ID:
2020.wmt-1.92
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Editors:
Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
849–856
URL:
https://aclanthology.org/2020.wmt-1.92
Cite (ACL):
Sumbal Naz, Sadaf Abdul Rauf, Noor-e- Hira, and Sami Ul Haq. 2020. FJWU participation for the WMT20 Biomedical Translation Task. In Proceedings of the Fifth Conference on Machine Translation, pages 849–856, Online. Association for Computational Linguistics.
Cite (Informal):
FJWU participation for the WMT20 Biomedical Translation Task (Naz et al., WMT 2020)
PDF:
https://aclanthology.org/2020.wmt-1.92.pdf
Video:
https://slideslive.com/38939656