Exploring Transfer Learning and Domain Data Selection for the Biomedical Translation

Noor-e- Hira, Sadaf Abdul Rauf, Kiran Kiani, Ammara Zafar, Raheel Nawaz


Abstract
Transfer learning and selective data training are two of the many approaches being extensively investigated to improve the quality of neural machine translation systems. This paper presents a series of experiments applying transfer learning and selective data training for our participation in the biomedical shared task of WMT19. We used information retrieval to select related sentences from out-of-domain data and added them as training data through transfer learning. We also report the effect of tokenization on translation model performance.
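To illustrate the data selection idea mentioned in the abstract, here is a minimal sketch of retrieval-based selection: in-domain sentences act as queries, and the most similar out-of-domain sentences are kept as extra training data. The abstract does not say which retrieval toolkit or ranking model the authors used, so the TF-IDF cosine ranking, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `select_related`) and the `top_k` parameter are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_related(in_domain, out_of_domain, top_k=1):
    """For each in-domain query, keep up to top_k similar out-of-domain sentences.

    Hypothetical helper: sentences with zero similarity are never selected.
    """
    docs = [tokenize(s) for s in in_domain + out_of_domain]
    vecs = tfidf_vectors(docs)
    queries, pool = vecs[:len(in_domain)], vecs[len(in_domain):]
    selected = set()
    for q in queries:
        scores = [cosine(q, p) for p in pool]
        ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
        selected.update(i for i in ranked[:top_k] if scores[i] > 0.0)
    # Preserve the original order of the out-of-domain corpus.
    return [out_of_domain[i] for i in sorted(selected)]


in_domain = ["patient received chemotherapy for lung cancer"]
out_of_domain = [
    "parliament voted on a new budget",
    "chemotherapy improved survival in cancer patients",
    "football match ended in a draw",
]
related = select_related(in_domain, out_of_domain, top_k=1)
```

In a transfer learning setup, sentences selected this way would be paired with their translations and used to continue training a model that was first trained on general data. That step depends on the NMT toolkit and is not sketched here.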
Anthology ID:
W19-5419
Volume:
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
156–163
URL:
https://aclanthology.org/W19-5419
DOI:
10.18653/v1/W19-5419
Cite (ACL):
Noor-e- Hira, Sadaf Abdul Rauf, Kiran Kiani, Ammara Zafar, and Raheel Nawaz. 2019. Exploring Transfer Learning and Domain Data Selection for the Biomedical Translation. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 156–163, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Exploring Transfer Learning and Domain Data Selection for the Biomedical Translation (Hira et al., WMT 2019)
PDF:
https://aclanthology.org/W19-5419.pdf