Combining Sequence Distillation and Transfer Learning for Efficient Low-Resource Neural Machine Translation Models

Raj Dabre, Atsushi Fujita


Abstract
In neural machine translation (NMT), sequence distillation (SD) through the creation of distilled corpora leads to efficient (compact and fast) models. However, its effectiveness in extremely low-resource (ELR) settings has not been well studied. On the other hand, transfer learning (TL) by leveraging larger helping corpora greatly improves translation quality in general. This paper investigates a combination of SD and TL for training efficient NMT models for ELR settings, where we utilize TL with helping corpora twice: once for distilling the ELR corpora and then during compact model training. We experimented with two ELR settings, Vietnamese–English and Hindi–English, from the Asian Language Treebank dataset with 18k training sentence pairs. Using compact models with 40% fewer parameters trained on the distilled ELR corpora, greedy search achieved an average improvement of 3.6 BLEU points while reducing decoding time by 40%. We also confirmed that using both the distilled ELR and helping corpora in the second round of TL further improves translation quality. Our work highlights the importance of the stage-wise application of SD and TL for efficient NMT modeling in ELR settings.
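The sketch below illustrates the stage-wise pipeline described in the abstract: transfer learning is applied twice, once to build a teacher that distills the ELR corpus, and once more when training the compact student on the distilled (and optionally helping) data. This is a minimal, hedged illustration, not the authors' code: all function names, data structures, and the toy "training" and "distillation" steps are placeholders assumed for readability.

```python
"""Minimal sketch of the SD + TL pipeline from the abstract.
All names and the stub training/translation logic are illustrative
assumptions; a real setup would fine-tune Transformer NMT models."""

from dataclasses import dataclass


@dataclass
class ParallelCorpus:
    name: str
    pairs: list  # list of (source_sentence, target_sentence) tuples


def train_nmt(corpora, init_from=None, compact=False):
    """Placeholder for NMT training; `init_from` stands in for
    transfer learning (initializing from a parent model)."""
    size = "compact" if compact else "full"
    parents = init_from["trained_on"] if init_from else []
    return {"size": size, "trained_on": parents + [c.name for c in corpora]}


def distill(teacher, corpus):
    """Sequence distillation: re-translate the source side with the teacher.
    Here we simply copy the references to keep the sketch runnable;
    real SD would use the teacher's beam-search outputs."""
    pairs = [(src, tgt) for src, tgt in corpus.pairs]
    return ParallelCorpus(name=f"{corpus.name}-distilled", pairs=pairs)


if __name__ == "__main__":
    helping = ParallelCorpus("helping (large)", [("xin chao", "hello")])
    elr = ParallelCorpus("ELR (18k)", [("cam on", "thank you")])

    # TL round 1: parent on the helping corpus, teacher fine-tuned on the ELR corpus.
    parent = train_nmt([helping])
    teacher = train_nmt([elr], init_from=parent)

    # Sequence distillation of the ELR corpus with the teacher.
    elr_distilled = distill(teacher, elr)

    # TL round 2: compact student initialized from the parent, trained on the
    # distilled ELR corpus together with the helping corpus.
    student = train_nmt([elr_distilled, helping], init_from=parent, compact=True)
    print(student)
```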
Anthology ID:
2020.wmt-1.61
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Editors:
Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
492–502
URL:
https://aclanthology.org/2020.wmt-1.61
Cite (ACL):
Raj Dabre and Atsushi Fujita. 2020. Combining Sequence Distillation and Transfer Learning for Efficient Low-Resource Neural Machine Translation Models. In Proceedings of the Fifth Conference on Machine Translation, pages 492–502, Online. Association for Computational Linguistics.
Cite (Informal):
Combining Sequence Distillation and Transfer Learning for Efficient Low-Resource Neural Machine Translation Models (Dabre & Fujita, WMT 2020)
PDF:
https://aclanthology.org/2020.wmt-1.61.pdf
Video:
https://slideslive.com/38939551