Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

Devendra Sachan; Graham Neubig

doi:10.18653/v1/W18-6327

Abstract

In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to performance gains over bilingually trained models. However, these improvements are not uniform; often multilingual parameter sharing results in a decrease in accuracy due to translation models not being able to accommodate different languages in their limited parameter space. In this work, we examine parameter sharing techniques that strike a happy medium between full sharing and individual training, specifically focusing on the self-attentional Transformer model. We find that the full parameter sharing approach leads to increases in BLEU scores mainly when the target languages are from a similar language family. However, even when the target languages are from different families, where full parameter sharing leads to a noticeable drop in BLEU scores, our proposed methods for partial sharing of parameters can lead to substantial improvements in translation accuracy.

Anthology ID: W18-6327
Volume: Proceedings of the Third Conference on Machine Translation: Research Papers
Month: October
Year: 2018
Address: Brussels, Belgium
Editors: Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 261–271
URL: https://aclanthology.org/W18-6327/
DOI: 10.18653/v1/W18-6327
Cite (ACL): Devendra Sachan and Graham Neubig. 2018. Parameter Sharing Methods for Multilingual Self-Attentional Translation Models. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 261–271, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Parameter Sharing Methods for Multilingual Self-Attentional Translation Models (Sachan & Neubig, WMT 2018)
PDF: https://aclanthology.org/W18-6327.pdf
