Efficient Inference for Multilingual Neural Machine Translation

Alexandre Berard, Dain Lee, Stephane Clinchant, Kweonwoo Jung, Vassilina Nikoulina


Abstract
Multilingual NMT has become an attractive solution for MT deployment in production. But to match bilingual quality, it comes at the cost of larger and slower models. In this work, we consider several ways to make multilingual NMT faster at inference without degrading its quality. We experiment with several “light decoder” architectures in two 20-language multi-parallel settings: small-scale on TED Talks and large-scale on ParaCrawl. Our experiments demonstrate that combining a shallow decoder with vocabulary filtering leads to almost 2 times faster inference with no loss in translation quality. We validate our findings with BLEU and chrF (on 380 language pairs), robustness evaluation and human evaluation.
Anthology ID:
2021.emnlp-main.674
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
8563–8583
Language:
URL:
https://aclanthology.org/2021.emnlp-main.674
DOI:
Bibkey:
Copy Citation:
PDF:
https://aclanthology.org/2021.emnlp-main.674.pdf