On the Sparsity of Neural Machine Translation Models

Yong Wang, Longyue Wang, Victor Li, Zhaopeng Tu


Abstract
Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures. We show that: 1) the pruned parameters can be rejuvenated to improve the baseline model by up to +0.8 BLEU points; 2) the rejuvenated parameters are reallocated to enhance the ability of modeling low-level lexical information.
Anthology ID:
2020.emnlp-main.78
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1060–1066
URL:
https://aclanthology.org/2020.emnlp-main.78
DOI:
10.18653/v1/2020.emnlp-main.78
Cite (ACL):
Yong Wang, Longyue Wang, Victor Li, and Zhaopeng Tu. 2020. On the Sparsity of Neural Machine Translation Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1060–1066, Online. Association for Computational Linguistics.
Cite (Informal):
On the Sparsity of Neural Machine Translation Models (Wang et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.78.pdf
Video:
https://slideslive.com/38939018