Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Kenton Murray, Jeffery Kinnison, Toan Q. Nguyen, Walter Scheirer, David Chiang


Abstract
Neural sequence-to-sequence models, particularly the Transformer, are the state of the art in machine translation. Yet these neural networks are very sensitive to architecture and hyperparameter settings. Optimizing these settings by grid or random search is computationally expensive because it requires many training runs. In this paper, we incorporate architecture search into a single training run through auto-sizing, which uses regularization to delete neurons in a network over the course of training. On very low-resource language pairs, we show that auto-sizing can improve BLEU scores by up to 3.9 points while removing one-third of the parameters from the model.
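As an illustration of how regularization can delete neurons during training, the sketch below applies a group soft-thresholding (proximal) step for an L2,1 penalty to a small weight matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: each row is assumed to hold the weights of one neuron, and the names `prox_l21`, `live_rows`, and `step` are hypothetical. In a training loop, a step like this would follow each ordinary gradient update.

```python
import math


def prox_l21(weights, step):
    """Group soft-thresholding for an L2,1 penalty (illustrative only).

    Each row's L2 norm is shrunk by `step`; a row whose norm is at most
    `step` becomes all zeros, which corresponds to deleting that neuron.
    """
    out = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        # A zero-norm row stays zero; otherwise rescale toward the origin.
        scale = max(0.0, 1.0 - step / norm) if norm > 0.0 else 0.0
        out.append([scale * w for w in row])
    return out


def live_rows(weights):
    """Indices of rows (neurons) that still have a nonzero weight."""
    return [i for i, row in enumerate(weights) if any(w != 0.0 for w in row)]


# Hypothetical 3-neuron layer: one strong row and two weak rows.
W = [[3.0, 4.0], [0.1, -0.2], [0.0, 0.5]]
W_new = prox_l21(W, step=0.6)
print(live_rows(W_new))
```

Rows that reach exactly zero can be dropped from the layer, which is how a penalty of this kind shrinks the parameter count within a single training run.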
Anthology ID:
D19-5625
Volume:
Proceedings of the 3rd Workshop on Neural Generation and Translation
Month:
November
Year:
2019
Address:
Hong Kong
Editors:
Alexandra Birch, Andrew Finch, Hiroaki Hayashi, Ioannis Konstas, Thang Luong, Graham Neubig, Yusuke Oda, Katsuhito Sudoh
Venue:
NGT
Publisher:
Association for Computational Linguistics
Pages:
231–240
URL:
https://aclanthology.org/D19-5625/
DOI:
10.18653/v1/D19-5625
Cite (ACL):
Kenton Murray, Jeffery Kinnison, Toan Q. Nguyen, Walter Scheirer, and David Chiang. 2019. Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation. In Proceedings of the 3rd Workshop on Neural Generation and Translation, pages 231–240, Hong Kong. Association for Computational Linguistics.
Cite (Informal):
Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation (Murray et al., NGT 2019)
PDF:
https://aclanthology.org/D19-5625.pdf
Code
 KentonMurray/ProxGradPytorch