Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU

Jacob Devlin


Abstract
Attentional sequence-to-sequence models have become the new standard for machine translation, but one challenge of such models is a significant increase in training and decoding cost compared to phrase-based systems. In this work we focus on efficient decoding, with a goal of achieving accuracy close to the state-of-the-art in neural machine translation (NMT), while achieving CPU decoding speed/throughput close to that of a phrasal decoder. We approach this problem from two angles: First, we describe several techniques for speeding up an NMT beam search decoder, which obtain a 4.4x speedup over a very efficient baseline decoder without changing the decoder output. Second, we propose a simple but powerful network architecture which uses an RNN (GRU/LSTM) layer at the bottom, followed by a series of stacked fully-connected layers applied at every timestep. This architecture achieves similar accuracy to a deep recurrent model, at a small fraction of the training and decoding cost. By combining these techniques, our best system achieves a very competitive accuracy of 38.3 BLEU on WMT English-French NewsTest2014, while decoding at 100 words/sec on a single-threaded CPU. We believe this is the best published accuracy/speed trade-off of an NMT system.
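The "RNN layer at the bottom, stacked fully-connected layers on top" architecture from the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all dimensions, the initialization scale, the ReLU activation in the dense stack, and the class name are assumptions made for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BottomRNNStackedDense:
    """Illustrative sketch: one recurrent (GRU) layer at the bottom, then a
    stack of fully-connected layers applied independently at every timestep,
    so only the bottom layer carries any recurrence.  Hyperparameters and
    activations here are assumptions, not details taken from the paper."""

    def __init__(self, d_in, d_hid, n_dense, rng):
        scale = 0.1  # illustrative init scale
        # GRU parameters: update gate z, reset gate r, candidate state c.
        self.Wz, self.Wr, self.Wc = (rng.standard_normal((d_in, d_hid)) * scale
                                     for _ in range(3))
        self.Uz, self.Ur, self.Uc = (rng.standard_normal((d_hid, d_hid)) * scale
                                     for _ in range(3))
        # Stacked fully-connected layers, shared across timesteps.
        self.dense = [rng.standard_normal((d_hid, d_hid)) * scale
                      for _ in range(n_dense)]
        self.d_hid = d_hid

    def forward(self, xs):
        """xs: (T, d_in) input sequence; returns (T, d_hid) outputs."""
        h = np.zeros(self.d_hid)
        states = []
        for x in xs:  # recurrent bottom layer: sequential over time
            z = sigmoid(x @ self.Wz + h @ self.Uz)
            r = sigmoid(x @ self.Wr + h @ self.Ur)
            c = np.tanh(x @ self.Wc + (r * h) @ self.Uc)
            h = (1.0 - z) * h + z * c
            states.append(h)
        out = np.stack(states)  # (T, d_hid)
        # Feed-forward stack: no recurrence, so all timesteps are processed
        # as one matrix multiply per layer -- the source of the cheap
        # training/decoding relative to a deep recurrent model.
        for W in self.dense:
            out = np.maximum(0.0, out @ W)  # ReLU fully-connected layer
        return out

rng = np.random.default_rng(0)
model = BottomRNNStackedDense(d_in=8, d_hid=16, n_dense=3, rng=rng)
ys = model.forward(rng.standard_normal((5, 8)))
print(ys.shape)  # (5, 16)
```

Because the dense stack has no recurrent connections, deepening the network adds only per-timestep matrix multiplies, which batch and parallelize well on a CPU, whereas every extra recurrent layer would add another sequential dependency over time.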
Anthology ID:
D17-1300
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2820–2825
URL:
https://aclanthology.org/D17-1300
DOI:
10.18653/v1/D17-1300
Cite (ACL):
Jacob Devlin. 2017. Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2820–2825, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU (Devlin, EMNLP 2017)
PDF:
https://aclanthology.org/D17-1300.pdf
Attachment:
 D17-1300.Attachment.zip
Video:
 https://aclanthology.org/D17-1300.mp4