Title: Making Asynchronous Stochastic Gradient Descent Work for Transformers
Authors: Alham Fikri Aji; Kenneth Heafield
Date: 2019-11
Type: text (conference publication)
Published in: Proceedings of the 3rd Workshop on Neural Generation and Translation
Editors: Alexandra Birch; Andrew Finch; Hiroaki Hayashi; Ioannis Konstas; Thang Luong; Graham Neubig; Yusuke Oda; Katsuhito Sudoh
Publisher: Association for Computational Linguistics
Place: Hong Kong
Pages: 80-89
Citation key: aji-heafield-2019-making
DOI: 10.18653/v1/D19-5608
URL: https://aclanthology.org/D19-5608/