Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task

Nikolay Bogoychev, Roman Grundkiewicz, Alham Fikri Aji, Maximiliana Behnke, Kenneth Heafield, Sidharth Kashyap, Emmanouil-Ioannis Farsarakis, Mateusz Chudyk


Abstract
We participated in all tracks of the Workshop on Neural Generation and Translation 2020 Efficiency Shared Task: single-core CPU, multi-core CPU, and GPU. At the model level, we use teacher-student training with a variety of student sizes, tie embeddings and sometimes layers, use the Simpler Simple Recurrent Unit, and introduce head pruning. On GPUs, we used 16-bit floating-point tensor cores. On CPUs, we customized 8-bit quantization and multiple processes with affinity for the multi-core setting. To reduce model size, we experimented with 4-bit log quantization but used floats at runtime. In the shared task, most of our submissions were Pareto optimal with respect to the trade-off between time and quality.
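As a rough illustration of what "Pareto optimal" means for a time/quality trade-off, the sketch below marks a submission as optimal when no other submission is at least as fast and at least as accurate while being strictly better on one of the two. The submission names, timings, and quality scores are hypothetical values invented for this example; they are not results from the shared task.

```python
from typing import List, Tuple

# (name, time in seconds, quality score). All values are hypothetical.
Submission = Tuple[str, float, float]


def pareto_optimal(subs: List[Submission]) -> List[str]:
    """Return the names of submissions that no other submission dominates.

    A submission is dominated if another one is no slower and no worse
    in quality, and strictly better on at least one of the two axes.
    """
    optimal = []
    for name, time, quality in subs:
        dominated = any(
            (t <= time and q >= quality) and (t < time or q > quality)
            for _, t, q in subs
        )
        if not dominated:
            optimal.append(name)
    return optimal


# Made-up example data, for illustration only.
submissions = [
    ("large", 120.0, 36.0),
    ("base", 60.0, 35.0),
    ("slow-base", 80.0, 34.5),
    ("tiny", 20.0, 32.0),
]
print(pareto_optimal(submissions))
```

Here "slow-base" is excluded because "base" is both faster and higher in quality; the other three each represent a distinct trade-off point.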
Anthology ID:
2020.ngt-1.26
Volume:
Proceedings of the Fourth Workshop on Neural Generation and Translation
Month:
July
Year:
2020
Address:
Online
Venue:
NGT
Publisher:
Association for Computational Linguistics
Pages:
218–224
URL:
https://aclanthology.org/2020.ngt-1.26
DOI:
10.18653/v1/2020.ngt-1.26
Cite (ACL):
Nikolay Bogoychev, Roman Grundkiewicz, Alham Fikri Aji, Maximiliana Behnke, Kenneth Heafield, Sidharth Kashyap, Emmanouil-Ioannis Farsarakis, and Mateusz Chudyk. 2020. Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task. In Proceedings of the Fourth Workshop on Neural Generation and Translation, pages 218–224, Online. Association for Computational Linguistics.
Cite (Informal):
Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task (Bogoychev et al., NGT 2020)
PDF:
https://aclanthology.org/2020.ngt-1.26.pdf
Dataset:
 2020.ngt-1.26.Dataset.txt
Video:
 http://slideslive.com/38929840