%0 Conference Proceedings
%T Edinburgh’s Submission to the WMT 2022 Efficiency Task
%A Bogoychev, Nikolay
%A Behnke, Maximiliana
%A Van Der Linde, Jelmer
%A Nail, Graeme
%A Heafield, Kenneth
%A Zhang, Biao
%A Kashyap, Sidharth
%Y Koehn, Philipp
%Y Barrault, Loïc
%Y Bojar, Ondřej
%Y Bougares, Fethi
%Y Chatterjee, Rajen
%Y Costa-jussà, Marta R.
%Y Federmann, Christian
%Y Fishel, Mark
%Y Fraser, Alexander
%Y Freitag, Markus
%Y Graham, Yvette
%Y Grundkiewicz, Roman
%Y Guzman, Paco
%Y Haddow, Barry
%Y Huck, Matthias
%Y Jimeno Yepes, Antonio
%Y Kocmi, Tom
%Y Martins, André
%Y Morishita, Makoto
%Y Monz, Christof
%Y Nagata, Masaaki
%Y Nakazawa, Toshiaki
%Y Negri, Matteo
%Y Névéol, Aurélie
%Y Neves, Mariana
%Y Popel, Martin
%Y Turchi, Marco
%Y Zampieri, Marcos
%S Proceedings of the Seventh Conference on Machine Translation (WMT)
%D 2022
%8 December
%I Association for Computational Linguistics
%C Abu Dhabi, United Arab Emirates (Hybrid)
%F bogoychev-etal-2022-edinburghs
%X We participated in all tracks of the WMT 2022 efficient machine translation task: single-core CPU, multi-core CPU, and GPU hardware with throughput and latency conditions. Our submissions explore a number of efficiency strategies: knowledge distillation, a simpler simple recurrent unit (SSRU) decoder with one or two layers, shortlisting, deep encoder, shallow decoder, pruning, and bidirectional decoder. For the CPU track, we used quantized 8-bit models. For the GPU track, we used FP16 quantisation. We explored various pruning strategies and combinations of one or more of the above methods.
%U https://aclanthology.org/2022.wmt-1.63
%P 661-667