BibTeX
@inproceedings{jooste-etal-2022-knowledge,
    title = "Knowledge Distillation for Sustainable Neural Machine Translation",
    author = "Jooste, Wandri and
      Way, Andy and
      Haque, Rejwanul and
      Superbo, Riccardo",
    editor = "Campbell, Janice and
      Larocca, Stephen and
      Marciano, Jay and
      Savenkov, Konstantin and
      Yanishevsky, Alex",
    booktitle = "Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)",
    month = sep,
    year = "2022",
    address = "Orlando, USA",
    publisher = "Association for Machine Translation in the Americas",
    url = "https://aclanthology.org/2022.amta-upg.16",
    pages = "221--230",
    abstract = "Knowledge distillation (KD) can be used to reduce model size and training time, without significant loss in performance. However, the process of distilling knowledge requires translation of sizeable data sets, and the translation is usually performed using large cumbersome models (teacher models). Producing such translations for KD is expensive in terms of both time and cost, which is a significant concern for translation service providers. On top of that, this process can be the cause of higher carbon footprints. In this work, we tested different variants of a teacher model for KD, tracked the power consumption of the GPUs used during translation, recorded overall translation time, estimated translation cost, and measured the accuracy of the student models. The findings of our investigation demonstrate to the translation industry a cost-effective, high-quality alternative to the standard KD training methods.",
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="jooste-etal-2022-knowledge">
    <titleInfo>
        <title>Knowledge Distillation for Sustainable Neural Machine Translation</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Wandri</namePart>
        <namePart type="family">Jooste</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Andy</namePart>
        <namePart type="family">Way</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Rejwanul</namePart>
        <namePart type="family">Haque</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Riccardo</namePart>
        <namePart type="family">Superbo</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2022-09</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Janice</namePart>
            <namePart type="family">Campbell</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Stephen</namePart>
            <namePart type="family">Larocca</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jay</namePart>
            <namePart type="family">Marciano</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Konstantin</namePart>
            <namePart type="family">Savenkov</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alex</namePart>
            <namePart type="family">Yanishevsky</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Machine Translation in the Americas</publisher>
            <place>
                <placeTerm type="text">Orlando, USA</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Knowledge distillation (KD) can be used to reduce model size and training time, without significant loss in performance. However, the process of distilling knowledge requires translation of sizeable data sets, and the translation is usually performed using large cumbersome models (teacher models). Producing such translations for KD is expensive in terms of both time and cost, which is a significant concern for translation service providers. On top of that, this process can be the cause of higher carbon footprints. In this work, we tested different variants of a teacher model for KD, tracked the power consumption of the GPUs used during translation, recorded overall translation time, estimated translation cost, and measured the accuracy of the student models. The findings of our investigation demonstrate to the translation industry a cost-effective, high-quality alternative to the standard KD training methods.</abstract>
    <identifier type="citekey">jooste-etal-2022-knowledge</identifier>
    <location>
        <url>https://aclanthology.org/2022.amta-upg.16</url>
    </location>
    <part>
        <date>2022-09</date>
        <extent unit="page">
            <start>221</start>
            <end>230</end>
        </extent>
    </part>
</mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T Knowledge Distillation for Sustainable Neural Machine Translation
%A Jooste, Wandri
%A Way, Andy
%A Haque, Rejwanul
%A Superbo, Riccardo
%Y Campbell, Janice
%Y Larocca, Stephen
%Y Marciano, Jay
%Y Savenkov, Konstantin
%Y Yanishevsky, Alex
%S Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)
%D 2022
%8 September
%I Association for Machine Translation in the Americas
%C Orlando, USA
%F jooste-etal-2022-knowledge
%X Knowledge distillation (KD) can be used to reduce model size and training time, without significant loss in performance. However, the process of distilling knowledge requires translation of sizeable data sets, and the translation is usually performed using large cumbersome models (teacher models). Producing such translations for KD is expensive in terms of both time and cost, which is a significant concern for translation service providers. On top of that, this process can be the cause of higher carbon footprints. In this work, we tested different variants of a teacher model for KD, tracked the power consumption of the GPUs used during translation, recorded overall translation time, estimated translation cost, and measured the accuracy of the student models. The findings of our investigation demonstrate to the translation industry a cost-effective, high-quality alternative to the standard KD training methods.
%U https://aclanthology.org/2022.amta-upg.16
%P 221-230
Markdown (Informal)
[Knowledge Distillation for Sustainable Neural Machine Translation](https://aclanthology.org/2022.amta-upg.16) (Jooste et al., AMTA 2022)
ACL
Wandri Jooste, Andy Way, Rejwanul Haque, and Riccardo Superbo. 2022. Knowledge Distillation for Sustainable Neural Machine Translation. In Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track), pages 221–230, Orlando, USA. Association for Machine Translation in the Americas.