Wandri Jooste

2022

Knowledge Distillation for Sustainable Neural Machine Translation
Wandri Jooste | Andy Way | Rejwanul Haque | Riccardo Superbo
Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)

Knowledge distillation (KD) can be used to reduce model size and training time without significant loss in performance. However, the process of distilling knowledge requires translation of sizeable data sets, and the translation is usually performed using large, cumbersome models (teacher models). Producing such translations for KD is expensive in terms of both time and cost, which is a significant concern for translation service providers. On top of that, this process can lead to a higher carbon footprint. In this work, we tested different variants of a teacher model for KD, tracked the power consumption of the GPUs used during translation, recorded overall translation time, estimated translation cost, and measured the accuracy of the student models. The findings of our investigation demonstrate to the translation industry a cost-effective, high-quality alternative to the standard KD training methods.

2020

The ADAPT Centre’s Neural MT Systems for the WAT 2020 Document-Level Translation Task
Wandri Jooste | Rejwanul Haque | Andy Way
Proceedings of the 7th Workshop on Asian Translation

In this paper, we describe the ADAPT Centre’s submissions to the WAT 2020 document-level Business Scene Dialogue (BSD) Translation task. We only consider translating from Japanese to English for this task, and we use the MarianNMT toolkit to train Transformer models. In order to improve the translation quality, we made use of both in-domain and out-of-domain data for training our Machine Translation (MT) systems, as well as various data augmentation techniques for fine-tuning the model parameters. This paper outlines the experiments we ran to train our systems and reports the accuracy achieved through these various experiments.

Venues

AMTA (1)
WAT (1)
