Sampling and Filtering of Neural Machine Translation Distillation Data

Vilém Zouhar


Abstract
In most neural machine translation distillation or stealing scenarios, the highest-scoring hypothesis of the target model (teacher) is used to train a new model (student). If reference translations are also available, then better hypotheses (with respect to the references) can be oversampled and poor hypotheses either removed or undersampled. This paper explores the sampling method landscape (pruning, hypothesis oversampling and undersampling, deduplication and their combination) with English to Czech and English to German MT models using standard MT evaluation metrics. We show that careful oversampling and combination with the original data leads to better performance when compared to training only on the original or synthesized data or their direct combination.
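The sampling ideas in the abstract can be illustrated with a minimal Python sketch. It assumes each source sentence comes with one reference and an n-best list of teacher hypotheses. For each sentence, the sketch deduplicates the hypotheses, ranks them against the reference, prunes all but the best few, oversamples the top ones (weaker survivors get fewer copies, a relative undersampling), and optionally appends the original reference pair. The scorer `simple_chrf` is a simplified character n-gram F-score chosen only to keep the example dependency-free. `build_student_data` and its `copies` schedule are hypothetical names. This is an illustrative assumption, not the authors' implementation or their exact sampling schedule.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of `text` with spaces removed."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def simple_chrf(hyp: str, ref: str, max_n: int = 3, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1] (a stand-in, not sacreBLEU's chrF)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        overlap = sum((h & r).values())  # clipped n-gram matches
        precisions.append(overlap / sum(h.values()) if h else 0.0)
        recalls.append(overlap / sum(r.values()) if r else 0.0)
    p, rec = sum(precisions) / max_n, sum(recalls) / max_n
    if p + rec == 0:
        return 0.0
    return (1 + beta ** 2) * p * rec / (beta ** 2 * p + rec)


def build_student_data(sources, references, nbest, copies=(3, 2), include_original=True):
    """Hypothetical helper: turn teacher n-best lists into student training pairs.

    copies[k] is how many times the (k+1)-th best hypothesis is repeated;
    hypotheses ranked at or below position len(copies) are pruned.
    """
    data = []
    for src, ref, hyps in zip(sources, references, nbest):
        unique = list(dict.fromkeys(hyps))  # deduplicate, keep first occurrence
        ranked = sorted(unique, key=lambda h: simple_chrf(h, ref), reverse=True)
        for k, hyp in enumerate(ranked[:len(copies)]):
            data.extend([(src, hyp)] * copies[k])  # oversample better hypotheses
        if include_original:
            data.append((src, ref))  # combine with the original parallel data
    return data


data = build_student_data(
    ["Hello world"],
    ["Hallo Welt"],
    [["Hallo Welt", "Hallo Welt", "Hallo Erde", "Guten Tag"]],
)
```

With `copies=(3, 2)`, the duplicate "Hallo Welt" hypotheses collapse to one. That hypothesis is repeated three times, plus once more as the original reference pair. "Hallo Erde" appears twice, and "Guten Tag" is pruned. Changing `copies` or `include_original` switches between the pruning, oversampling, and combination variants the abstract lists.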
Anthology ID:
2021.naacl-srw.1
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Month:
June
Year:
2021
Address:
Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1–8
URL:
https://aclanthology.org/2021.naacl-srw.1
DOI:
10.18653/v1/2021.naacl-srw.1
PDF:
https://aclanthology.org/2021.naacl-srw.1.pdf
Code:
zouharvi/reference-mt-distill