One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation

Chenze Shao, Xuanfu Wu, Yang Feng


Abstract
Non-autoregressive neural machine translation (NAT) suffers from the multi-modality problem: a source sentence may have multiple correct translations, but the loss function is calculated only against the single reference sentence. Sequence-level knowledge distillation makes the target more deterministic by replacing the target with the output from an autoregressive model. However, the multi-modality problem in the distilled dataset is still non-negligible. Furthermore, learning from a specific teacher limits the upper bound of the model capability, restricting the potential of NAT models. In this paper, we argue that one reference is not enough and propose diverse distillation with reference selection (DDRS) for NAT. Specifically, we first propose a method called SeedDiv for diverse machine translation, which enables us to generate a dataset containing multiple high-quality reference translations for each source sentence. During training, we compare the NAT output with all references and select the one that best fits the NAT output to train the model. Experiments on widely used machine translation benchmarks demonstrate the effectiveness of DDRS, which achieves 29.82 BLEU with only one decoding pass on WMT14 En-De, improving the state-of-the-art performance for NAT by over 1 BLEU.
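The reference selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how "best fits" is measured, so the sketch assumes the fit is the log-likelihood the model assigns to each reference, and it assumes all references have the same length as the output. The function names and the toy probabilities are hypothetical.

```python
import math


def reference_log_likelihood(log_probs, reference):
    """Sum the per-position log-probabilities assigned to `reference`.

    Assumption: `log_probs[i]` maps each candidate token at position i
    to its log-probability, and the reference has one token per position.
    """
    return sum(position[token] for position, token in zip(log_probs, reference))


def select_reference(log_probs, references):
    """Return (index, score) of the reference with the highest log-likelihood."""
    scores = [reference_log_likelihood(log_probs, ref) for ref in references]
    best = max(range(len(references)), key=scores.__getitem__)
    return best, scores[best]


# Toy NAT output distribution over three positions (hypothetical numbers).
log_probs = [
    {"thanks": math.log(0.6), "thank": math.log(0.4)},
    {"a": math.log(0.7), "you": math.log(0.3)},
    {"lot": math.log(0.8), "much": math.log(0.2)},
]

# Two equally valid references for the same source sentence.
references = [["thank", "you", "much"], ["thanks", "a", "lot"]]

best_index, best_score = select_reference(log_probs, references)

# Train only against the selected reference: its negative log-likelihood.
training_loss = -best_score
```

In this toy case the second reference is selected because the model already places most of its probability mass on it, so the loss does not penalize the model for failing to match the other valid translation.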
Anthology ID:
2022.naacl-main.277
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3779–3791
URL:
https://aclanthology.org/2022.naacl-main.277
DOI:
10.18653/v1/2022.naacl-main.277
Cite (ACL):
Chenze Shao, Xuanfu Wu, and Yang Feng. 2022. One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3779–3791, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation (Shao et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.277.pdf
Code:
ictnlp/ddrs-nat