Nearest Neighbor Knowledge Distillation for Neural Machine Translation

Zhixian Yang; Renliang Sun; Xiaojun Wan

doi:10.18653/v1/2022.naacl-main.406

Abstract

k-nearest-neighbor machine translation (kNN-MT), proposed by Khandelwal et al. (2021), has achieved many state-of-the-art results in machine translation tasks. Although effective, kNN-MT must perform a kNN search over a large datastore at every decoding step during inference. This prohibitively increases decoding cost and makes the method difficult to deploy in real-world applications. In this paper, we propose moving the time-consuming kNN search forward to the preprocessing phase, and then introduce k Nearest Neighbor Knowledge Distillation (kNN-KD), which trains the base NMT model to directly learn the knowledge of kNN. Distilling the knowledge retrieved by kNN encourages the NMT model to take more reasonable target tokens into consideration, thus addressing the overcorrection problem. Extensive experimental results show that the proposed method achieves consistent improvements over state-of-the-art baselines, including kNN-MT, while maintaining the same training and decoding speed as the standard NMT model.
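The abstract describes the method only at a high level. The following is a minimal, hypothetical sketch in pure Python of the two ideas it names: (1) turning the k retrieved neighbors into a distribution over target tokens, as in kNN-MT, but computed once per training target position during preprocessing rather than at every decoding step, and (2) a word-level distillation loss that mixes the usual gold-token cross-entropy with cross-entropy against that precomputed kNN distribution. The function names, the toy datastore of (key vector, token) pairs, the squared-L2 distance, `temperature`, and the interpolation weight `alpha` are all assumptions for illustration, not the authors' implementation (their code is in the repository listed on this page).

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k=4, temperature=10.0):
    """Map the k nearest datastore entries to a distribution over target tokens.

    Hypothetical helper. `datastore` is a list of (key_vector, target_token)
    pairs; in kNN-MT the keys would be decoder hidden states, here they are
    toy vectors. Distance is squared L2 (an assumption of this sketch).
    """
    # Brute-force search; a real system would use an approximate index instead.
    scored = []
    for key, token in datastore:
        dist = sum((q - c) ** 2 for q, c in zip(query, key))
        scored.append((dist, token))
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]

    # Softmax over -distance / temperature (shifted by the max for stability),
    # summing the weights of neighbors that share the same target token.
    logits = [-dist / temperature for dist, _ in neighbors]
    shift = max(logits)
    weights = defaultdict(float)
    for logit, (_, token) in zip(logits, neighbors):
        weights[token] += math.exp(logit - shift)
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


def precompute_knn_targets(queries, datastore, k=4, temperature=10.0):
    """Offline preprocessing: one kNN search per training target position."""
    return [knn_distribution(q, datastore, k, temperature) for q in queries]


def knn_kd_loss(model_log_probs, gold_token, knn_probs, alpha=0.5):
    """Word-level loss: (1 - alpha) * NLL(gold) + alpha * CE(p_kNN, p_model).

    `model_log_probs` maps every vocabulary token to the NMT model's
    log-probability at this position; `alpha` is a hypothetical weight.
    """
    nll = -model_log_probs[gold_token]
    kd = -sum(p * model_log_probs[tok] for tok, p in knn_probs.items())
    return (1.0 - alpha) * nll + alpha * kd


if __name__ == "__main__":
    toy_store = [([0.0, 0.0], "cat"), ([0.1, 0.0], "cat"),
                 ([1.0, 1.0], "dog"), ([5.0, 5.0], "car")]
    targets = precompute_knn_targets([[0.0, 0.0]], toy_store, k=3, temperature=1.0)
    log_p = {tok: math.log(p) for tok, p in {"cat": 0.5, "dog": 0.25, "car": 0.25}.items()}
    print(targets[0])
    print(knn_kd_loss(log_p, "cat", targets[0], alpha=0.5))
```

In this sketch the kNN distributions are computed offline, so training only looks up a stored dictionary per position, and inference runs the plain NMT model with no datastore search. This is consistent with the abstract's claim of unchanged training and decoding speed. Because the kNN target spreads probability over several plausible tokens, the second loss term rewards the model for not committing all its mass to the single gold token.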

Anthology ID:: 2022.naacl-main.406
Volume:: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:: July
Year:: 2022
Address:: Seattle, United States
Editors:: Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:: NAACL
Publisher:: Association for Computational Linguistics
Pages:: 5546–5556
URL:: https://aclanthology.org/2022.naacl-main.406
Cite (ACL):: Zhixian Yang, Renliang Sun, and Xiaojun Wan. 2022. Nearest Neighbor Knowledge Distillation for Neural Machine Translation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5546–5556, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):: Nearest Neighbor Knowledge Distillation for Neural Machine Translation (Yang et al., NAACL 2022)
PDF:: https://aclanthology.org/2022.naacl-main.406.pdf
Video:: https://aclanthology.org/2022.naacl-main.406.mp4
Code:: fadedcosine/knn-kd