Accurate Knowledge Distillation via n-best Reranking

Hendra Setiawan


Abstract
We propose utilizing n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016): we extract pseudo-labels for the student model’s training data from the top n-best hypotheses and leverage a diverse set of models with different inductive biases, objective functions, or architectures, including some publicly available large language models, to pick the highest-quality hypotheses as labels. The effectiveness of our proposal is validated through experiments on the WMT’21 German ↔ English and Chinese ↔ English translation tasks. Our results demonstrate that utilizing pseudo-labels generated by our n-best reranker leads to a significantly more accurate student model. In fact, our best student model achieves accuracy comparable to that of the large translation model of Tran et al. (2021) with 4.7 billion parameters, while having two orders of magnitude fewer parameters.
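The page itself contains no code; the following is a minimal, hypothetical Python sketch of the n-best reranking idea summarized in the abstract: a teacher produces an n-best list per source sentence, a set of diverse scoring models each scores every hypothesis, and the hypothesis with the best combined score becomes the pseudo-label for student training. All function names, scorers, and weights below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of n-best reranking for sequence-level knowledge
# distillation (Kim and Rush, 2016). The scorers and weights are placeholders
# standing in for diverse models (rerankers, LLMs, quality metrics, ...).

from typing import Callable, List, Sequence


def rerank_pseudo_label(
    source: str,
    nbest: List[str],                                  # n-best hypotheses from the teacher
    scorers: Sequence[Callable[[str, str], float]],    # diverse scoring models
    weights: Sequence[float],                          # interpolation weights (e.g., dev-tuned)
) -> str:
    """Return the highest-scoring hypothesis to use as the pseudo-label."""
    def combined_score(hyp: str) -> float:
        # Weighted linear combination of scores from models with different
        # inductive biases, objective functions, or architectures.
        return sum(w * scorer(source, hyp) for w, scorer in zip(weights, scorers))

    return max(nbest, key=combined_score)


# Illustrative usage with toy scorers (stand-ins for real models):
if __name__ == "__main__":
    length_prior = lambda src, hyp: -abs(len(hyp.split()) - len(src.split()))
    lexical_overlap = lambda src, hyp: float(
        len(set(src.lower().split()) & set(hyp.lower().split()))
    )

    label = rerank_pseudo_label(
        source="das Haus ist grün",
        nbest=["the house is green", "the house green", "green house"],
        scorers=[length_prior, lexical_overlap],
        weights=[1.0, 0.5],
    )
    print(label)  # selected pseudo-label for student training
```

In this sketch the selected hypothesis replaces the teacher's single 1-best output as the distillation target; the actual paper combines many more scorers than the toy ones shown here.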
Anthology ID:
2024.naacl-long.72
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1330–1345
URL:
https://aclanthology.org/2024.naacl-long.72
Cite (ACL):
Hendra Setiawan. 2024. Accurate Knowledge Distillation via n-best Reranking. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1330–1345, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Accurate Knowledge Distillation via n-best Reranking (Setiawan, NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.72.pdf
Copyright:
2024.naacl-long.72.copyright.pdf