Accurate Knowledge Distillation via n-best Reranking

Hendra Setiawan


Abstract
We propose utilizing n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016): we extract pseudo-labels for the student model’s training data from the top n-best hypotheses and leverage a diverse set of models with different inductive biases, objective functions, or architectures, including some publicly available large language models, to pick the highest-quality hypotheses as labels. The effectiveness of our proposal is validated through experiments on the WMT’21 German ↔ English and Chinese ↔ English translation tasks. Our results demonstrate that utilizing pseudo-labels generated by our n-best reranker leads to a significantly more accurate student model. In fact, our best student model achieves accuracy comparable to that of the large translation model of Tran et al. (2021) with 4.7 billion parameters, while having two orders of magnitude fewer parameters.
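The page itself contains no code; the following is a minimal, hypothetical Python sketch of the n-best reranking idea summarized in the abstract: a teacher produces an n-best list per source sentence, a set of diverse scoring models each scores every hypothesis, and the hypothesis with the best combined score becomes the pseudo-label for student training. All function names, scorers, and weights below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of n-best reranking for sequence-level knowledge
# distillation (Kim and Rush, 2016). The scorers and weights are placeholders
# standing in for diverse models (rerankers, LLMs, quality metrics, ...).

from typing import Callable, List, Sequence


def rerank_pseudo_label(
    source: str,
    nbest: List[str],                                  # n-best hypotheses from the teacher
    scorers: Sequence[Callable[[str, str], float]],    # diverse scoring models
    weights: Sequence[float],                          # interpolation weights (e.g., dev-tuned)
) -> str:
    """Return the highest-scoring hypothesis to use as the pseudo-label."""
    def combined_score(hyp: str) -> float:
        # Weighted linear combination of scores from models with different
        # inductive biases, objective functions, or architectures.
        return sum(w * scorer(source, hyp) for w, scorer in zip(weights, scorers))

    return max(nbest, key=combined_score)


# Illustrative usage with toy scorers (stand-ins for real models):
if __name__ == "__main__":
    length_prior = lambda src, hyp: -abs(len(hyp.split()) - len(src.split()))
    lexical_overlap = lambda src, hyp: float(
        len(set(src.lower().split()) & set(hyp.lower().split()))
    )

    label = rerank_pseudo_label(
        source="das Haus ist grün",
        nbest=["the house is green", "the house green", "green house"],
        scorers=[length_prior, lexical_overlap],
        weights=[1.0, 0.5],
    )
    print(label)  # selected pseudo-label for student training
```

In this sketch the selected hypothesis replaces the teacher's single 1-best output as the distillation target; the actual paper combines many more scorers than the toy ones shown here.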
Anthology ID:
2024.naacl-long.72
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1330–1345
URL:
https://aclanthology.org/2024.naacl-long.72
Cite (ACL):
Hendra Setiawan. 2024. Accurate Knowledge Distillation via n-best Reranking. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1330–1345, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Accurate Knowledge Distillation via n-best Reranking (Setiawan, NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.72.pdf
Copyright:
2024.naacl-long.72.copyright.pdf