ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models

Jianyi Zhang, Aashiq Muhamed, Aditya Anantharaman, Guoyin Wang, Changyou Chen, Kai Zhong, Qingjun Cui, Yi Xu, Belinda Zeng, Trishul Chilimbi, Yiran Chen


Abstract
Knowledge Distillation (KD) is one of the most effective approaches to deploying large-scale pre-trained language models in low-latency environments by transferring the knowledge contained in the large-scale models to smaller student models. Prior KD approaches use the soft labels and intermediate activations generated by the teacher to transfer knowledge to the student model parameters alone. In this paper, we show that having access to non-parametric memory in the form of a knowledge base with the teacher’s soft labels and predictions can further improve student generalization. To enable the student to retrieve from the knowledge base effectively, we propose a new framework and loss function that preserves the semantic similarities of teacher and student training examples. We show through extensive experiments that our retrieval mechanism can achieve state-of-the-art performance for task-specific knowledge distillation on the GLUE benchmark.
Anthology ID:
2023.acl-short.97
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1128–1136
Language:
URL:
https://aclanthology.org/2023.acl-short.97
DOI:
10.18653/v1/2023.acl-short.97
Bibkey:
Cite (ACL):
Jianyi Zhang, Aashiq Muhamed, Aditya Anantharaman, Guoyin Wang, Changyou Chen, Kai Zhong, Qingjun Cui, Yi Xu, Belinda Zeng, Trishul Chilimbi, and Yiran Chen. 2023. ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1128–1136, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models (Zhang et al., ACL 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.acl-short.97.pdf
Video:
 https://aclanthology.org/2023.acl-short.97.mp4