BERT Learns to Teach: Knowledge Distillation with Meta Learning

Wangchunshu Zhou, Canwen Xu, Julian McAuley


Abstract
We present Knowledge Distillation with Meta Learning (MetaDistil), a simple yet effective alternative to traditional knowledge distillation (KD) methods where the teacher model is fixed during training. We show that the teacher network can learn to better transfer knowledge to the student network (i.e., learning to teach) using feedback on the performance of the distilled student network within a meta learning framework. Moreover, we introduce a pilot update mechanism to improve the alignment between the inner-learner and meta-learner in meta learning algorithms that focus on an improved inner-learner. Experiments on various benchmarks show that MetaDistil yields significant improvements over traditional KD algorithms and is less sensitive to the choice of student capacity and hyperparameters, facilitating the use of KD across different tasks and models.
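
The official implementation is available in the repository linked below (JetRunner/MetaDistil). As a rough illustration of the training scheme described in the abstract, the following is a minimal PyTorch sketch of bi-level KD with a pilot update: a differentiable pilot step is taken on the student's KD loss, the teacher is then updated from the pilot student's quiz-set loss, and the real student is finally updated with the refreshed teacher. The toy linear models, synthetic data, loss weighting, and learning rates are illustrative stand-ins and not the paper's actual configuration.

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy dimensions and synthetic data (stand-ins for BERT features / GLUE labels).
D, C, N = 16, 2, 64
X_train = torch.randn(N, D); y_train = torch.randint(0, C, (N,))
X_quiz  = torch.randn(N, D); y_quiz  = torch.randint(0, C, (N,))

# Linear "teacher" and "student" classifiers as raw parameter tensors, so the
# pilot (inner) update stays differentiable with respect to the teacher.
W_t = torch.randn(D, C, requires_grad=True)   # teacher weights (meta-learner)
W_s = torch.randn(D, C, requires_grad=True)   # student weights (inner-learner)

teacher_opt = torch.optim.SGD([W_t], lr=1e-2)
student_opt = torch.optim.SGD([W_s], lr=1e-1)
inner_lr, T = 1e-1, 2.0                       # pilot step size, KD temperature

def kd_loss(student_logits, teacher_logits, labels):
    # Standard KD objective: soft-label KL plus hard-label cross-entropy.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return 0.5 * soft + 0.5 * hard

for step in range(100):
    # Pilot update: take a tentative, differentiable step on the student so the
    # teacher can later receive meta-gradients through this update.
    s_logits, t_logits = X_train @ W_s, X_train @ W_t
    inner_loss = kd_loss(s_logits, t_logits, y_train)
    grad_s, = torch.autograd.grad(inner_loss, W_s, create_graph=True)
    W_s_pilot = W_s - inner_lr * grad_s       # pilot student parameters

    # Meta update: the pilot student's loss on a held-out quiz batch is the
    # teacher's training signal ("learning to teach").
    quiz_loss = F.cross_entropy(X_quiz @ W_s_pilot, y_quiz)
    teacher_opt.zero_grad()
    quiz_loss.backward()
    teacher_opt.step()

    # Real student update, starting from the original student weights but
    # distilling from the freshly updated teacher.
    s_logits = X_train @ W_s
    with torch.no_grad():
        t_logits = X_train @ W_t
    student_opt.zero_grad()
    kd_loss(s_logits, t_logits, y_train).backward()
    student_opt.step()

In this sketch the teacher receives a second-order gradient through the pilot step; the real student is then re-updated from its pre-pilot weights, mirroring the pilot update mechanism's goal of keeping the inner-learner and meta-learner aligned.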
Anthology ID:
2022.acl-long.485
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7037–7049
URL:
https://aclanthology.org/2022.acl-long.485
DOI:
10.18653/v1/2022.acl-long.485
Cite (ACL):
Wangchunshu Zhou, Canwen Xu, and Julian McAuley. 2022. BERT Learns to Teach: Knowledge Distillation with Meta Learning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7037–7049, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
BERT Learns to Teach: Knowledge Distillation with Meta Learning (Zhou et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.485.pdf
Video:
 https://aclanthology.org/2022.acl-long.485.mp4
Code
 JetRunner/MetaDistil
Data
CoLA, GLUE, MRPC, MultiNLI, QNLI, SST, SST-2