Adaptive Contrastive Knowledge Distillation for BERT Compression

Jinyang Guo, Jiaheng Liu, Zining Wang, Yuqing Ma, Ruihao Gong, Ke Xu, Xianglong Liu


Abstract
In this paper, we propose a new knowledge distillation approach called adaptive contrastive knowledge distillation (ACKD) for BERT compression. Unlike existing knowledge distillation methods for BERT, which learn discriminative student features only implicitly by mimicking the teacher features, we first introduce a novel contrastive distillation loss (CDL) based on the hidden-state features in BERT as explicit supervision for learning discriminative student features. We further observe that sentences with similar features may have completely different meanings, which makes them hard to distinguish. Existing methods do not pay sufficient attention to these hard samples with less discriminative features. Therefore, we propose a new strategy called sample adaptive reweighting (SAR) that adaptively pays more attention to these hard samples and strengthens their discriminability. We incorporate the SAR strategy into the CDL to form an adaptive contrastive distillation loss, on which we build our ACKD framework. Comprehensive experiments on multiple natural language processing tasks demonstrate the effectiveness of our ACKD framework.
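The abstract describes the two components (CDL and SAR) only at a high level. The snippet below is a minimal PyTorch sketch of what a batch-wise contrastive distillation loss with hardness-based per-sample reweighting could look like; it is an illustration under my own assumptions (pooled hidden-state features, an InfoNCE-style objective, a margin-based hardness measure, and a linear projection from student to teacher dimension), not the paper's exact CDL/SAR formulation.

```python
# Illustrative sketch only: a batch-wise contrastive distillation loss with
# per-sample reweighting. NOT the exact CDL/SAR definition from the paper.
import torch
import torch.nn.functional as F


def contrastive_distillation_loss(student_h, teacher_h, temperature=0.1):
    """student_h, teacher_h: (batch, dim) pooled hidden-state features.

    Each student feature is pulled toward the teacher feature of the same
    sentence (positive) and pushed away from the teacher features of the
    other sentences in the batch (negatives), InfoNCE-style.
    """
    s = F.normalize(student_h, dim=-1)
    t = F.normalize(teacher_h, dim=-1)
    logits = s @ t.T / temperature                    # (batch, batch) similarities
    targets = torch.arange(s.size(0), device=s.device)
    per_sample_loss = F.cross_entropy(logits, targets, reduction="none")

    # One possible notion of a "hard" sample: its positive similarity barely
    # exceeds the strongest negative. Such samples receive larger weights.
    # The paper's SAR strategy may define hardness and weights differently.
    with torch.no_grad():
        pos = logits.diagonal()
        mask = torch.eye(s.size(0), dtype=torch.bool, device=s.device)
        neg = logits.masked_fill(mask, float("-inf")).max(dim=1).values
        margin = pos - neg
        weights = torch.softmax(-margin, dim=0) * s.size(0)  # mean weight ~ 1

    return (weights * per_sample_loss).mean()


# Usage: add this term to the task loss when training a compressed student.
if __name__ == "__main__":
    student_h = torch.randn(8, 312)    # e.g. a small student hidden size
    teacher_h = torch.randn(8, 768)    # BERT-base hidden size
    proj = torch.nn.Linear(312, 768)   # hypothetical projection to teacher dim
    loss = contrastive_distillation_loss(proj(student_h), teacher_h)
    print(loss.item())
```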
Anthology ID: 2023.findings-acl.569
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8941–8953
URL: https://aclanthology.org/2023.findings-acl.569
DOI: 10.18653/v1/2023.findings-acl.569
Cite (ACL): Jinyang Guo, Jiaheng Liu, Zining Wang, Yuqing Ma, Ruihao Gong, Ke Xu, and Xianglong Liu. 2023. Adaptive Contrastive Knowledge Distillation for BERT Compression. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8941–8953, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Adaptive Contrastive Knowledge Distillation for BERT Compression (Guo et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.569.pdf