One-Teacher and Multiple-Student Knowledge Distillation on Sentiment Classification

Xiaoqin Chang; Sophia Yat Mei Lee; Suyang Zhu; Shoushan Li (李寿山); Guodong Zhou (周国栋)

Abstract

Knowledge distillation is an effective method for transferring knowledge from a large pre-trained teacher model to a compact student model. However, in previous studies, the distilled student models are still large and remain impractical in highly speed-sensitive systems (e.g., an IR system). In this study, we aim to distill a deep pre-trained model into an extremely compact shallow model such as a CNN. Specifically, we propose a novel one-teacher and multiple-student knowledge distillation approach that distills a deep pre-trained teacher model into multiple shallow student models with ensemble learning. Moreover, we leverage large-scale unlabeled data to improve the performance of the students. Empirical studies on three sentiment classification tasks demonstrate that our approach achieves better results with far fewer parameters (0.9%-18%) and extremely high speedup ratios (100X-1000X).

Anthology ID: 2022.coling-1.614
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 7042–7052
URL: https://aclanthology.org/2022.coling-1.614/
Cite (ACL): Xiaoqin Chang, Sophia Yat Mei Lee, Suyang Zhu, Shoushan Li, and Guodong Zhou. 2022. One-Teacher and Multiple-Student Knowledge Distillation on Sentiment Classification. In Proceedings of the 29th International Conference on Computational Linguistics, pages 7042–7052, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): One-Teacher and Multiple-Student Knowledge Distillation on Sentiment Classification (Chang et al., COLING 2022)
PDF: https://aclanthology.org/2022.coling-1.614.pdf
Code: strive-hhh/otms-kd
