基于中间层对齐的异构师生模型知识蒸馏(Knowledge distillation of heterogeneous teacher-student model with intermediate layer loss)

Zhai Feiyan (翟飞燕), Wang Renzhi (王任之), Li Piji (李丕绩)


Abstract
Knowledge distillation, a leading model compression strategy in the era of large language models, transfers the knowledge of a complex model to a simpler one, substantially reducing parameter count and computational cost. However, mainstream distillation algorithms for generative large language models focus on optimizing the final-output-layer loss between teacher and student, leaving the models' intermediate layers unexplored. Moreover, existing work on intermediate-layer distillation typically imposes strict structural-consistency requirements on the teacher and student models and therefore cannot handle distillation between heterogeneous models, a clear limitation. To address these problems, we propose a new knowledge distillation algorithm: a distillation algorithm for heterogeneous generative teacher-student large language models that introduces an intermediate-layer distillation loss. The algorithm first extracts intermediate-layer information from the teacher and student as the distillation targets; it then performs intermediate-layer knowledge alignment and loss computation across heterogeneous models via purpose-built layer-mapping rules and an alignment module; finally, it jointly optimizes the weighting of the individual distillation losses. Experiments on five related datasets show that our method significantly improves distillation performance.
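To make the pipeline described in the abstract concrete, here is a minimal PyTorch-style sketch of intermediate-layer distillation between heterogeneous models. The abstract does not specify the exact layer-mapping rule, alignment module, or loss forms; this sketch assumes a uniform depth-ratio mapping, a linear projection as the aligner, MSE for the intermediate-layer loss, and temperature-scaled KL divergence for the output-layer loss. All names and hyperparameters are illustrative, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def map_layers(num_teacher_layers: int, num_student_layers: int) -> list[int]:
    # Hypothetical uniform mapping rule: pair each student layer with the
    # teacher layer at the same relative depth. The paper's exact rule is
    # not given in the abstract.
    denom = max(num_student_layers - 1, 1)
    return [
        round(s * (num_teacher_layers - 1) / denom)
        for s in range(num_student_layers)
    ]

class IntermediateAligner(nn.Module):
    # Assumed alignment module: a linear projection that maps student
    # hidden states into the teacher's hidden dimension, so heterogeneous
    # models with different widths become comparable.
    def __init__(self, d_student: int, d_teacher: int):
        super().__init__()
        self.proj = nn.Linear(d_student, d_teacher)

    def forward(self, h_student: torch.Tensor) -> torch.Tensor:
        return self.proj(h_student)

def distillation_loss(student_hiddens, teacher_hiddens,
                      student_logits, teacher_logits,
                      aligners, layer_map,
                      alpha=0.5, beta=0.5, tau=2.0):
    # Intermediate-layer loss: MSE between each aligned student hidden
    # state and its mapped teacher hidden state (assumed loss form).
    inter = sum(
        F.mse_loss(aligners[s](student_hiddens[s]), teacher_hiddens[t])
        for s, t in enumerate(layer_map)
    ) / len(layer_map)
    # Output-layer loss: temperature-scaled KL divergence on the logits,
    # the standard objective in generative LLM distillation.
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau
    # Jointly weighted combination of the two distillation losses;
    # alpha and beta stand in for the paper's tuned loss proportions.
    return alpha * inter + beta * kd
```

In practice the hidden states would come from `output_hidden_states=True` forward passes of the teacher and student, with one `IntermediateAligner` per mapped student layer held in an `nn.ModuleList` and trained jointly with the student.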
Anthology ID:
2024.ccl-1.71
Volume:
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month:
July
Year:
2024
Address:
Taiyuan, China
Editors:
Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Note:
Pages:
910–928
Language:
Chinese
URL:
https://aclanthology.org/2024.ccl-1.71/
Cite (ACL):
Zhai Feiyan, Wang Renzhi, and Li Piji. 2024. 基于中间层对齐的异构师生模型知识蒸馏(Knowledge distillation of heterogeneous teacher-student model with intermediate layer loss). In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 910–928, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal):
基于中间层对齐的异构师生模型知识蒸馏(Knowledge distillation of heterogeneous teacher-student model with intermediate layer loss) (Zhai et al., CCL 2024)
PDF:
https://aclanthology.org/2024.ccl-1.71.pdf