Xingyu Bai
2025
Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs
Taiqiang Wu | Zhe Zhao | Jiahao Wang | Xingyu Bai | Lei Wang | Ngai Wong | Yujiu Yang
Proceedings of the 31st International Conference on Computational Linguistics
Distilling high-accuracy Graph Neural Networks (GNNs) into low-latency multilayer perceptrons (MLPs) for graph tasks has become a hot research topic. However, conventional MLP learning relies almost exclusively on graph nodes and fails to capture graph structural information effectively. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable in various scenarios. To this end, we propose Prototype-Guided Knowledge Distillation (PGKD), which requires no graph edges (an edge-free setting) yet learns structure-aware MLPs. Our key insight is to distill graph structural information from GNNs. Specifically, we first employ class prototypes to analyze how graph structures affect GNN teachers, and then design two losses to distill such information from GNNs to MLPs. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD.
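A minimal sketch of the idea the abstract describes: class prototypes are taken as per-class means of the (frozen) GNN teacher's node embeddings, and the MLP student is trained, without any edge input, to reproduce the teacher's node-to-prototype geometry. The function names, the exact loss forms, and the assumption that teacher and student embeddings share the same dimension are illustrative assumptions, not the paper's specification.

```python
# Hypothetical sketch of prototype-guided distillation (not the paper's exact losses).
import torch
import torch.nn.functional as F

def class_prototypes(teacher_emb, labels, num_classes):
    """One prototype per class: the mean teacher embedding over that class's nodes."""
    protos = torch.zeros(num_classes, teacher_emb.size(1), device=teacher_emb.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = teacher_emb[mask].mean(dim=0)
    return protos

def prototype_distillation_losses(student_emb, teacher_emb, labels, num_classes, tau=1.0):
    """Two illustrative losses (hypothetical, assumed for this sketch):
    - intra_loss pulls each student node toward its own class prototype;
    - inter_loss matches the student's node-to-prototype similarity distribution
      to the teacher's, transferring structure-aware geometry edge-free."""
    protos = class_prototypes(teacher_emb, labels, num_classes)
    intra_loss = F.mse_loss(student_emb, protos[labels])
    t_sim = F.log_softmax(teacher_emb @ protos.t() / tau, dim=-1)
    s_sim = F.log_softmax(student_emb @ protos.t() / tau, dim=-1)
    inter_loss = F.kl_div(s_sim, t_sim, log_target=True, reduction="batchmean")
    return intra_loss, inter_loss

# Toy usage: 8 nodes, 16-dim embeddings, 3 classes.
teacher_emb = torch.randn(8, 16)                        # frozen GNN (teacher) embeddings
student_emb = torch.randn(8, 16, requires_grad=True)    # MLP (student) embeddings
labels = torch.randint(0, 3, (8,))
intra, inter = prototype_distillation_losses(student_emb, teacher_emb, labels, 3)
(intra + inter).backward()
```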
2022
Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching
Kunbo Ding | Weijie Liu | Yuejian Fang | Zhe Zhao | Qi Ju | Xuefeng Yang | Rong Tian | Zhu Tao | Haoyan Liu | Han Guo | Xingyu Bai | Weiquan Mao | Yudong Li | Weigang Guo | Taiqiang Wu | Ningyuan Sun
Findings of the Association for Computational Linguistics: NAACL 2022
Previous studies have shown that cross-lingual knowledge distillation can significantly improve the performance of pre-trained models on cross-lingual similarity matching tasks. However, in this setup the student model needs to be large; otherwise, its performance drops sharply, making it impractical to deploy on memory-limited devices. To address this issue, we delve into cross-lingual knowledge distillation and propose a multi-stage distillation framework for constructing a small but high-performance cross-lingual model. In our framework, contrastive learning, bottleneck, and parameter-recurrence strategies are carefully combined to prevent performance from being compromised during compression. The experimental results demonstrate that our method can compress XLM-R and MiniLM by more than 50% in size while reducing performance by only about 1%.
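A minimal sketch of one distillation stage using two of the ingredients the abstract names, contrastive learning and a bottleneck; the parameter-recurrence (layer-sharing) strategy is not shown. The class and function names, dimensions, and the InfoNCE-style loss form are assumptions for illustration, not the paper's exact configuration.

```python
# Hypothetical sketch: contrastive distillation of a large cross-lingual teacher
# (e.g., XLM-R) into a small student with a low-dimensional bottleneck.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckStudent(nn.Module):
    """Small student head whose output passes through a low-dimensional bottleneck
    before being compared with the teacher's sentence embeddings."""
    def __init__(self, student_dim=384, bottleneck_dim=128, teacher_dim=768):
        super().__init__()
        self.down = nn.Linear(student_dim, bottleneck_dim)   # bottleneck projection
        self.up = nn.Linear(bottleneck_dim, teacher_dim)     # map back to teacher space

    def forward(self, student_emb):
        return self.up(torch.tanh(self.down(student_emb)))

def contrastive_distillation_loss(student_out, teacher_emb, tau=0.05):
    """InfoNCE-style loss: each student embedding should be closest to the teacher
    embedding of its own (cross-lingual) sentence pair within the batch."""
    s = F.normalize(student_out, dim=-1)
    t = F.normalize(teacher_emb, dim=-1)
    logits = s @ t.t() / tau                       # (batch, batch) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)

# Toy usage: a batch of 4 parallel sentences with frozen teacher embeddings.
student = BottleneckStudent()
student_emb = torch.randn(4, 384)    # pooled outputs of the small student encoder
teacher_emb = torch.randn(4, 768)    # pooled outputs of the large teacher encoder
loss = contrastive_distillation_loss(student(student_emb), teacher_emb)
loss.backward()
```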