Self-Knowledge Distillation for Knowledge Graph Embedding
Haotian Xu | Yuhua Wang | Jiahui Fan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Knowledge graph embedding (KGE) is an important task that benefits many downstream applications. A common way to improve KGE performance is to increase the embedding dimension, but high-dimensional KGE significantly increases the number of model parameters and the training time. Knowledge distillation has therefore been proposed to learn a low-dimensional model from a pre-trained high-dimensional model. To avoid introducing a complex teacher model, we use self-knowledge distillation. However, self-knowledge distillation still suffers from two issues: misdirection from incorrect predictions during training, and loss of discrimination information caused by an excessive distillation temperature. To address these issues, we apply self-knowledge distillation, knowledge adjustment, and dynamic temperature distillation to KGE. Self-knowledge distillation uses information from the latest iteration to guide training in the current iteration. Knowledge adjustment corrects the predictions of misjudged training samples. Dynamic temperature distillation computes soft targets with sample-wise dynamic temperatures. Our approach not only improves model performance but also yields a lightweight model. Experimental results demonstrate the effectiveness and generalization ability of our model on link prediction. The lightweight model maintains good performance while reducing the number of parameters and the training time.
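The knowledge adjustment and dynamic temperature ideas in the abstract can be illustrated with a minimal NumPy sketch. This is an illustrative reading, not the paper's implementation: the confidence-based temperature rule and all function names (`adjust_knowledge`, `dynamic_temperatures`, `distillation_targets`) are assumptions made here for clarity.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax along the last axis.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def adjust_knowledge(soft_targets, labels):
    # Knowledge adjustment (sketch): when the model's top prediction
    # disagrees with the ground-truth label, swap the two probabilities
    # so the true class carries the highest soft-target mass and cannot
    # misdirect the student.
    adjusted = soft_targets.copy()
    for i, y in enumerate(labels):
        pred = adjusted[i].argmax()
        if pred != y:
            adjusted[i, y], adjusted[i, pred] = adjusted[i, pred], adjusted[i, y]
    return adjusted

def dynamic_temperatures(logits, base_t=4.0, bias=1.0):
    # Sample-wise temperatures (assumed rule): confident samples are
    # smoothed more (higher T), uncertain ones less, so soft targets
    # keep their discrimination information.
    confidence = softmax(logits).max(axis=-1)  # in (0, 1]
    return bias + base_t * confidence          # shape: (batch,)

def distillation_targets(logits, labels):
    # Combine both steps: per-sample temperature, then adjustment.
    t = dynamic_temperatures(logits)
    soft = softmax(logits, temperature=t[:, None])
    return adjust_knowledge(soft, labels)
```

For example, a batch where the model misranks the true class still yields soft targets whose argmax matches the label after adjustment, while each row remains a valid probability distribution.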