Self-Evolution Knowledge Distillation for LLM-based Machine Translation

Yuncheng Song; Liang Ding; Changtong Zan; Shujian Huang (书剑 黄)

Abstract

Knowledge distillation (KD) has shown great promise in transferring knowledge from larger teacher models to smaller student models. However, existing KD strategies for large language models often minimize the divergence between the student and teacher output distributions indiscriminately for every token. This overlooks the imbalanced nature of tokens and their varying transfer difficulties. In response, we propose a distillation strategy called Self-Evolution KD. The core of this approach is to dynamically integrate the teacher distribution and the one-hot distribution of the ground truth into the student distribution as prior knowledge, which promotes the distillation process. It adjusts the ratio of prior knowledge based on token learning difficulty, fully leveraging the teacher model’s potential. Experimental results show that our method brings an average improvement of approximately 1.4 SacreBLEU points across four translation directions on the WMT22 test sets. Further analysis indicates that the improvement comes from better knowledge transfer from teachers, confirming our hypothesis.
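
The abstract describes the mechanism only at a high level, so the following is a minimal sketch of one way the idea could look, not the authors' implementation. It assumes that token difficulty is measured by the KL divergence from the teacher to the student, that the prior is an equal-weight average of the teacher distribution and the one-hot ground truth, and that the mixing ratio grows linearly with difficulty up to a cap. The function names and the `threshold` and `max_ratio` parameters are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def difficulty_ratio(student, teacher, threshold=0.5, max_ratio=0.5):
    """Map token difficulty to a mixing ratio in [0, max_ratio].

    Assumption: difficulty is KL(teacher || student); the linear schedule
    and both parameters are illustrative, not taken from the paper.
    """
    difficulty = kl_divergence(teacher, student)
    return max_ratio * min(1.0, difficulty / threshold)


def mix_prior(student, teacher, gold_index, ratio):
    """Blend prior knowledge into the student distribution.

    Assumption: the prior is the equal-weight average of the teacher
    distribution and the one-hot distribution of the ground-truth token.
    """
    one_hot = [1.0 if i == gold_index else 0.0 for i in range(len(student))]
    prior = [0.5 * t + 0.5 * o for t, o in zip(teacher, one_hot)]
    return [(1.0 - ratio) * s + ratio * p for s, p in zip(student, prior)]


# A "hard" token: the student disagrees strongly with the teacher.
student = [0.7, 0.2, 0.1]
teacher = [0.1, 0.8, 0.1]
ratio = difficulty_ratio(student, teacher)
mixed = mix_prior(student, teacher, gold_index=1, ratio=ratio)

# An "easy" token: the student already matches the teacher, so no prior is added.
easy_ratio = difficulty_ratio(teacher, teacher)
```

Under these assumptions, the mixed distribution for a hard token sits closer to the teacher than the raw student distribution does, while easy tokens are left unchanged.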

Anthology ID: 2025.coling-main.686
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 10298–10308
URL: https://aclanthology.org/2025.coling-main.686/
Cite (ACL): Yuncheng Song, Liang Ding, Changtong Zan, and Shujian Huang. 2025. Self-Evolution Knowledge Distillation for LLM-based Machine Translation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 10298–10308, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Self-Evolution Knowledge Distillation for LLM-based Machine Translation (Song et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.686.pdf
