Zhuoyang Wu
2026
Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection
Zhuoyang Wu | Xinze Li | Zhenghao Liu | Yukun Yan | Zhiyuan Liu | Minghe Yu | Cheng Yang | Yu Gu | Ge Yu | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) have exhibited strong reasoning capabilities and achieved remarkable performance in mathematical problem-solving tasks. Recently, distilling reasoning ability from long-form Chains-of-Thought (CoTs) has emerged as a promising approach for enhancing Small Language Models (SLMs). Existing studies typically treat SLMs as student models and use long-form CoTs as supervision signals for Supervised Fine-Tuning (SFT) to transfer reasoning ability. However, the teachers that produce these long-form CoTs are usually unaware of the student model’s capacity, which limits how effectively the student can use the provided reasoning traces. To overcome this limitation, we propose error-aware self-reflection (ORION), a framework that refines teacher CoTs through an Error-Aware Reflection process. ORION enables the student model to construct more tailored teacher CoTs by refining them and incorporating its own reasoning errors. Experiments on multiple mathematical reasoning benchmarks demonstrate that ORION consistently improves performance by more than 2% over all baselines. Further analysis reveals that the CoTs constructed by ORION exhibit higher coherence and logical consistency, thereby serving as more effective supervision signals for SFT. All code is available at https://github.com/NEUIR/ORION.
Long-Chain Reasoning Distillation via Adaptive Prefix Alignment
Zhenghao Liu | Zhuoyang Wu | Xinze Li | Yukun Yan | Shuo Wang | Zulong Chen | Yu Gu | Ge Yu | Maosong Sun
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in solving complex mathematical problems. Recent studies show that distilling long reasoning trajectories can effectively enhance the reasoning performance of small-scale student models. However, teacher-generated reasoning trajectories are often excessively long and structurally complex, making them difficult for student models to learn from. This mismatch leads to a gap between the provided supervision signal and the learning capacity of the student model. To address this challenge, we propose Prefix-ALIGNment distillation (P-ALIGN), a framework that fully exploits teacher CoTs for distillation through adaptive prefix alignment. Specifically, P-ALIGN adaptively truncates teacher-generated reasoning trajectories by determining whether the remaining suffix is concise and sufficient to guide the student model. Then, P-ALIGN leverages the teacher-generated prefix to supervise the student model, encouraging effective prefix alignment. Experiments on multiple mathematical reasoning benchmarks demonstrate that P-ALIGN outperforms all baselines by over 3%. Further analysis indicates that the prefixes constructed by P-ALIGN provide more effective supervision signals while avoiding the negative impact of redundant and uncertain reasoning components. All code is available at https://github.com/NEUIR/P-ALIGN.