Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection

Zhuoyang Wu; Xinze Li; Zhenghao Liu (刘正皓); Yukun Yan (闫宇坤); Zhiyuan Liu; Minghe Yu; Cheng Yang; Yu Gu (谷峪); Ge Yu (于戈); Maosong Sun (孙茂松)

Abstract

Large Language Models (LLMs) have exhibited strong reasoning capabilities and achieved remarkable performance in mathematical problem-solving tasks. Recently, distilling reasoning ability from long-form Chains-of-Thought (CoTs) has emerged as a promising approach for enhancing Small Language Models (SLMs). Existing studies typically treat SLMs as student models and use long-form CoTs as supervision signals for Supervised Fine-Tuning (SFT) to transfer reasoning ability. However, the teachers that produce these long-form CoTs are usually unaware of the student model's capacity, which limits how effectively the student can use the provided reasoning traces. To overcome this limitation, we propose error-aware self-reflection (ORION), a framework that refines teacher CoTs through an Error-Aware Reflection process. ORION enables the student model to construct CoTs better tailored to itself by refining the teacher CoTs and incorporating its own reasoning errors. Experiments on multiple mathematical reasoning benchmarks demonstrate that ORION consistently improves performance by more than 2% over all baselines. Further analysis reveals that the CoTs constructed by ORION exhibit higher coherence and logical consistency, thereby serving as more effective supervision signals for SFT. All code is available at https://github.com/NEUIR/ORION.

Anthology ID: 2026.findings-acl.72
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1465–1482
URL: https://aclanthology.org/2026.findings-acl.72/
Cite (ACL): Zhuoyang Wu, Xinze Li, Zhenghao Liu, Yukun Yan, Zhiyuan Liu, Minghe Yu, Cheng Yang, Yu Gu, Ge Yu, and Maosong Sun. 2026. Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection. In Findings of the Association for Computational Linguistics: ACL 2026, pages 1465–1482, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection (Wu et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.72.pdf
Checklist: 2026.findings-acl.72.checklist.pdf