@inproceedings{lei-etal-2025-learning,
title = "Learning from Diverse Reasoning Paths with Routing and Collaboration",
author = "Lei, Zhenyu and
Tan, Zhen and
Wang, Song and
Zhu, Yaochen and
Chen, Zihan and
Dong, Yushun and
Li, Jundong",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.141/",
pages = "2832--2845",
ISBN = "979-8-89176-332-6",
abstract = "Advances in large language models (LLMs) significantly enhance reasoning capabilities but their deployment is restricted in resource-constrained scenarios. Knowledge distillation addresses this by transferring knowledge from powerful teacher models to compact and transparent students.However, effectively capturing the teacher{'}s comprehensive reasoning is challenging due to conventional token-level supervision{'}s limited scope. Using multiple reasoning paths per query alleviates this problem, but treating each path identically is suboptimal as paths vary widely in quality and suitability across tasks and models.We propose Quality-filtered Routing with Cooperative Distillation(QR-Distill), combining path quality filtering, conditional routing, and cooperative peer teaching. First, quality filtering retains only correct reasoning paths scored by an LLM-based evaluation. Second, conditional routing dynamically assigns paths tailored to each student{'}s current learning state. Finally, cooperative peer teaching enables students to mutually distill diverse insights, addressing knowledge gaps and biases toward specific reasoning styles. Experiments demonstrate QR-Distill{'}s superiority over traditional single- and multi-path distillation methods. Ablation studies further highlight the importance of each component{---}quality filtering, conditional routing, and peer teaching{---}in effective knowledge transfer. Our code is available at https://github.com/LzyFischer/Distill."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="lei-etal-2025-learning">
<titleInfo>
<title>Learning from Diverse Reasoning Paths with Routing and Collaboration</title>
</titleInfo>
<name type="personal">
<namePart type="given">Zhenyu</namePart>
<namePart type="family">Lei</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhen</namePart>
<namePart type="family">Tan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Song</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yaochen</namePart>
<namePart type="family">Zhu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zihan</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yushun</namePart>
<namePart type="family">Dong</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jundong</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-332-6</identifier>
</relatedItem>
<abstract>Advances in large language models (LLMs) significantly enhance reasoning capabilities but their deployment is restricted in resource-constrained scenarios. Knowledge distillation addresses this by transferring knowledge from powerful teacher models to compact and transparent students. However, effectively capturing the teacher’s comprehensive reasoning is challenging due to conventional token-level supervision’s limited scope. Using multiple reasoning paths per query alleviates this problem, but treating each path identically is suboptimal as paths vary widely in quality and suitability across tasks and models. We propose Quality-filtered Routing with Cooperative Distillation (QR-Distill), combining path quality filtering, conditional routing, and cooperative peer teaching. First, quality filtering retains only correct reasoning paths scored by an LLM-based evaluation. Second, conditional routing dynamically assigns paths tailored to each student’s current learning state. Finally, cooperative peer teaching enables students to mutually distill diverse insights, addressing knowledge gaps and biases toward specific reasoning styles. Experiments demonstrate QR-Distill’s superiority over traditional single- and multi-path distillation methods. Ablation studies further highlight the importance of each component—quality filtering, conditional routing, and peer teaching—in effective knowledge transfer. Our code is available at https://github.com/LzyFischer/Distill.</abstract>
<identifier type="citekey">lei-etal-2025-learning</identifier>
<location>
<url>https://aclanthology.org/2025.emnlp-main.141/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>2832</start>
<end>2845</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Learning from Diverse Reasoning Paths with Routing and Collaboration
%A Lei, Zhenyu
%A Tan, Zhen
%A Wang, Song
%A Zhu, Yaochen
%A Chen, Zihan
%A Dong, Yushun
%A Li, Jundong
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-332-6
%F lei-etal-2025-learning
%X Advances in large language models (LLMs) significantly enhance reasoning capabilities but their deployment is restricted in resource-constrained scenarios. Knowledge distillation addresses this by transferring knowledge from powerful teacher models to compact and transparent students. However, effectively capturing the teacher’s comprehensive reasoning is challenging due to conventional token-level supervision’s limited scope. Using multiple reasoning paths per query alleviates this problem, but treating each path identically is suboptimal as paths vary widely in quality and suitability across tasks and models. We propose Quality-filtered Routing with Cooperative Distillation (QR-Distill), combining path quality filtering, conditional routing, and cooperative peer teaching. First, quality filtering retains only correct reasoning paths scored by an LLM-based evaluation. Second, conditional routing dynamically assigns paths tailored to each student’s current learning state. Finally, cooperative peer teaching enables students to mutually distill diverse insights, addressing knowledge gaps and biases toward specific reasoning styles. Experiments demonstrate QR-Distill’s superiority over traditional single- and multi-path distillation methods. Ablation studies further highlight the importance of each component—quality filtering, conditional routing, and peer teaching—in effective knowledge transfer. Our code is available at https://github.com/LzyFischer/Distill.
%U https://aclanthology.org/2025.emnlp-main.141/
%P 2832-2845
Markdown (Informal)
[Learning from Diverse Reasoning Paths with Routing and Collaboration](https://aclanthology.org/2025.emnlp-main.141/) (Lei et al., EMNLP 2025)
ACL
Zhenyu Lei, Zhen Tan, Song Wang, Yaochen Zhu, Zihan Chen, Yushun Dong, and Jundong Li. 2025. [Learning from Diverse Reasoning Paths with Routing and Collaboration](https://aclanthology.org/2025.emnlp-main.141/). In *Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing*, pages 2832–2845, Suzhou, China. Association for Computational Linguistics.
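To make the three stages named in the abstract (quality filtering, conditional routing, cooperative peer teaching) concrete, here is a minimal illustrative Python sketch. It is not the authors' implementation (their code is at https://github.com/LzyFischer/Distill); the `judge_score` field, the loss-based routing rule, and the mutual-distillation KL loss below are assumptions chosen only to show the overall shape of such a pipeline.

```python
# Illustrative sketch only -- not the QR-Distill reference code.
# Assumptions: each path carries a 0-1 score from an external LLM judge,
# routing sends a path to the student currently struggling most on it,
# and peer teaching uses a standard mutual-distillation KL loss.
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass
class ReasoningPath:
    query: str
    steps: str
    answer: str
    judge_score: float  # hypothetical 0-1 quality score from an LLM judge


def quality_filter(paths, gold_answer, threshold=0.5):
    """Stage 1: keep only paths that reach the gold answer and clear a
    quality threshold assigned by the LLM-based evaluator."""
    return [p for p in paths if p.answer == gold_answer and p.judge_score >= threshold]


def route_paths(paths, per_student_losses):
    """Stage 2: conditional routing. per_student_losses[s][i] is student s's
    current loss on path i; each path goes to the student with the highest
    loss on it (an illustrative proxy for the student's learning state)."""
    assignment = {student: [] for student in per_student_losses}
    for i, path in enumerate(paths):
        neediest = max(per_student_losses, key=lambda s: per_student_losses[s][i])
        assignment[neediest].append(path)
    return assignment


def peer_teaching_loss(logits_a, logits_b, temperature=2.0):
    """Stage 3: cooperative peer teaching. Student A is pulled toward its
    peer's softened output distribution (standard mutual-distillation KL)."""
    log_p_a = F.log_softmax(logits_a / temperature, dim=-1)
    p_b = F.softmax(logits_b.detach() / temperature, dim=-1)
    return F.kl_div(log_p_a, p_b, reduction="batchmean") * temperature ** 2


if __name__ == "__main__":
    paths = [
        ReasoningPath("2+3?", "2 plus 3 is 5", "5", 0.9),
        ReasoningPath("2+3?", "2 times 3 is 6", "6", 0.8),   # wrong answer, dropped
        ReasoningPath("2+3?", "3 plus 2 gives 5", "5", 0.3),  # low score, dropped
    ]
    kept = quality_filter(paths, gold_answer="5")
    losses = {"student_a": [0.2], "student_b": [1.1]}  # one loss per kept path
    print(route_paths(kept, losses))
    print(peer_teaching_loss(torch.randn(4, 10), torch.randn(4, 10)))
```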