Through the Valley: Path to Effective Long CoT Training for Small Language Models

Renjie Luo; Jiaxi Li; Chen Huang; Wei Lu

doi:10.18653/v1/2025.emnlp-main.251

Abstract

Long chain-of-thought (CoT) supervision has become a common strategy to enhance reasoning in language models. While effective for large models, we identify a phenomenon we call Long CoT Degradation, in which small language models (SLMs; ≤3B parameters) trained on limited long CoT data experience significant performance deterioration. Through extensive experiments on the Qwen2.5, LLaMA3 and Gemma3 families, we demonstrate that this degradation is widespread across SLMs. In some settings, models trained on only 8k long CoT examples lose up to 75% of their pre-fine-tuning performance. Strikingly, we further observe that for some particularly small models, even training on 220k long CoT examples fails to recover or surpass their pre-fine-tuning performance. Our analysis attributes this effect to error accumulation: while longer responses increase the capacity for multi-step reasoning, they also amplify the risk of compounding mistakes. Furthermore, we find that Long CoT Degradation may negatively impact downstream reinforcement learning (RL), although this can be alleviated by sufficiently scaled supervised fine-tuning (SFT). Our findings challenge common assumptions about the benefits of long CoT training for SLMs and offer practical guidance for building more effective small-scale reasoning models.

Anthology ID: 2025.emnlp-main.251
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 4972–4992
URL: https://aclanthology.org/2025.emnlp-main.251/
DOI: 10.18653/v1/2025.emnlp-main.251
Cite (ACL): Renjie Luo, Jiaxi Li, Chen Huang, and Wei Lu. 2025. Through the Valley: Path to Effective Long CoT Training for Small Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4972–4992, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Through the Valley: Path to Effective Long CoT Training for Small Language Models (Luo et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.251.pdf
Checklist: 2025.emnlp-main.251.checklist.pdf
