Language Models Can Easily Learn to Reason from Demonstrations
Dacheng Li | Shiyi Cao | Tyler Griggs | Shu Liu | Xiangxi Mo | Eric Tang | Sumanth Hegde | Kourosh Hakhamaneshi | Shishir G. Patil | Matei Zaharia | Joseph E. Gonzalez | Ion Stoica
Findings of the Association for Computational Linguistics: EMNLP 2025
Large reasoning models (LRMs) tackle complex problems by following long chains of thought (Long CoT) that incorporate reflection, backtracking, and self-validation. However, the training techniques and data requirements needed to elicit Long CoT remain poorly understood. In this work, we find that language models can effectively learn Long CoT reasoning through data-efficient supervised fine-tuning (SFT) and parameter-efficient low-rank adaptation (LoRA). Crucially, we find that the structure of Long CoT is critical to this data-efficient learning process. Training on content-incorrect examples, e.g., those that lead to incorrect answers or contain corrupted digits, still yields significant performance gains. In contrast, training on structurally incorrect examples, e.g., those with shuffled or deleted reasoning steps, yields smaller improvements or even degrades performance.
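The recipe described above is standard SFT combined with LoRA. A minimal sketch of such a setup, assuming Hugging Face transformers and peft; the base model name, rank, target modules, and other hyperparameters here are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch: parameter-efficient LoRA fine-tuning on Long CoT demonstrations.
# Model name and LoRA hyperparameters are assumptions for illustration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-7B-Instruct"  # assumed base model, not the paper's choice
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                 # low-rank dimension (assumed value)
    lora_alpha=32,        # scaling factor (assumed value)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trained
```

The wrapped model can then be fine-tuned with any standard SFT loop on the Long CoT demonstrations; only the low-rank adapter weights receive gradient updates, which is what makes the approach parameter-efficient.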
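The abstract contrasts content perturbations (incorrect answers, corrupted digits) with structural perturbations (shuffled or deleted reasoning steps). A minimal sketch of what such perturbations might look like; the function names, parameters, and step granularity are hypothetical, chosen only to make the distinction concrete:

```python
# Sketch: content vs. structural perturbations of a Long CoT trace.
# All names and default parameters are illustrative assumptions.
import random

def corrupt_digits(text: str, p: float = 0.3) -> str:
    """Content perturbation: replace digits at random, preserving the step structure."""
    return "".join(
        str(random.randint(0, 9)) if c.isdigit() and random.random() < p else c
        for c in text
    )

def shuffle_steps(steps: list[str]) -> list[str]:
    """Structural perturbation: permute the order of the reasoning steps."""
    perturbed = steps.copy()
    random.shuffle(perturbed)
    return perturbed

def delete_steps(steps: list[str], frac: float = 0.5) -> list[str]:
    """Structural perturbation: drop a random fraction of the reasoning steps."""
    if not steps:
        return steps
    keep = max(1, int(len(steps) * (1 - frac)))
    kept_idx = sorted(random.sample(range(len(steps)), keep))
    return [steps[i] for i in kept_idx]
```

Under the paper's finding, training on traces passed through `corrupt_digits` would still yield significant gains, while traces passed through `shuffle_steps` or `delete_steps` would help less or hurt.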