The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding?

Yutao Sun; Mingshuai Chen; Tiancheng Zhao; Ruochen Xu; Zilun Zhang; Jianwei Yin

doi:10.18653/v1/2025.findings-acl.337

Abstract

Self-improvement of large language models (LLMs), i.e., improving the performance of an LLM by fine-tuning it on synthetic data that it generated itself, is a promising way to advance the capabilities of LLMs while avoiding extensive supervision. Existing approaches to self-improvement often rely on external supervision signals in the form of seed data and/or assistance from third-party models. This paper presents Crescent, a simple yet effective framework for generating high-quality synthetic question-answer data in a fully autonomous manner. Crescent first elicits raw questions from the LLM via a bait prompt, then diversifies these questions through rejection-sampling-based self-deduplication, and finally feeds the questions back to the LLM and collects the corresponding answers by majority voting. We show that Crescent sheds light on the potential of true self-improvement with zero external supervision signals for math reasoning; in particular, Crescent-generated question-answer pairs suffice to (i) improve the reasoning capabilities of an LLM while preserving its general performance (especially in the 0-shot setting); and (ii) distill LLM knowledge to weaker models more effectively than existing methods based on seed-dataset augmentation.
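The three stages named in the abstract (question elicitation, self-deduplication, majority voting) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the callables `generate_question` and `sample_answer` are hypothetical stand-ins for LLM calls, the string-similarity filter based on `difflib.SequenceMatcher` and its `threshold` are simple placeholders for the paper's rejection-sampling-based deduplication, and the vote is a plain plurality over sampled answers.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def deduplicate(questions: List[str], threshold: float = 0.8) -> List[str]:
    """Reject a question if it is too similar to one that was already kept.

    ASSUMPTION: character-level similarity with a fixed threshold is a
    placeholder; the abstract does not specify the similarity measure.
    """
    kept: List[str] = []
    for q in questions:
        is_novel = all(
            SequenceMatcher(None, q.lower(), k.lower()).ratio() < threshold
            for k in kept
        )
        if is_novel:
            kept.append(q)
    return kept


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer among the sampled answers."""
    return Counter(answers).most_common(1)[0][0]


def build_pairs(
    generate_question: Callable[[], str],   # hypothetical: LLM + bait prompt
    sample_answer: Callable[[str], str],    # hypothetical: one sampled LLM answer
    n_questions: int,
    n_samples: int,
) -> List[Tuple[str, str]]:
    """Generate questions, drop near-duplicates, and label each by voting."""
    raw = [generate_question() for _ in range(n_questions)]
    pairs = []
    for q in deduplicate(raw):
        votes = [sample_answer(q) for _ in range(n_samples)]
        pairs.append((q, majority_vote(votes)))
    return pairs
```

In this sketch the resulting `(question, answer)` pairs would serve as the synthetic fine-tuning data; how the paper filters low-agreement answers, if at all, is not stated in the abstract and is therefore left out.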

Anthology ID: 2025.findings-acl.337
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6501–6512
URL: https://aclanthology.org/2025.findings-acl.337/
DOI: 10.18653/v1/2025.findings-acl.337
Cite (ACL): Yutao Sun, Mingshuai Chen, Tiancheng Zhao, Ruochen Xu, Zilun Zhang, and Jianwei Yin. 2025. The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? In Findings of the Association for Computational Linguistics: ACL 2025, pages 6501–6512, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? (Sun et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.337.pdf
