Teaching LLMs to Plan, Not Just Solve: Plan Learning Boosts LLMs Generalization in Reasoning Tasks

Tianlong Wang; Junzhe Chen; Weibin Liao; Xueting Han; Jing Bai

doi:10.18653/v1/2025.findings-emnlp.453

Abstract

Reinforcement learning (RL) on self-generated data has emerged as a promising paradigm for improving reasoning in large language models (LLMs). However, RL relies on accurate reward signals, which are scarce in many domains, making it critical to train models that can generalize to unseen problems. Existing methods often focus on task-specific or domain-specific reasoning, give little consideration to generalization, and may degrade performance on other tasks. To address this, we distinguish between abstract plans, which represent high-level problem-solving strategies, and concrete solutions, and we propose that learning plans develops transferable general reasoning capabilities and promotes better generalization. Building on this insight, we propose PlanLearn, a framework that combines plan-based search with Step-level Advantage Preference Optimization (Step-APO) to optimize plan learning. Experimental results show that PlanLearn, trained exclusively on GSM8K and MATH, not only significantly improves in-domain performance but also improves performance on out-of-domain benchmarks such as HumanEval (+12.2%), GPQA (+8.6%), ARC-C (+4.0%), MMLU-STEM (+2.2%), and BBH (+1.8%). The code is available at https://github.com/tianlwang/PlanLearn.

Anthology ID: 2025.findings-emnlp.453
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8531–8545
URL: https://aclanthology.org/2025.findings-emnlp.453/
DOI: 10.18653/v1/2025.findings-emnlp.453
Cite (ACL): Tianlong Wang, Junzhe Chen, Weibin Liao, Xueting Han, and Jing Bai. 2025. Teaching LLMs to Plan, Not Just Solve: Plan Learning Boosts LLMs Generalization in Reasoning Tasks. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 8531–8545, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Teaching LLMs to Plan, Not Just Solve: Plan Learning Boosts LLMs Generalization in Reasoning Tasks (Wang et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.453.pdf
Checklist: 2025.findings-emnlp.453.checklist.pdf
