A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions

Zhiyin Yu; Yuchen Mou; Juncheng Yan; Junyu Luo; Chunchun Chen; Xing Wei; Yunhui Liu; Hongru Sun; Yuxing Zhang; Jun Xu; Yatao Bian; Ming Zhang; Wei Ye; Tieke He; Jie Yang; Guanjie Zheng; Zhonghai Wu; Bo Zhang; Lei Bai; Xiao Luo

Abstract

Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient reinforcement learning a critical research direction. In this survey, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable reinforcement learning post-training for LLMs.

Anthology ID: 2026.acl-long.1045
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 22823–22846
URL: https://aclanthology.org/2026.acl-long.1045/
Cite (ACL): Zhiyin Yu, Yuchen Mou, Juncheng Yan, Junyu Luo, Chunchun Chen, Xing Wei, Yunhui Liu, Hongru Sun, Yuxing Zhang, Jun Xu, Yatao Bian, Ming Zhang, Wei Ye, Tieke He, Jie Yang, Guanjie Zheng, Zhonghai Wu, Bo Zhang, Lei Bai, and Xiao Luo. 2026. A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 22823–22846, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions (Yu et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1045.pdf
Checklist: 2026.acl-long.1045.checklist.pdf
