SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility

Xuyang Zhi; Peilun Zhou; Chengqiang Lu; Hang Lv; Yiwei Liang; Rongyang Zhang; Yan Gao; Yiwu; Yao Hu; Hongchao Gu; Defu Lian; Hao Wang; Enhong Chen

Abstract

The evolution of Large Language Models (LLMs) is shifting the focus from single, verifiable tasks toward complex, open-ended real-world scenarios, imposing significant challenges on the post-training phase. In these settings, the scale and complexity of reward systems have grown significantly, transitioning toward multi-objective formulations that encompass a comprehensive spectrum of model capabilities and application contexts. However, traditional methods typically rely on fixed reward weights, ignoring non-stationary learning dynamics and struggling with data heterogeneity across dimensions. To address these issues, we propose SPARD, a framework that establishes an automated, self-paced curriculum by perceiving learning progress to dynamically adjust multi-objective reward weights and data importance, thereby synchronizing learning intent with data utility for optimal performance. Extensive experiments across multiple benchmarks demonstrate that SPARD significantly enhances model capabilities across all domains. Our code is publicly available at https://github.com/USTC-StarTeam/SPARD.

Anthology ID: 2026.acl-long.2191
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 47402–47422
URL: https://aclanthology.org/2026.acl-long.2191/
Cite (ACL): Xuyang Zhi, Peilun Zhou, Chengqiang Lu, Hang Lv, Yiwei Liang, Rongyang Zhang, Yan Gao, Yiwu, Yao Hu, Hongchao Gu, Defu Lian, Hao Wang, and Enhong Chen. 2026. SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 47402–47422, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility (Zhi et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.2191.pdf
Checklist: 2026.acl-long.2191.checklist.pdf
