On the Hidden Objective Biases of Group-based Reinforcement Learning

Aleksandar Fontana; Marco Simoni; Giulio Rossolini; Paolo Mori; Andrea Saracino

Abstract

Group-based reinforcement learning methods, such as Group Relative Policy Optimization (GRPO), are now widely used to post-train large language models. Despite their empirical success, they exhibit structural mismatches between reward optimization and the underlying training objective. In this paper, we present a theoretical analysis of GRPO-style methods by studying them within a unified surrogate formulation. This perspective reveals recurring properties that affect all the methods under analysis: (i) non-uniform group weighting induces systematic gradient biases on shared prefix tokens; (ii) interactions with the AdamW optimizer make training dynamics largely insensitive to reward scaling; and (iii) optimizer momentum can push policy updates beyond the intended clipping region under repeated optimization steps. We believe that these findings highlight fundamental limitations of current approaches and provide principled guidance for the design of future formulations.

Anthology ID: 2026.acl-short.11
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 109–121
URL: https://aclanthology.org/2026.acl-short.11/
Cite (ACL): Aleksandar Fontana, Marco Simoni, Giulio Rossolini, Paolo Mori, and Andrea Saracino. 2026. On the Hidden Objective Biases of Group-based Reinforcement Learning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 109–121, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): On the Hidden Objective Biases of Group-based Reinforcement Learning (Fontana et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-short.11.pdf
Checklist: 2026.acl-short.11.checklist.pdf
