Breaking the Impasse: Dual-Scale Evolutionary Policy Training for Social Language Agents

Minzheng Wang; Run Luo; Yanbo Wang; Zichen Liu; Yuqiao Tan; Tao Tan; Nan Xu; Lu Wang; Wenji Mao

Breaking the Impasse: Dual-Scale Evolutionary Policy Training for Social Language Agents

Minzheng Wang, Run Luo, Yanbo Wang, Zichen Liu, Yuqiao Tan, Tao Tan, Nan Xu, Lu Wang, Wenji Mao

Abstract

While Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for closed-ended tasks, extending it to open-ended social language games via self-play reveals a critical issue: evolution impasse. Due to the vast strategy space, language agents frequently converge to homogenized behaviors, leading to deterministic match outcomes that eliminate the gradient signals necessary for policy evolution. To tackle this issue, we propose Dual-scale Evolutionary Policy Training (DEPT) for social language games. DEPT introduces a time-scaled evolutionary perception mechanism that detects impasse by quantifying dual-scale value baseline divergence alongside match entropy. Upon perceiving the collapse, it then activates asymmetric advantage reshaping to dynamically modulate the optimization landscape for intervention. Thus, our method effectively restores gradient signals and enforces sustained strategic exploration. Extensive experiments on multiple social language games demonstrate that DEPT outperforms strong baselines, avoiding policy degeneration and driving the continuous evolution of social language agents.

Anthology ID:: 2026.acl-long.2096
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 45213–45235
Language:
URL:: https://aclanthology.org/2026.acl-long.2096/
DOI:
Bibkey:
Cite (ACL):: Minzheng Wang, Run Luo, Yanbo Wang, Zichen Liu, Yuqiao Tan, Tao Tan, Nan Xu, Lu Wang, and Wenji Mao. 2026. Breaking the Impasse: Dual-Scale Evolutionary Policy Training for Social Language Agents. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 45213–45235, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: Breaking the Impasse: Dual-Scale Evolutionary Policy Training for Social Language Agents (Wang et al., ACL 2026)
Copy Citation:
PDF:: https://aclanthology.org/2026.acl-long.2096.pdf
Checklist:: 2026.acl-long.2096.checklist.pdf

PDF Cite Search Checklist Fix data