@inproceedings{park-etal-2025-maporl,
title = "{MAP}o{RL}: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning",
author = "Park, Chanwoo and
Han, Seungju and
Guo, Xingzhi and
Ozdaglar, Asuman E. and
Zhang, Kaiqing and
Kim, Joo-Kyung",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.1459/",
doi = "10.18653/v1/2025.acl-long.1459",
pages = "30215--30248",
ISBN = "979-8-89176-251-0",
abstract = "Leveraging multi-agentic frameworks to enhance large language models (LLMs) has demonstrated significant potential recently, with most existing studies focusing on prompting and developing workflows with frozen LLMs. In this paper, we aim to further unleash the power of such multi-agentic frameworks for post-training LLMs for better collaboration. Specifically, we develop a new paradigm of Multi-Agent Post-co-training for collaborative LLMs with Reinforcement Learning (MAPoRL). In MAPoRL, multiple LLMs first generate their own responses and engage in discussions to collaboratively enhance the final response output; the final output is then scored by a verifier, where the scores serve as the reward and is maximized through multi-agent RL. Additionally, MAPoRL also reshapes the reward above with additional incentives to encourage corrective and persuasive outputs in the discussions. A key novelty from most existing LLM post-training paradigms is the advocacy of co-training multiple LLMs together, and the use of RL for better generalization. Accompanied by a few analytical insights, our experiments show that training single LLMs solely is insufficient for encouraging collaboration, while multi-agent co-training can significantly enhance the collaboration performance across multiple datasets, with generalization to unseen domains, compared to that of multiple LLMs before post-training."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="park-etal-2025-maporl">
<titleInfo>
<title>MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning</title>
</titleInfo>
<name type="personal">
<namePart type="given">Chanwoo</namePart>
<namePart type="family">Park</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Seungju</namePart>
<namePart type="family">Han</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xingzhi</namePart>
<namePart type="family">Guo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Asuman</namePart>
<namePart type="given">E</namePart>
<namePart type="family">Ozdaglar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kaiqing</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joo-Kyung</namePart>
<namePart type="family">Kim</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-251-0</identifier>
</relatedItem>
<abstract>Leveraging multi-agentic frameworks to enhance large language models (LLMs) has demonstrated significant potential recently, with most existing studies focusing on prompting and developing workflows with frozen LLMs. In this paper, we aim to further unleash the power of such multi-agentic frameworks for post-training LLMs for better collaboration. Specifically, we develop a new paradigm of Multi-Agent Post-co-training for collaborative LLMs with Reinforcement Learning (MAPoRL). In MAPoRL, multiple LLMs first generate their own responses and engage in discussions to collaboratively enhance the final response output; the final output is then scored by a verifier, where the scores serve as the reward and is maximized through multi-agent RL. Additionally, MAPoRL also reshapes the reward above with additional incentives to encourage corrective and persuasive outputs in the discussions. A key novelty from most existing LLM post-training paradigms is the advocacy of co-training multiple LLMs together, and the use of RL for better generalization. Accompanied by a few analytical insights, our experiments show that training single LLMs solely is insufficient for encouraging collaboration, while multi-agent co-training can significantly enhance the collaboration performance across multiple datasets, with generalization to unseen domains, compared to that of multiple LLMs before post-training.</abstract>
<identifier type="citekey">park-etal-2025-maporl</identifier>
<identifier type="doi">10.18653/v1/2025.acl-long.1459</identifier>
<location>
<url>https://aclanthology.org/2025.acl-long.1459/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>30215</start>
<end>30248</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning
%A Park, Chanwoo
%A Han, Seungju
%A Guo, Xingzhi
%A Ozdaglar, Asuman E.
%A Zhang, Kaiqing
%A Kim, Joo-Kyung
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-251-0
%F park-etal-2025-maporl
%X Leveraging multi-agentic frameworks to enhance large language models (LLMs) has demonstrated significant potential recently, with most existing studies focusing on prompting and developing workflows with frozen LLMs. In this paper, we aim to further unleash the power of such multi-agentic frameworks for post-training LLMs for better collaboration. Specifically, we develop a new paradigm of Multi-Agent Post-co-training for collaborative LLMs with Reinforcement Learning (MAPoRL). In MAPoRL, multiple LLMs first generate their own responses and engage in discussions to collaboratively enhance the final response output; the final output is then scored by a verifier, where the scores serve as the reward, which is maximized through multi-agent RL. Additionally, MAPoRL reshapes the reward above with additional incentives to encourage corrective and persuasive outputs in the discussions. A key novelty relative to most existing LLM post-training paradigms is the advocacy of co-training multiple LLMs together, and the use of RL for better generalization. Accompanied by a few analytical insights, our experiments show that training single LLMs alone is insufficient for encouraging collaboration, while multi-agent co-training can significantly enhance the collaboration performance across multiple datasets, with generalization to unseen domains, compared to that of multiple LLMs before post-training.
%R 10.18653/v1/2025.acl-long.1459
%U https://aclanthology.org/2025.acl-long.1459/
%U https://doi.org/10.18653/v1/2025.acl-long.1459
%P 30215-30248
Markdown (Informal)
[MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning](https://aclanthology.org/2025.acl-long.1459/) (Park et al., ACL 2025)
ACL
Chanwoo Park, Seungju Han, Xingzhi Guo, Asuman E. Ozdaglar, Kaiqing Zhang, and Joo-Kyung Kim. 2025. MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 30215–30248, Vienna, Austria. Association for Computational Linguistics.
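
The abstract above describes a training loop that can be summarized in pseudocode. The sketch below is assembled only from the abstract's high-level description, not from the paper's released code; the callables (`discuss`, `verifier`, `rl_update`), the choice of final answer, and the `shaping_bonus` term are placeholders assumed for illustration.

```python
# Minimal sketch of the training loop described in the MAPoRL abstract.
# All components below are hypothetical placeholders; consult the paper
# (https://aclanthology.org/2025.acl-long.1459/) for the actual method.

from typing import Callable, List


def maporl_step(
    agents: List[Callable[[str], str]],              # collaborating LLM policies
    discuss: Callable[[List[str], str], List[str]],  # one discussion round (assumed interface)
    verifier: Callable[[str, str], float],           # scores the final collaborative answer
    rl_update: Callable[[int, float], None],         # per-agent policy update (e.g., a PPO step)
    question: str,
    num_rounds: int = 2,
    shaping_bonus: float = 0.0,                      # stand-in for the corrective/persuasive incentives
) -> float:
    # 1. Each agent first generates its own response.
    responses = [agent(question) for agent in agents]

    # 2. Agents refine their answers over several discussion rounds.
    for _ in range(num_rounds):
        responses = discuss(responses, question)

    # 3. A verifier scores the final output; how the final answer is selected
    #    is an assumption here (last response taken for simplicity).
    final_answer = responses[-1]
    reward = verifier(question, final_answer)

    # 4. The (optionally reshaped) verifier reward drives a multi-agent RL
    #    update applied to every co-trained agent.
    for agent_id, _ in enumerate(agents):
        rl_update(agent_id, reward + shaping_bonus)

    return reward
```

This sketch only mirrors the flow stated in the abstract (independent generation, discussion, verifier scoring, shaped reward, multi-agent RL update); the paper's actual reward shaping, answer aggregation, and optimization details may differ.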