@inproceedings{fu-etal-2025-rlae,
title = "{RLAE}: Reinforcement Learning-Assisted Ensemble for {LLM}s",
author = "Fu, Yuqian and
Zhu, Yuanheng and
Chai, Jiajun and
Yin, Guojun and
Lin, Wei and
Zhang, Qichao and
Zhao, Dongbin",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.680/",
pages = "13463--13477",
ISBN = "979-8-89176-332-6",
abstract = "Ensembling large language models (LLMs) can effectively combine diverse strengths of different models, offering a promising approach to enhance performance across various tasks. However, existing methods typically rely on fixed weighting strategies that fail to adapt to the dynamic, context-dependent characteristics of LLM capabilities. In this work, we propose **R**einforcement **L**earning-**A**ssisted **E**nsemble for LLMs (RLAE), a novel framework that reformulates LLM ensemble through the lens of a Markov Decision Process (MDP). Our approach introduces a RL agent that dynamically adjusts ensemble weights by considering both input context and intermediate generation states, with the agent being trained using rewards that directly correspond to the quality of final outputs. We implement RLAE using both single-agent and multi-agent reinforcement learning algorithms ($\text{RLAE}\_\text{PPO}$ and $\text{RLAE}\_\text{MAPPO}$ ), demonstrating substantial improvements over conventional ensemble methods. Extensive evaluations on a diverse set of tasks show that RLAE outperforms existing approaches by up to $3.3\\%$ accuracy points, offering a more effective framework for LLM ensembling. Furthermore, our method exhibits superior generalization capabilities across different tasks without the need for retraining, while simultaneously achieving lower time latency. The source code is available at here."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="fu-etal-2025-rlae">
<titleInfo>
<title>RLAE: Reinforcement Learning-Assisted Ensemble for LLMs</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yuqian</namePart>
<namePart type="family">Fu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yuanheng</namePart>
<namePart type="family">Zhu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jiajun</namePart>
<namePart type="family">Chai</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Guojun</namePart>
<namePart type="family">Yin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Wei</namePart>
<namePart type="family">Lin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Qichao</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dongbin</namePart>
<namePart type="family">Zhao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-332-6</identifier>
</relatedItem>
<abstract>Ensembling large language models (LLMs) can effectively combine the diverse strengths of different models, offering a promising approach to enhance performance across various tasks. However, existing methods typically rely on fixed weighting strategies that fail to adapt to the dynamic, context-dependent characteristics of LLM capabilities. In this work, we propose Reinforcement Learning-Assisted Ensemble for LLMs (RLAE), a novel framework that reformulates LLM ensembling through the lens of a Markov Decision Process (MDP). Our approach introduces an RL agent that dynamically adjusts ensemble weights by considering both the input context and intermediate generation states, with the agent being trained using rewards that directly correspond to the quality of final outputs. We implement RLAE using both single-agent and multi-agent reinforcement learning algorithms (RLAE_PPO and RLAE_MAPPO), demonstrating substantial improvements over conventional ensemble methods. Extensive evaluations on a diverse set of tasks show that RLAE outperforms existing approaches by up to 3.3% accuracy points, offering a more effective framework for LLM ensembling. Furthermore, our method exhibits superior generalization capabilities across different tasks without the need for retraining, while simultaneously achieving lower time latency. The source code is available here.</abstract>
<identifier type="citekey">fu-etal-2025-rlae</identifier>
<location>
<url>https://aclanthology.org/2025.emnlp-main.680/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>13463</start>
<end>13477</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
%A Fu, Yuqian
%A Zhu, Yuanheng
%A Chai, Jiajun
%A Yin, Guojun
%A Lin, Wei
%A Zhang, Qichao
%A Zhao, Dongbin
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-332-6
%F fu-etal-2025-rlae
%X Ensembling large language models (LLMs) can effectively combine the diverse strengths of different models, offering a promising approach to enhance performance across various tasks. However, existing methods typically rely on fixed weighting strategies that fail to adapt to the dynamic, context-dependent characteristics of LLM capabilities. In this work, we propose Reinforcement Learning-Assisted Ensemble for LLMs (RLAE), a novel framework that reformulates LLM ensembling through the lens of a Markov Decision Process (MDP). Our approach introduces an RL agent that dynamically adjusts ensemble weights by considering both the input context and intermediate generation states, with the agent being trained using rewards that directly correspond to the quality of final outputs. We implement RLAE using both single-agent and multi-agent reinforcement learning algorithms (RLAE_PPO and RLAE_MAPPO), demonstrating substantial improvements over conventional ensemble methods. Extensive evaluations on a diverse set of tasks show that RLAE outperforms existing approaches by up to 3.3% accuracy points, offering a more effective framework for LLM ensembling. Furthermore, our method exhibits superior generalization capabilities across different tasks without the need for retraining, while simultaneously achieving lower time latency. The source code is available here.
%U https://aclanthology.org/2025.emnlp-main.680/
%P 13463-13477
Markdown (Informal)
[RLAE: Reinforcement Learning-Assisted Ensemble for LLMs](https://aclanthology.org/2025.emnlp-main.680/) (Fu et al., EMNLP 2025)

ACL
Yuqian Fu, Yuanheng Zhu, Jiajun Chai, Guojun Yin, Wei Lin, Qichao Zhang, and Dongbin Zhao. 2025. RLAE: Reinforcement Learning-Assisted Ensemble for LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 13463–13477, Suzhou, China. Association for Computational Linguistics.
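
For intuition about the approach the abstract describes, here is a minimal, hypothetical sketch of per-step weighted ensembling: a small policy network maps the current generation state to mixing weights over the member models' next-token distributions. All names and dimensions below (`EnsemblePolicy`, `ensemble_next_token`, the 32-dim state, the toy vocabulary) are illustrative assumptions, not the paper's actual implementation or training loop (the paper trains this policy with PPO/MAPPO on output-quality rewards).

```python
# Hypothetical sketch only: NOT the RLAE authors' code. It illustrates the
# idea of an RL policy producing context-dependent ensemble weights at each
# decoding step, which are then used to mix member models' token distributions.
import torch
import torch.nn as nn


class EnsemblePolicy(nn.Module):
    """Maps a summary of the generation state to mixing weights over K models."""

    def __init__(self, state_dim: int, num_models: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.Tanh(),
            nn.Linear(128, num_models),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Softmax keeps the weights on the probability simplex.
        return torch.softmax(self.net(state), dim=-1)


def ensemble_next_token(member_logits: torch.Tensor,
                        weights: torch.Tensor) -> torch.Tensor:
    """Mix K member next-token distributions into one.

    member_logits: (K, vocab) raw logits from each member model.
    weights:       (K,) mixing weights produced by the policy.
    Returns a (vocab,) mixed probability distribution.
    """
    probs = torch.softmax(member_logits, dim=-1)        # (K, vocab)
    return (weights.unsqueeze(-1) * probs).sum(dim=0)   # (vocab,)


# Toy usage: 3 member models, a 32-dim state stand-in, a 1000-token vocabulary.
torch.manual_seed(0)
policy = EnsemblePolicy(state_dim=32, num_models=3)
state = torch.randn(32)        # stand-in for input context + prefix features
logits = torch.randn(3, 1000)  # stand-in for the member models' logits
w = policy(state)
mixed = ensemble_next_token(logits, w)
print("weights:", w, "next token:", torch.argmax(mixed).item())
```

In the paper's framing, the weights `w` would be resampled as generation proceeds (state-dependent, per the MDP formulation) rather than fixed once per input, and the policy parameters would be updated from rewards on the final outputs.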