Evolutionary Reward Design and Optimization with Multimodal Large Language Models

Ali Narin


Abstract
Designing reward functions is a pivotal yet challenging task in Reinforcement Learning (RL), often demanding domain expertise and substantial effort. Recent studies have explored the use of Large Language Models (LLMs) to generate reward functions via evolutionary search. However, these approaches overlook the potential of multimodal information, such as images and videos. In particular, prior methods rely predominantly on numerical feedback from the RL environment to drive the evolution, neglecting the visual data obtained during training. This study introduces a novel approach that employs Multimodal Large Language Models (MLLMs) to craft reward functions tailored to various RL tasks. The MLLM is given the RL environment’s code, an image of the environment, and task information as context, and generates reward candidates. The chosen agent is then trained, and the numerical feedback from the environment, along with a recorded video of the top-performing policy, is returned to the MLLM as feedback. Through this iterative feedback mechanism within an evolutionary search, the MLLM consistently refines the reward function to maximize accuracy. Tests on two different agents indicate that our approach outperforms the previous methodology, which itself outperformed 83% of reward functions designed by human experts.
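The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: every name in it (`evolve_rewards`, `Candidate`, `Feedback`, `make_stub_proposer`, `stub_train`) is hypothetical, the MLLM call and the RL training run are replaced by deterministic stand-ins, and carrying the best candidate across iterations is an assumption about how the search is organized.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    reward_source: str          # reward function code proposed by the model
    fitness: float              # task score of the policy trained with it
    video_path: Optional[str]   # rollout recording of that policy, if any


@dataclass
class Feedback:
    reward_source: str          # best reward function found so far
    numeric_summary: str        # numerical feedback from the environment
    video_path: Optional[str]   # video of the top-performing policy


# Hypothetical interfaces: a proposer wraps the MLLM, a trainer wraps RL training.
Proposer = Callable[[str, str, str, Optional[Feedback], int], List[str]]
Trainer = Callable[[str], Candidate]


def evolve_rewards(
    propose: Proposer,
    train: Trainer,
    env_code: str,
    env_image: str,
    task: str,
    iterations: int = 3,
    samples: int = 4,
) -> Tuple[Candidate, List[float]]:
    """Sample reward candidates, train on each, and feed the best one back."""
    feedback: Optional[Feedback] = None
    best: Optional[Candidate] = None
    history: List[float] = []
    for _ in range(iterations):
        sources = propose(env_code, env_image, task, feedback, samples)
        population = [train(src) for src in sources]
        top = max(population, key=lambda c: c.fitness)
        if best is None or top.fitness > best.fitness:
            best = top  # assumption: the best-so-far candidate is retained
        history.append(best.fitness)
        feedback = Feedback(
            best.reward_source, f"fitness={best.fitness:.3f}", best.video_path
        )
    assert best is not None
    return best, history


def make_stub_proposer(seed: int = 0) -> Proposer:
    """Stand-in for the MLLM: perturbs a single weight near the previous best."""
    rng = random.Random(seed)

    def propose(env_code, env_image, task, feedback, samples):
        center = 0.0
        if feedback is not None:
            center = float(feedback.reward_source.split("=")[1])
        return [
            f"weight={center + rng.uniform(-0.5, 0.5):.4f}" for _ in range(samples)
        ]

    return propose


def stub_train(reward_source: str) -> Candidate:
    """Stand-in for RL training: fitness peaks when the weight equals 0.7."""
    weight = float(reward_source.split("=")[1])
    return Candidate(reward_source, -(weight - 0.7) ** 2, video_path=None)
```

In a real setup, the proposer would send the environment code, the image, and the previous feedback (including the video) to a multimodal model, and the trainer would run an RL algorithm and record a rollout. The stubs only exercise the control flow.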
Anthology ID:
2024.alvr-1.18
Volume:
Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Jing Gu, Tsu-Jui (Ray) Fu, Drew Hudson, Asli Celikyilmaz, William Wang
Venues:
ALVR | WS
Publisher:
Association for Computational Linguistics
Pages:
202–208
URL:
https://aclanthology.org/2024.alvr-1.18
Cite (ACL):
Ali Narin. 2024. Evolutionary Reward Design and Optimization with Multimodal Large Language Models. In Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR), pages 202–208, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Evolutionary Reward Design and Optimization with Multimodal Large Language Models (Narin, ALVR-WS 2024)
PDF:
https://aclanthology.org/2024.alvr-1.18.pdf