AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback

Wanpeng Zhang, Zongqing Lu


Abstract
Large Language Models (LLMs) have demonstrated significant success across various domains. However, their application in complex decision-making tasks frequently necessitates intricate prompt engineering or fine-tuning, leading to challenges in unseen downstream tasks and heavy demands on computational resources. Meanwhile, Reinforcement Learning (RL) has been recognized as effective in decision-making problems but struggles in environments with sparse rewards, such as open-world games. To overcome these challenges, we introduce AdaRefiner, a novel framework designed to enhance the synergy between LLMs and RL feedback. The key component of AdaRefiner is a lightweight Adapter Language Model (LM), which automatically refines task comprehension based on feedback from RL agents. This method mitigates the need for intricate prompt engineering and intensive LLM fine-tuning while maintaining the LLMs’ generalization abilities and enhancing their decision-making capabilities in downstream tasks. Empirical evaluations of AdaRefiner on 22 diverse tasks within the open-world game Crafter demonstrate its superior effectiveness, especially in guiding agents towards higher-level and common-sense skills. Our work contributes to the automatic self-refinement of LLMs with RL feedback, offering a more adaptable and efficient solution for complex decision-making problems. The code is available at https://github.com/PKU-RL/AdaRefiner.
Anthology ID:
2024.findings-naacl.50
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
782–799
URL:
https://aclanthology.org/2024.findings-naacl.50
Cite (ACL):
Wanpeng Zhang and Zongqing Lu. 2024. AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 782–799, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback (Zhang & Lu, Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.50.pdf
Copyright:
2024.findings-naacl.50.copyright.pdf