LLaMA-Rider: Spurring Large Language Models to Explore the Open World

Yicheng Feng, Yuxuan Wang, Jiazheng Liu, Sipeng Zheng, Zongqing Lu


Abstract
Recently, various studies have leveraged Large Language Models (LLMs) to help decision-making and planning in environments and try to align the LLMs’ knowledge with the world conditions. Nonetheless, the capacity of LLMs to continuously acquire environmental knowledge and adapt in an open world remains uncertain. In this paper, we propose an approach to spur LLMs to explore the open world, gather experiences, and learn to improve their task-solving capabilities. In this approach, a multi-round feedback-revision mechanism is utilized to encourage LLMs to actively select appropriate revision actions guided by feedback information from the environment. This facilitates exploration and enhances the model’s performance. Besides, we integrate sub-task relabeling to assist LLMs in maintaining consistency in sub-task planning and help the model learn the combinatorial nature between tasks, enabling it to complete a wider range of tasks through training based on the acquired exploration experiences. By evaluation in Minecraft, an open-ended sandbox world, we demonstrate that our approach LLaMA-Rider enhances the efficiency of the LLM in exploring the environment, and effectively improves the LLM’s ability to accomplish more tasks through fine-tuning with merely 1.3k instances of collected data, showing minimal training costs compared to the baseline using reinforcement learning. The code is available at https://github.com/PKU-RL/LLaMA-Rider.
Anthology ID:
2024.findings-naacl.292
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
4705–4724
Language:
URL:
https://aclanthology.org/2024.findings-naacl.292
DOI:
10.18653/v1/2024.findings-naacl.292
Bibkey:
Cite (ACL):
Yicheng Feng, Yuxuan Wang, Jiazheng Liu, Sipeng Zheng, and Zongqing Lu. 2024. LLaMA-Rider: Spurring Large Language Models to Explore the Open World. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4705–4724, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
LLaMA-Rider: Spurring Large Language Models to Explore the Open World (Feng et al., Findings 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.findings-naacl.292.pdf