STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models

Shreyas Basavatia, Keerthiram Murugesan, Shivam Ratnakar


Abstract
Interactive fiction games have emerged as an important application for improving the generalization capabilities of language-based reinforcement learning (RL) agents. Existing environments for interactive fiction games are either domain-specific or time-consuming to generate, and they do not train RL agents to master a specific set of skills. In this work, we introduce STARLING, an interactive environment for self-supervised RL in text-based games that bootstraps text-based RL agents with automatically generated games (based on a seed set of game ideas) to boost their performance and generalization capabilities in reaching the goal of a target environment. These games let the agent hone its skills on a predefined set of tasks. We create and test an environment of 100 games generated with this automated framework, which uses large language models (GPT-3) and an interactive fiction game engine (based on Inform 7) to let users generate more games with minimal human supervision. Experimental results from both human participants and baseline text-based RL agents reveal that current state-of-the-art text-based RL agents cannot apply previously learned skills in new situations at the level humans can. These results underscore STARLING’s potential to serve as a sandbox environment for further research in self-supervised text-based RL.
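To make the game-generation step concrete, below is a minimal Python sketch of prompting an LLM to draft Inform 7 source for a seed game idea and target skill. The prompt, model name, and helper function are illustrative assumptions, not the paper's actual pipeline; the authors report using GPT-3 together with an Inform 7-based game engine.

```python
# Illustrative sketch only: ask an LLM to draft Inform 7 source for a seed game
# idea that exercises one target skill. Prompt wording, model name, and the
# helper below are assumptions for illustration, not STARLING's actual code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def draft_inform7_game(seed_idea: str, skill: str) -> str:
    """Request a small Inform 7 game built around one seed idea and one skill."""
    prompt = (
        "Write a short Inform 7 source text for an interactive fiction game.\n"
        f"Game idea: {seed_idea}\n"
        f"The player should practice this skill to win: {skill}\n"
        "Keep it to a few rooms and objects, and define a clear winning condition."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model; the paper reports using GPT-3
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    source = draft_inform7_game(
        "escape a locked greenhouse", "unlocking doors with found keys"
    )
    print(source)  # compile and validate with the Inform 7 toolchain before use
```

In practice, the generated source would still need to be compiled and checked by the interactive fiction engine before an RL agent trains on it, which is where the "minimal human supervision" described in the abstract comes in.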
Anthology ID:
2024.findings-acl.935
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15804–15819
URL:
https://aclanthology.org/2024.findings-acl.935
Cite (ACL):
Shreyas Basavatia, Keerthiram Murugesan, and Shivam Ratnakar. 2024. STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models. In Findings of the Association for Computational Linguistics ACL 2024, pages 15804–15819, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models (Basavatia et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.935.pdf