@inproceedings{kruengkrai-yoshino-2025-teaching,
title = "Teaching Text Agents to Learn Sequential Decision Making from Failure",
author = "Kruengkrai, Canasai and
Yoshino, Koichiro",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.1526/",
doi = "10.18653/v1/2025.acl-long.1526",
pages = "31619--31635",
ISBN = "979-8-89176-251-0",
abstract = "Text-based reinforcement-learning agents improve their policies by interacting with their environments to collect more training data. However, these self-collected data inevitably contain intermediate failed actions caused by attempting physically infeasible behaviors and/or hallucinations. Directly learning a policy from such trajectories can reinforce incorrect behaviors and reduce task success rates. In this paper, we propose a failed action-aware objective that suppresses the negative impact of failed actions during training by assigning zero return based on textual feedback. Building on this objective, we introduce a perturbation method that leverages unsuccessful trajectories to construct new successful ones that share the same goal. This allows agents to benefit from diverse experiences without further interaction with the environment. Experiments in ALFWorld and ScienceWorld demonstrate that our method significantly outperforms strong baselines and generalizes across environments. Code is available at https://github.com/riken-grp/text-agent."
}
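
The abstract describes a failed-action-aware objective that assigns zero return to actions whose textual feedback signals failure. The sketch below is a minimal illustration of that idea only, not the paper's implementation (see the linked repository for that): it assumes ALFWorld-style feedback, where an infeasible action yields the observation "Nothing happens.", and every name in it is hypothetical.

```python
import torch

# Assumed failure marker: ALFWorld responds to infeasible actions with
# "Nothing happens." The paper may use a richer set of textual signals.
FAILURE_FEEDBACK = {"Nothing happens."}

def failed_action_aware_returns(rewards, feedbacks, gamma=1.0):
    """Per-step discounted returns, with zero return assigned to any step
    whose textual feedback marks the action as failed."""
    returns, g = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = 0.0 if feedbacks[t] in FAILURE_FEEDBACK else g
    return returns

def policy_gradient_loss(logprobs, returns):
    """REINFORCE-style loss: steps with zero return (failed actions)
    contribute nothing, so the policy is not pushed toward them."""
    return -(torch.stack(logprobs) * torch.tensor(returns)).sum()
```

Zeroing the return, rather than dropping the step, keeps the trajectory aligned with what the agent actually experienced while removing the learning signal that would otherwise reinforce the failed action.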
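The abstract also mentions a perturbation method that turns unsuccessful trajectories into new successful ones sharing the same goal. The abstract gives no construction details, so the following is a speculative sketch of one consistent reading: because failed actions receive zero return and, in these text environments, typically leave the state unchanged, failed steps harvested from unsuccessful trajectories could be spliced into a successful trajectory to yield new, more diverse successful trajectories without further environment interaction. All names here are hypothetical.

```python
import random

def perturb_trajectory(success_traj, failed_steps, num_insertions=1, seed=None):
    """Splice failed steps (harvested from unsuccessful trajectories with
    the same goal) into a successful trajectory. Each step is a dict like
    {"action": str, "feedback": str, "reward": float}."""
    rng = random.Random(seed)
    new_traj = list(success_traj)
    for _ in range(num_insertions):
        # The inserted step carries zero reward, so under the
        # failed-action-aware objective it receives zero return.
        step = dict(rng.choice(failed_steps), reward=0.0)
        new_traj.insert(rng.randrange(len(new_traj) + 1), step)
    return new_traj
```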