Language Model Adaption for Reinforcement Learning with Natural Language Action Space

Jiangxing Wang, Jiachen Li, Xiao Han, Deheng Ye, Zongqing Lu


Abstract
Reinforcement learning with a natural language action space often suffers from the curse of dimensionality due to the combinatorial nature of natural language. Previous research leverages pretrained language models to capture action semantics and reduce the size of the action space. However, since pretrained models are typically trained on general corpora, there can be an unpredictable mismatch between the priors encoded in pretrained models and the characteristics of the specific RL environment. To address this issue, we propose Mutual-Information Regularized Policy Optimization (MIPO), which enables implicit and dynamic reduction of the action space. Starting from the prior provided by the pretrained language model, our method dynamically adjusts the prior during learning under the guidance of mutual information regularization. Theoretically, we show that this policy optimization process yields monotonic improvement on the mutual-information regularized RL objective. Empirically, we conduct experiments in various environments and demonstrate the effectiveness of MIPO.
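The abstract does not state the paper's exact objective. As a hedged illustration only (the symbols below are generic, not taken from the paper), a mutual-information regularized policy optimization objective with a pretrained-LM prior is commonly written as:

```latex
% Generic sketch of an MI-regularized RL objective (illustrative, not the paper's formula).
% \pi_\theta is the policy, \rho a prior (e.g., from a pretrained LM), and
% the KL term regularizes the policy toward the prior; averaging the KL over
% states relates it to the mutual information I(S;A) when \rho is optimized.
\begin{aligned}
\max_{\theta,\rho} \; J(\theta,\rho)
  &= \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\right]
     - \alpha \, \mathbb{E}_{s \sim d^{\pi_\theta}}\!\left[
         D_{\mathrm{KL}}\!\big(\pi_\theta(\cdot \mid s)\,\|\,\rho(\cdot)\big)\right], \\
\text{where } \rho^*(\cdot)
  &= \mathbb{E}_{s \sim d^{\pi_\theta}}\!\left[\pi_\theta(\cdot \mid s)\right]
  \;\Rightarrow\;
  \mathbb{E}_{s}\!\left[D_{\mathrm{KL}}\!\big(\pi_\theta(\cdot \mid s)\,\|\,\rho^*\big)\right]
  = I(S; A).
\end{aligned}
```

Under this generic form, initializing \(\rho\) from the pretrained language model and updating it during training matches the abstract's description of starting from the LM prior and dynamically adjusting it; the actual MIPO formulation should be taken from the paper itself.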
Anthology ID:
2024.acl-long.89
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1620–1634
URL:
https://aclanthology.org/2024.acl-long.89
Cite (ACL):
Jiangxing Wang, Jiachen Li, Xiao Han, Deheng Ye, and Zongqing Lu. 2024. Language Model Adaption for Reinforcement Learning with Natural Language Action Space. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1620–1634, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Language Model Adaption for Reinforcement Learning with Natural Language Action Space (Wang et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.89.pdf