An Efficient Dialogue Policy Agent with Model-Based Causal Reinforcement Learning

Kai Xu, Zhenyu Wang, Yangyang Zhao, Bopeng Fang


Abstract
Dialogue policy learning trains an agent to select dialogue actions and is frequently implemented via deep reinforcement learning (DRL). Model-based reinforcement learning methods build a world model that generates simulated data to alleviate sample inefficiency. However, traditional world models consider only one-step dialogue transitions, leading to an inaccurate simulation of the environment. Furthermore, different users may have different intention preferences, yet most existing studies do not model the causal relationship between intentions and preferences. This paper proposes a novel framework for dialogue policy learning named MCA, implemented through model-based reinforcement learning with automatically constructed causal chains. The MCA model utilizes an autoregressive Transformer to model dialogue trajectories, enabling a more accurate simulation of the environment. Additionally, it constructs a causal chains module that outputs latent preference distributions for intention-action pairs, thereby elucidating the relationship between user intentions and agent actions. Experimental results show that MCA achieves state-of-the-art performance on three dialogue datasets, outperforming the compared dialogue agents and highlighting its effectiveness and robustness.
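To make the model-based setup described in the abstract concrete, the sketch below (not taken from the paper) shows one way an autoregressive Transformer world model can predict the next dialogue state and reward from the (state, action) history, and how its rollouts can serve as simulated experience for a DRL policy. All names and dimensions (WorldModel, rollout, STATE_DIM, N_ACTIONS) are illustrative assumptions, not the authors' implementation, and the causal chains module is omitted.

```python
# Minimal sketch of a model-based dialogue policy loop with an autoregressive
# Transformer world model. Hypothetical names/dimensions; not the MCA code.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, D_MODEL = 64, 16, 128  # illustrative sizes

class WorldModel(nn.Module):
    """Autoregressive Transformer over dialogue turns: each turn attends only to the past."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(STATE_DIM + N_ACTIONS, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.next_state = nn.Linear(D_MODEL, STATE_DIM)  # predicts next dialogue state
        self.reward = nn.Linear(D_MODEL, 1)              # predicts per-turn reward

    def forward(self, states, actions_onehot):
        # states: (B, T, STATE_DIM), actions_onehot: (B, T, N_ACTIONS)
        x = self.embed(torch.cat([states, actions_onehot], dim=-1))
        T = x.size(1)
        causal = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)  # causal mask
        h = self.encoder(x, mask=causal)
        return self.next_state(h), self.reward(h)

@torch.no_grad()
def rollout(world_model, policy, s0, horizon=5):
    """Generate a simulated trajectory from the world model for policy training."""
    states, actions, rewards = [s0], [], []
    for _ in range(horizon):
        a = policy(states[-1]).argmax(-1)  # greedy action from the policy network
        a_onehot = torch.nn.functional.one_hot(a, N_ACTIONS).float()
        prev = (torch.stack(actions, dim=1) if actions
                else torch.empty(a_onehot.size(0), 0, N_ACTIONS))
        s_seq = torch.stack(states, dim=1)
        a_seq = torch.cat([prev, a_onehot.unsqueeze(1)], dim=1)
        s_next, r = world_model(s_seq, a_seq)
        states.append(s_next[:, -1])
        actions.append(a_onehot)
        rewards.append(r[:, -1])
    return states, actions, rewards

# Usage: any module mapping state -> action logits can stand in for the policy.
policy = nn.Linear(STATE_DIM, N_ACTIONS)
traj = rollout(WorldModel(), policy, torch.zeros(1, STATE_DIM))
```

Because the world model conditions on the full turn history rather than a single transition, the simulated rollouts it produces reflect multi-turn dialogue dynamics, which is the motivation the abstract gives for replacing one-step world models.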
Anthology ID:
2025.coling-main.490
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
7331–7343
URL:
https://aclanthology.org/2025.coling-main.490/
Cite (ACL):
Kai Xu, Zhenyu Wang, Yangyang Zhao, and Bopeng Fang. 2025. An Efficient Dialogue Policy Agent with Model-Based Causal Reinforcement Learning. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7331–7343, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
An Efficient Dialogue Policy Agent with Model-Based Causal Reinforcement Learning (Xu et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.490.pdf