Adaptive Dialog Policy Learning with Hindsight and User Modeling

Yan Cao, Keting Lu, Xiaoping Chen, Shiqi Zhang
Abstract
Reinforcement learning (RL) methods have been widely used for learning dialog policies. Sample efficiency, i.e., the efficiency of learning from limited dialog experience, is particularly important in RL-based dialog policy learning, because interacting with people is costly and low-quality dialog policies produce a very poor user experience. In this paper, we develop LHUA (Learning with Hindsight, User modeling, and Adaptation) that, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users. Simulation and hindsight provide the dialog agent with more experience and more (positive) reinforcement, respectively. Experimental results suggest that LHUA outperforms competitive baselines from the literature, including its no-simulation, no-adaptation, and no-hindsight counterparts.
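For readers unfamiliar with hindsight-based learning, the sketch below illustrates the general idea of hindsight relabeling (in the spirit of hindsight experience replay) applied to dialog transitions: a failed dialog is relabeled as if the outcome it actually reached had been the intended user goal, turning sparse failures into positive reinforcement. This is a minimal illustrative sketch, not the authors' LHUA implementation; the Transition fields, goal representation, and reward scheme are all hypothetical.

```python
# Illustrative sketch of hindsight relabeling for dialog RL.
# NOT the LHUA implementation; field names and the reward scheme
# below are assumptions chosen for illustration only.
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass
class Transition:
    state: Tuple        # dialog state features (placeholder)
    action: int         # system act id
    reward: float       # original (often sparse/negative) reward
    next_state: Tuple
    goal: Tuple         # user goal the episode was conditioned on
    done: bool

def hindsight_relabel(episode: List[Transition]) -> List[Transition]:
    """Relabel a dialog episode as if the state it actually reached
    had been the intended goal, yielding positive reinforcement."""
    if not episode:
        return []
    achieved_goal = episode[-1].next_state  # treat final state as the achieved goal
    relabeled = []
    for i, t in enumerate(episode):
        is_last = (i == len(episode) - 1)
        relabeled.append(replace(
            t,
            goal=achieved_goal,                  # swap in the achieved goal
            reward=1.0 if is_last else 0.0,      # hypothetical success bonus at the end
            done=is_last,
        ))
    return relabeled
```

In a setup like this, both the original and the relabeled transitions would be stored in the replay buffer, so the policy receives positive signal even from dialogs that failed to satisfy the original user goal.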
Anthology ID:
2020.sigdial-1.40
Volume:
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month:
July
Year:
2020
Address:
1st virtual meeting
Editors:
Olivier Pietquin, Smaranda Muresan, Vivian Chen, Casey Kennington, David Vandyke, Nina Dethlefs, Koji Inoue, Erik Ekstedt, Stefan Ultes
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
329–338
URL:
https://aclanthology.org/2020.sigdial-1.40
DOI:
10.18653/v1/2020.sigdial-1.40
Cite (ACL):
Yan Cao, Keting Lu, Xiaoping Chen, and Shiqi Zhang. 2020. Adaptive Dialog Policy Learning with Hindsight and User Modeling. In Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 329–338, 1st virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Adaptive Dialog Policy Learning with Hindsight and User Modeling (Cao et al., SIGDIAL 2020)
PDF:
https://aclanthology.org/2020.sigdial-1.40.pdf
Video:
https://youtube.com/watch?v=ZEXvT2F7UR4