Task-Completion Dialogue Policy Learning via Monte Carlo Tree Search with Dueling Network

Sihan Wang, Kaijie Zhou, Kunfeng Lai, Jianping Shen


Abstract
We introduce a framework of Monte Carlo Tree Search with Double-q Dueling network (MCTS-DDU) for task-completion dialogue policy learning. Unlike previous deep model-based reinforcement learning methods, which use background planning and may suffer from low-quality simulated experiences, MCTS-DDU performs decision-time planning based on dialogue-state search trees built by Monte Carlo simulations and is robust to simulation errors. This idea arises naturally in human behavior, e.g., predicting others' responses and then deciding on our own actions. In the simulated movie-ticket booking task, our method significantly outperforms the background planning approaches. We demonstrate the effectiveness of MCTS and the dueling network in detailed ablation studies, and also compare the performance upper bounds of these two planning methods.
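The abstract combines two components: a Double-q dueling network that estimates action values, and Monte Carlo tree search that plans at decision time over simulated dialogue continuations. The Python sketch below illustrates how such pieces commonly fit together; it is not the authors' implementation. It assumes the standard mean-subtracted dueling aggregation and Double-Q target from the deep RL literature, a UCT-style selection rule, and leaf values bootstrapped from the network's Q estimates, none of which the abstract specifies. The callables `step` (a hypothetical user simulator or world model) and `q_fn` (a hypothetical trained dueling network), as well as every function name here, are illustrative only.

```python
import math


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def double_q_target(reward, gamma, q_online_next, q_target_next, done):
    """Double-Q target: the online net picks a', the target net evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


class Node:
    """One dialogue-state node holding per-action statistics."""

    def __init__(self, prior_q):
        self.prior_q = prior_q            # Q(s, .) from the (hypothetical) value network
        self.n = [0] * len(prior_q)       # visit count per action
        self.w = [0.0] * len(prior_q)     # summed backed-up return per action
        self.children = {}                # (action, next_state) -> Node


def uct_select(node, c):
    """UCT-style selection; unvisited actions fall back to the network's Q."""
    total = sum(node.n) + 1

    def score(a):
        mean = node.w[a] / node.n[a] if node.n[a] else node.prior_q[a]
        return mean + c * math.sqrt(math.log(total) / (node.n[a] + 1))

    return max(range(len(node.n)), key=score)


def mcts_decide(root_state, step, q_fn, n_sims=100, depth=5, gamma=0.9, c=1.0):
    """Plan at decision time from root_state and return the most-visited action.

    Assumed interfaces (hypothetical): step(state, action) -> (next_state,
    reward, done) simulates the user; q_fn(state) -> list of Q values plays
    the role of the dueling network. States must be hashable.
    """
    root = Node(q_fn(root_state))
    for _ in range(n_sims):
        node, state, path = root, root_state, []
        for _ in range(depth):
            a = uct_select(node, c)
            state, reward, done = step(state, a)
            path.append((node, a, reward))
            if done:
                leaf_value = 0.0          # terminal: nothing left to bootstrap
                break
            key = (a, state)              # separate child per simulated user reply
            if key not in node.children:
                node.children[key] = Node(q_fn(state))
                leaf_value = max(node.children[key].prior_q)  # expand, then bootstrap
                break
            node = node.children[key]
        else:
            leaf_value = max(q_fn(state))  # depth limit reached: bootstrap from Q
        # Back up discounted returns along the visited path.
        g = leaf_value
        for visited, a, reward in reversed(path):
            g = reward + gamma * g
            visited.n[a] += 1
            visited.w[a] += g
    return max(range(len(root.n)), key=lambda a: root.n[a])
```

For instance, with a toy `step` in which action 1 ends the dialogue successfully (reward +10) and action 0 merely spends a turn (reward -1), `mcts_decide(0, step, lambda s: [0.0, 0.0], n_sims=50)` returns action 1. Returning the most-visited root action, rather than the one with the highest mean, is a common choice because visit counts are less sensitive to a few noisy simulated returns, which fits the abstract's point about robustness to simulation errors.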
Anthology ID:
2020.emnlp-main.278
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3461–3471
URL:
https://aclanthology.org/2020.emnlp-main.278
DOI:
10.18653/v1/2020.emnlp-main.278
Cite (ACL):
Sihan Wang, Kaijie Zhou, Kunfeng Lai, and Jianping Shen. 2020. Task-Completion Dialogue Policy Learning via Monte Carlo Tree Search with Dueling Network. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3461–3471, Online. Association for Computational Linguistics.
Cite (Informal):
Task-Completion Dialogue Policy Learning via Monte Carlo Tree Search with Dueling Network (Wang et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.278.pdf
Video:
https://slideslive.com/38938752