Efficient Dialogue Complementary Policy Learning via Deep Q-network Policy and Episodic Memory Policy

Yangyang Zhao, Zhenyu Wang, Changxi Zhu, Shihan Wang


Abstract
Deep reinforcement learning has shown great potential in training dialogue policies. However, its favorable performance comes at the cost of many rounds of interaction. Most existing dialogue policy methods rely on a single learning system, whereas the human brain has two specialized learning and memory systems that allow it to find good solutions without requiring copious examples. Inspired by the human brain, this paper proposes a novel complementary policy learning (CPL) framework, which exploits the complementary advantages of the episodic memory (EM) policy and the deep Q-network (DQN) policy to achieve fast and effective dialogue policy learning. To coordinate the two policies, we propose a confidence controller that decides when the policies complement each other, according to their relative efficacy at different stages. Furthermore, memory connectivity and time pruning are proposed to guarantee the flexible and adaptive generalization of the EM policy in dialogue tasks. Experimental results on three dialogue datasets show that our method significantly outperforms existing methods relying on a single learning system.
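The division of labor described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the tabular `EpisodicMemory`, the `q_values` dictionary standing in for a trained DQN, the scalar `confidence`, and the `threshold` gate are all hypothetical names and simplifications chosen only to show how a memory-based policy and a value-based policy might hand control to each other.

```python
class EpisodicMemory:
    """Hypothetical tabular memory: best discounted return seen per (state, action)."""

    def __init__(self, gamma=0.9):
        self.gamma = gamma
        self.table = {}

    def write_episode(self, episode):
        # episode is a list of (state, action, reward) tuples.
        # Walk backwards so each step sees the discounted return that followed it.
        ret = 0.0
        for state, action, reward in reversed(episode):
            ret = reward + self.gamma * ret
            key = (state, action)
            self.table[key] = max(self.table.get(key, float("-inf")), ret)

    def best(self, state, actions):
        # Return (value, action) for the best remembered action, or None if unseen.
        known = [(self.table[(state, a)], a) for a in actions if (state, a) in self.table]
        return max(known) if known else None


def choose_action(state, actions, memory, q_values, confidence, threshold=0.5):
    """Assumed gate: follow the EM policy when it is trusted and has an entry
    for this state; otherwise act greedily on q_values (a stand-in for a DQN)."""
    hit = memory.best(state, actions)
    if hit is not None and confidence >= threshold:
        return hit[1], "EM"
    greedy = max(actions, key=lambda a: q_values.get((state, a), 0.0))
    return greedy, "DQN"


memory = EpisodicMemory(gamma=0.5)
memory.write_episode([("s0", "ask", 0.0), ("s1", "inform", 1.0)])
q_values = {("s0", "inform"): 2.0}  # toy values, not learned
actions = ["ask", "inform"]

early = choose_action("s0", actions, memory, q_values, confidence=0.9)
late = choose_action("s0", actions, memory, q_values, confidence=0.1)
```

In this toy setup a high confidence value routes the decision to the memory lookup, while a low value falls back to the value estimates; how the paper actually computes and updates confidence is not specified in the abstract.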
Anthology ID:
2021.emnlp-main.354
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4311–4323
URL:
https://aclanthology.org/2021.emnlp-main.354
DOI:
10.18653/v1/2021.emnlp-main.354
Cite (ACL):
Yangyang Zhao, Zhenyu Wang, Changxi Zhu, and Shihan Wang. 2021. Efficient Dialogue Complementary Policy Learning via Deep Q-network Policy and Episodic Memory Policy. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4311–4323, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Efficient Dialogue Complementary Policy Learning via Deep Q-network Policy and Episodic Memory Policy (Zhao et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.354.pdf