Learning Dialog Policies from Weak Demonstrations

Gabriel Gordon-Hall, Philip John Gorinski, Shay B. Cohen


Abstract
Deep reinforcement learning is a promising approach to training a dialog manager, but current methods struggle with the large state and action spaces of multi-domain dialog systems. Building upon Deep Q-learning from Demonstrations (DQfD), an algorithm that scores highly in difficult Atari games, we leverage dialog data to guide the agent to successfully respond to a user’s requests. We make progressively fewer assumptions about the data needed, using labeled, reduced-labeled, and even unlabeled data to train expert demonstrators. We introduce Reinforced Fine-tune Learning, an extension to DQfD, enabling us to overcome the domain gap between the datasets and the environment. Experiments in a challenging multi-domain dialog system framework validate our approaches, which achieve high success rates even when trained on out-of-domain data.
Anthology ID:
2020.acl-main.129
Original:
2020.acl-main.129v1
Version 2:
2020.acl-main.129v2
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1394–1405
URL:
https://aclanthology.org/2020.acl-main.129
DOI:
10.18653/v1/2020.acl-main.129
Cite (ACL):
Gabriel Gordon-Hall, Philip John Gorinski, and Shay B. Cohen. 2020. Learning Dialog Policies from Weak Demonstrations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1394–1405, Online. Association for Computational Linguistics.
Cite (Informal):
Learning Dialog Policies from Weak Demonstrations (Gordon-Hall et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.129.pdf
Video:
http://slideslive.com/38928789
Data:
MultiWOZ