To Combine or Not To Combine? A Rainbow Deep Reinforcement Learning Agent for Dialog Policies

Dirk Väth, Ngoc Thang Vu


Abstract
In this paper, we explore state-of-the-art deep reinforcement learning methods for dialog policy training, such as prioritized experience replay, double deep Q-Networks, dueling network architectures and distributional learning. Our main findings show that each individual method improves the rewards and the task success rate, but that combining these methods in a Rainbow agent, which performs best across tasks and environments, is a non-trivial task. We therefore provide insights into the influence of each method on the combination and into how to combine them to form a Rainbow agent.
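The abstract names the Rainbow components but does not show how they interact. The following is a minimal sketch of how two of them, the dueling aggregation and the double DQN target, can fit together in a single bootstrap target. It is not the authors' implementation. The function names are hypothetical, and plain Python lists stand in for network outputs over a small set of dialog actions. Prioritized replay and distributional learning are deliberately left out.

```python
from typing import List


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    Subtracting the mean advantage keeps V and A identifiable.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: List[float],
                      q_target_next: List[float]) -> float:
    """Double DQN target: the online network selects the next action,
    and the target network evaluates it."""
    if done:
        # Terminal dialog turn: no bootstrapping.
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Hypothetical next-state outputs of two dueling networks (3 dialog actions).
q_online = dueling_q(1.0, [0.5, 2.0, -0.5])  # online net prefers action 1
q_target = dueling_q(0.8, [1.0, 0.0, 3.0])   # target net scores action 1 low
y = double_dqn_target(1.0, False, 0.9, q_online, q_target)
```

Here the online network picks action 1, and the target network values that action at about -0.53, giving y ≈ 0.52. A plain DQN target would instead take the target network's own maximum (about 2.47). The gap between the two is the overestimation that double DQN is meant to reduce.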
Anthology ID:
W19-5908
Volume:
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
Month:
September
Year:
2019
Address:
Stockholm, Sweden
Editors:
Satoshi Nakamura, Milica Gasic, Ingrid Zukerman, Gabriel Skantze, Mikio Nakano, Alexandros Papangelis, Stefan Ultes, Koichiro Yoshino
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
62–67
URL:
https://aclanthology.org/W19-5908/
DOI:
10.18653/v1/W19-5908
Cite (ACL):
Dirk Väth and Ngoc Thang Vu. 2019. To Combine or Not To Combine? A Rainbow Deep Reinforcement Learning Agent for Dialog Policies. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 62–67, Stockholm, Sweden. Association for Computational Linguistics.
Cite (Informal):
To Combine or Not To Combine? A Rainbow Deep Reinforcement Learning Agent for Dialog Policies (Väth & Vu, SIGDIAL 2019)
PDF:
https://aclanthology.org/W19-5908.pdf