Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning

Alexandros Papangelis, Yi-Chia Wang, Piero Molino, Gokhan Tur


Abstract
Major challenges in training conversational agents include the lack of large-scale data of real-world complexity, the difficulty of defining appropriate evaluation measures, and the need to manage meaningful conversations across many topics over long periods of time. Moreover, most work assumes that the conversational agent’s environment is stationary, which is a strong assumption. To remove this assumption and overcome the lack of data, we step away from the traditional training pipeline and model the conversation as a stochastic collaborative game. Each agent (player) has a role (“assistant”, “tourist”, “eater”, etc.) and its own objectives, and can interact only via the language it generates. Each agent therefore needs to learn to operate optimally in an environment with multiple sources of uncertainty: its own language understanding (LU) and language generation (LG), and the other agent’s LU, policy, and LG. In this work, we present the first complete attempt at concurrently training conversational agents that communicate only via self-generated language, and we show that they outperform supervised and deep learning baselines.
Anthology ID:
W19-5912
Volume:
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
Month:
September
Year:
2019
Address:
Stockholm, Sweden
Venues:
SIGDIAL | WS
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
92–102
URL:
https://aclanthology.org/W19-5912
DOI:
10.18653/v1/W19-5912
Cite (ACL):
Alexandros Papangelis, Yi-Chia Wang, Piero Molino, and Gokhan Tur. 2019. Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 92–102, Stockholm, Sweden. Association for Computational Linguistics.
Cite (Informal):
Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning (Papangelis et al., 2019)
PDF:
https://aclanthology.org/W19-5912.pdf
Code:
uber-research/plato-research-dialogue-system