User Satisfaction Reward Estimation Across Domains: Domain-independent Dialogue Policy Learning

Stefan Ultes, Wolfgang Maier


Abstract
Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work based on reinforcement learning employs an objective measure such as task success to model the reward signal, we propose to use a reward signal based on user satisfaction. We propose a novel estimator and show that it outperforms all previous estimators while learning temporal dependencies implicitly. We show in simulated experiments that a live user satisfaction estimation model may be applied, resulting in higher estimated satisfaction while achieving similar success rates. Moreover, we show that a satisfaction estimation model trained on one domain may be applied in many other domains that cover a similar task. We verify our findings by applying the model to one of the domains to learn a policy from real users, and compare its performance to policies that use user satisfaction and task success acquired directly from the users as reward.
Anthology ID:
2021.dnd-12.5
Volume:
Dialogue & Discourse Volume 12
Month:
September
Year:
2021
Address:
Chicago, Illinois, USA
Editors:
Patrick Healey, Barbara Di Eugenio, Vera Demberg, Jonathan Ginzburg, Kallirroi Georgila, Amir Zeldes, Massimo Poesio
Venue:
DND
SIG:
SIGDIAL
Publisher:
University of Illinois Chicago
Pages:
81–114
URL:
https://aclanthology.org/2021.dnd-12.5/
DOI:
10.5210/dad.2021.203
Cite (ACL):
Stefan Ultes and Wolfgang Maier. 2021. User Satisfaction Reward Estimation Across Domains: Domain-independent Dialogue Policy Learning. Dialogue & Discourse, 12:81–114.
Cite (Informal):
User Satisfaction Reward Estimation Across Domains: Domain-independent Dialogue Policy Learning (Ultes & Maier, DND 2021)
PDF:
https://aclanthology.org/2021.dnd-12.5.pdf