Blending Task Success and User Satisfaction: Analysis of Learned Dialogue Behaviour with Multiple Rewards

Stefan Ultes, Wolfgang Maier


Abstract
In recent work on reinforcement learning of dialogue policies, the two principal reward components, task success and user satisfaction, have been used independently of each other. The resulting learned behaviour has not been analysed, nor has a suitable analysis method existed. In this work, we employ both principal reward components jointly and propose a method to analyse the resulting behaviour through structured probing of the learned policy. We show that blending both reward components increases user satisfaction without sacrificing task success in more hostile environments, and we provide insight into the actions chosen by the learned policies.
Anthology ID:
2021.sigdial-1.42
Volume:
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month:
July
Year:
2021
Address:
Singapore and Online
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
403–410
URL:
https://aclanthology.org/2021.sigdial-1.42
PDF:
https://aclanthology.org/2021.sigdial-1.42.pdf