Blending Task Success and User Satisfaction: Analysis of Learned Dialogue Behaviour with Multiple Rewards

Stefan Ultes, Wolfgang Maier


Abstract
Recent approaches to dialogue policy reinforcement learning have used task success and user satisfaction as principal reward components, but only independently of each other; the resulting learned behaviour has not been analysed, and no suitable method for such an analysis has existed. In this work, we employ both principal reward components jointly and propose a method to analyse the resulting behaviour by probing the learned policy in a structured way. We show that blending both reward components increases user satisfaction without sacrificing task success in more hostile environments, and we provide insight into the actions chosen by the learned policies.
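The abstract does not state how the two reward components are combined. As a minimal sketch, assuming a simple linear interpolation controlled by a hypothetical weight `alpha`, a hypothetical `DialogueOutcome` record and illustrative reward values (none of which are taken from the paper), a blended per-dialogue reward could be computed as follows:

```python
from dataclasses import dataclass


@dataclass
class DialogueOutcome:
    """Hypothetical summary of a finished dialogue (illustrative only)."""
    task_success: bool        # did the system fulfil the user's goal?
    user_satisfaction: float  # assumed: a satisfaction estimate scaled to [0, 1]
    num_turns: int            # dialogue length in turns


def blended_reward(outcome: DialogueOutcome,
                   alpha: float = 0.5,
                   success_reward: float = 20.0,
                   turn_penalty: float = 1.0) -> float:
    """Linearly blend a task-success reward with a user-satisfaction reward.

    alpha = 1.0 gives a pure task-success reward, alpha = 0.0 a pure
    satisfaction reward. All default values are illustrative assumptions,
    not the settings used in the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Task-success component: full reward only if the task was completed.
    ts = success_reward if outcome.task_success else 0.0
    # Satisfaction component: scaled to the same range as the success reward.
    us = success_reward * outcome.user_satisfaction
    # Per-turn penalty encourages shorter dialogues for either component.
    return alpha * ts + (1.0 - alpha) * us - turn_penalty * outcome.num_turns


if __name__ == "__main__":
    outcome = DialogueOutcome(task_success=True, user_satisfaction=0.5, num_turns=6)
    print(blended_reward(outcome))             # 9.0
    print(blended_reward(outcome, alpha=1.0))  # 14.0 (task success only)
    print(blended_reward(outcome, alpha=0.0))  # 4.0  (satisfaction only)
```

Scaling the satisfaction term to the same range as the success reward keeps `alpha` interpretable as a mixing weight; a nonlinear blend or per-turn satisfaction rewards would be equally plausible alternatives.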
Anthology ID:
2021.sigdial-1.42
Volume:
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month:
July
Year:
2021
Address:
Singapore and Online
Editors:
Haizhou Li, Gina-Anne Levow, Zhou Yu, Chitralekha Gupta, Berrak Sisman, Siqi Cai, David Vandyke, Nina Dethlefs, Yan Wu, Junyi Jessy Li
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
403–410
URL:
https://aclanthology.org/2021.sigdial-1.42
DOI:
10.18653/v1/2021.sigdial-1.42
Cite (ACL):
Stefan Ultes and Wolfgang Maier. 2021. Blending Task Success and User Satisfaction: Analysis of Learned Dialogue Behaviour with Multiple Rewards. In Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 403–410, Singapore and Online. Association for Computational Linguistics.
Cite (Informal):
Blending Task Success and User Satisfaction: Analysis of Learned Dialogue Behaviour with Multiple Rewards (Ultes & Maier, SIGDIAL 2021)
PDF:
https://aclanthology.org/2021.sigdial-1.42.pdf
Video:
https://www.youtube.com/watch?v=6US5hE70vRU
Data
MultiWOZ