%0 Conference Proceedings
%T Blending Task Success and User Satisfaction: Analysis of Learned Dialogue Behaviour with Multiple Rewards
%A Ultes, Stefan
%A Maier, Wolfgang
%Y Li, Haizhou
%Y Levow, Gina-Anne
%Y Yu, Zhou
%Y Gupta, Chitralekha
%Y Sisman, Berrak
%Y Cai, Siqi
%Y Vandyke, David
%Y Dethlefs, Nina
%Y Wu, Yan
%Y Li, Junyi Jessy
%S Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
%D 2021
%8 July
%I Association for Computational Linguistics
%C Singapore and Online
%F ultes-maier-2021-blending
%X Recently, task success and user satisfaction have been used independently as principal reward components for dialogue policy reinforcement learning, but the resulting learned behaviour has not been analysed, and no suitable analysis method has existed. In this work, we employ both principal reward components jointly and propose a method to analyse the resulting behaviour through structured probing of the learned policy. We show that blending both reward components increases user satisfaction without sacrificing task success in more hostile environments, and we provide insight into the actions chosen by the learned policies.
%R 10.18653/v1/2021.sigdial-1.42
%U https://aclanthology.org/2021.sigdial-1.42
%U https://doi.org/10.18653/v1/2021.sigdial-1.42
%P 403-410