%0 Conference Proceedings
%T Feedback Attribution for Counterfactual Bandit Learning in Multi-Domain Spoken Language Understanding
%A Falke, Tobias
%A Lehnen, Patrick
%Y Moens, Marie-Francine
%Y Huang, Xuanjing
%Y Specia, Lucia
%Y Yih, Scott Wen-tau
%S Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
%D 2021
%8 November
%I Association for Computational Linguistics
%C Online and Punta Cana, Dominican Republic
%F falke-lehnen-2021-feedback
%X With counterfactual bandit learning, models can be trained based on positive and negative feedback received for historical predictions, with no labeled data needed. Such feedback is often available in real-world dialog systems; however, the modularized architecture commonly used in large-scale systems prevents the direct application of such algorithms. In this paper, we study the feedback attribution problem that arises when using counterfactual bandit learning for multi-domain spoken language understanding. We introduce an experimental setup to simulate the problem on small-scale public datasets, propose attribution methods inspired by multi-agent reinforcement learning and evaluate them against multiple baselines. We find that while directly using overall feedback leads to disastrous performance, our proposed attribution methods can allow training competitive models from user feedback.
%R 10.18653/v1/2021.emnlp-main.91
%U https://aclanthology.org/2021.emnlp-main.91
%U https://doi.org/10.18653/v1/2021.emnlp-main.91
%P 1190-1198