@inproceedings{falke-lehnen-2021-feedback,
title = "Feedback Attribution for Counterfactual Bandit Learning in Multi-Domain Spoken Language Understanding",
author = "Falke, Tobias and
Lehnen, Patrick",
editor = "Moens, Marie-Francine and
Huang, Xuanjing and
Specia, Lucia and
Yih, Scott Wen-tau",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.91",
doi = "10.18653/v1/2021.emnlp-main.91",
pages = "1190--1198",
abstract = "With counterfactual bandit learning, models can be trained based on positive and negative feedback received for historical predictions, with no labeled data needed. Such feedback is often available in real-world dialog systems, however, the modularized architecture commonly used in large-scale systems prevents the direct application of such algorithms. In this paper, we study the feedback attribution problem that arises when using counterfactual bandit learning for multi-domain spoken language understanding. We introduce an experimental setup to simulate the problem on small-scale public datasets, propose attribution methods inspired by multi-agent reinforcement learning and evaluate them against multiple baselines. We find that while directly using overall feedback leads to disastrous performance, our proposed attribution methods can allow training competitive models from user feedback.",
}
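The abstract above appeals to counterfactual bandit learning, i.e. training from logged positive/negative feedback on historical predictions rather than from labeled data. As a point of reference only, here is a minimal sketch of the generic inverse-propensity-scored (IPS) objective that the term usually denotes; the class, function names, and toy data are illustrative assumptions and do not reproduce the paper's feedback attribution methods.

# Illustrative sketch only (assumed names, not the authors' code): a single
# SLU domain model trained counterfactually from logged bandit feedback
# with an inverse-propensity-scored (IPS) objective.
import torch
import torch.nn as nn

class IntentClassifier(nn.Module):
    """Toy stand-in for one domain-specific SLU model."""
    def __init__(self, num_features: int, num_intents: int):
        super().__init__()
        self.linear = nn.Linear(num_features, num_intents)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)  # unnormalized intent scores

def ips_loss(logits, logged_action, logging_propensity, reward):
    """Reward-weighted log-likelihood of the logged action, corrected by
    the logging policy's propensity (clipped to bound the weights)."""
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(-1, logged_action.unsqueeze(-1)).squeeze(-1)
    weight = reward / logging_propensity.clamp(min=0.1)
    return -(weight * chosen).mean()  # negate to maximize expected reward

# A toy batch of logged interactions: features, the intent the deployed
# system chose, its propensity, and binary user feedback mapped to +/-1.
model = IntentClassifier(num_features=16, num_intents=4)
features = torch.randn(8, 16)
logged_action = torch.randint(0, 4, (8,))
logging_propensity = torch.rand(8) * 0.8 + 0.2
reward = torch.randint(0, 2, (8,)).float() * 2 - 1

loss = ips_loss(model(features), logged_action, logging_propensity, reward)
loss.backward()  # gradients only flow through the logged (observed) actions

In the multi-domain setting the paper studies, several such models act jointly while only one overall feedback signal is observed, which is exactly the attribution problem described in the abstract.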
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="falke-lehnen-2021-feedback">
<titleInfo>
<title>Feedback Attribution for Counterfactual Bandit Learning in Multi-Domain Spoken Language Understanding</title>
</titleInfo>
<name type="personal">
<namePart type="given">Tobias</namePart>
<namePart type="family">Falke</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Patrick</namePart>
<namePart type="family">Lehnen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2021-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Marie-Francine</namePart>
        <namePart type="family">Moens</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Xuanjing</namePart>
        <namePart type="family">Huang</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Lucia</namePart>
        <namePart type="family">Specia</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Scott</namePart>
        <namePart type="given">Wen-tau</namePart>
        <namePart type="family">Yih</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Online and Punta Cana, Dominican Republic</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>With counterfactual bandit learning, models can be trained based on positive and negative feedback received for historical predictions, with no labeled data needed. Such feedback is often available in real-world dialog systems; however, the modularized architecture commonly used in large-scale systems prevents the direct application of such algorithms. In this paper, we study the feedback attribution problem that arises when using counterfactual bandit learning for multi-domain spoken language understanding. We introduce an experimental setup to simulate the problem on small-scale public datasets, propose attribution methods inspired by multi-agent reinforcement learning, and evaluate them against multiple baselines. We find that while directly using overall feedback leads to disastrous performance, our proposed attribution methods can allow training competitive models from user feedback.</abstract>
<identifier type="citekey">falke-lehnen-2021-feedback</identifier>
<identifier type="doi">10.18653/v1/2021.emnlp-main.91</identifier>
<location>
<url>https://aclanthology.org/2021.emnlp-main.91</url>
</location>
<part>
<date>2021-11</date>
<extent unit="page">
<start>1190</start>
<end>1198</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Feedback Attribution for Counterfactual Bandit Learning in Multi-Domain Spoken Language Understanding
%A Falke, Tobias
%A Lehnen, Patrick
%Y Moens, Marie-Francine
%Y Huang, Xuanjing
%Y Specia, Lucia
%Y Yih, Scott Wen-tau
%S Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
%D 2021
%8 November
%I Association for Computational Linguistics
%C Online and Punta Cana, Dominican Republic
%F falke-lehnen-2021-feedback
%X With counterfactual bandit learning, models can be trained based on positive and negative feedback received for historical predictions, with no labeled data needed. Such feedback is often available in real-world dialog systems; however, the modularized architecture commonly used in large-scale systems prevents the direct application of such algorithms. In this paper, we study the feedback attribution problem that arises when using counterfactual bandit learning for multi-domain spoken language understanding. We introduce an experimental setup to simulate the problem on small-scale public datasets, propose attribution methods inspired by multi-agent reinforcement learning, and evaluate them against multiple baselines. We find that while directly using overall feedback leads to disastrous performance, our proposed attribution methods can allow training competitive models from user feedback.
%R 10.18653/v1/2021.emnlp-main.91
%U https://aclanthology.org/2021.emnlp-main.91
%U https://doi.org/10.18653/v1/2021.emnlp-main.91
%P 1190-1198
Markdown (Informal)
[Feedback Attribution for Counterfactual Bandit Learning in Multi-Domain Spoken Language Understanding](https://aclanthology.org/2021.emnlp-main.91) (Falke & Lehnen, EMNLP 2021)
ACL
Tobias Falke and Patrick Lehnen. 2021. Feedback Attribution for Counterfactual Bandit Learning in Multi-Domain Spoken Language Understanding. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1190–1198, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.