Accurate Training of Web-based Question Answering Systems with Feedback from Ranked Users

Liang Wang, Ivano Lauriola, Alessandro Moschitti


Abstract
Recent work has shown that large-scale annotated datasets are essential for training state-of-the-art Question Answering (QA) models. Unfortunately, creating this data is expensive and requires a huge amount of annotation work. An alternative and cheaper source of supervision is feedback data collected from deployed QA systems. For real-world QA services, e.g., Alexa and Google Home, this data can be collected from tens of millions of users at no additional cost. The main drawback is the noise affecting feedback on individual examples. Recent literature on QA systems has shown the benefit of training models even with noisy feedback. However, these studies have multiple limitations: (i) they use uniform random noise to simulate feedback responses, which is typically an unrealistic approximation, as noise follows specific patterns that depend on target examples and users; and (ii) they do not show how to aggregate feedback to improve training signals. In this paper, we first collect a large-scale (16M) QA dataset with real feedback sampled from the QA traffic of a popular Virtual Assistant. Second, we use this data to develop two strategies for filtering unreliable users and thus de-noising feedback: (i) ranking users with an automatic classifier, and (ii) aggregating feedback over similar instances and comparing users with each other. Finally, we train QA models on our filtered feedback data, showing a significant improvement over the state of the art.
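To make the second filtering strategy more concrete, the following is a minimal sketch of aggregating feedback per instance and comparing each user against the aggregate. It is an illustration only and not the authors' implementation: the abstract does not specify how instances are grouped, how users are compared, or what threshold is used. The function names, the binary label encoding, the majority-vote aggregation, and the `min_agreement` threshold are all assumptions introduced here.

```python
from collections import Counter, defaultdict


def majority_labels(feedback):
    """Aggregate feedback per instance by majority vote.

    feedback: list of (user_id, instance_id, label), label in {0, 1}.
    Hypothetical format; the paper's data schema is not given in the abstract.
    """
    votes = defaultdict(Counter)
    for _, instance_id, label in feedback:
        votes[instance_id][label] += 1
    # Ties fall back to the negative label (an arbitrary assumption).
    return {i: int(c[1] > c[0]) for i, c in votes.items()}


def user_agreement(feedback, majority):
    """Score each user by how often they agree with the aggregated label."""
    agree, total = Counter(), Counter()
    for user_id, instance_id, label in feedback:
        total[user_id] += 1
        agree[user_id] += int(label == majority[instance_id])
    return {u: agree[u] / total[u] for u in total}


def filter_feedback(feedback, min_agreement=0.6):
    """Keep only feedback from users whose agreement meets the threshold."""
    majority = majority_labels(feedback)
    scores = user_agreement(feedback, majority)
    return [f for f in feedback if scores[f[0]] >= min_agreement]


# Toy data: u3 consistently disagrees with the other two users.
toy_feedback = [
    ("u1", "q1", 1), ("u2", "q1", 1), ("u3", "q1", 0),
    ("u1", "q2", 0), ("u2", "q2", 0), ("u3", "q2", 1),
]
kept = filter_feedback(toy_feedback)
```

On the toy data, the feedback from `u3` is dropped and the remaining entries could then serve as training labels. A real system would presumably group semantically similar questions rather than exact instance IDs, which this sketch does not attempt.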
Anthology ID:
2023.acl-industry.63
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Sunayana Sitaram, Beata Beigman Klebanov, Jason D Williams
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
660–667
URL:
https://aclanthology.org/2023.acl-industry.63
DOI:
10.18653/v1/2023.acl-industry.63
Cite (ACL):
Liang Wang, Ivano Lauriola, and Alessandro Moschitti. 2023. Accurate Training of Web-based Question Answering Systems with Feedback from Ranked Users. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 660–667, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Accurate Training of Web-based Question Answering Systems with Feedback from Ranked Users (Wang et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-industry.63.pdf