Learning a Cost-Effective Annotation Policy for Question Answering

Bernhard Kratzwald, Stefan Feuerriegel, Huan Sun


Abstract
State-of-the-art question answering (QA) relies upon large amounts of training data for which labeling is time-consuming and thus expensive. For this reason, customizing QA systems is challenging. As a remedy, we propose a novel framework for annotating QA datasets that entails learning a cost-effective annotation policy and a semi-supervised annotation scheme. The latter reduces the human effort: it leverages the underlying QA system to suggest potential candidate annotations. Human annotators then simply provide binary feedback on these candidates. Our system is designed such that past annotations continuously improve future performance and thus reduce the overall annotation cost. To the best of our knowledge, this is the first paper to address the problem of annotating questions with minimal annotation cost. We compare our framework against traditional manual annotation in an extensive set of experiments. We find that our approach can reduce the annotation cost by up to 21.1%.
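The abstract describes the binary-feedback scheme only at a high level. To make the cost trade-off concrete, here is a minimal sketch under a hypothetical cost model, which is an illustrative assumption and not the method described in the paper: the QA system ranks candidate answers with estimated probabilities of being correct (assumed mutually exclusive), the annotator checks the top-k candidates in order at cost `cost_binary` each and stops at the first accepted one, and if none is accepted, the question falls back to manual annotation at cost `cost_manual`. The helpers `expected_cost` and `choose_k` are hypothetical names introduced for illustration.

```python
def expected_cost(probs, k, cost_binary, cost_manual):
    """Expected cost of checking the top-k candidates before falling back to manual labeling.

    Hypothetical cost model: probs[i] is the assumed probability that candidate i
    is the correct answer (mutually exclusive, summing to at most 1).
    """
    cost = 0.0
    p_unresolved = 1.0  # probability that no earlier candidate has been accepted yet
    for p in probs[:k]:
        # Pay for this yes/no check only if the annotator actually reaches it.
        cost += cost_binary * p_unresolved
        p_unresolved = max(0.0, p_unresolved - p)
    # Any remaining probability mass falls back to full manual annotation.
    return cost + cost_manual * p_unresolved


def choose_k(probs, cost_binary, cost_manual):
    """Return (k, expected cost) with the lowest expected cost; k = 0 means manual annotation only."""
    costs = [expected_cost(probs, k, cost_binary, cost_manual) for k in range(len(probs) + 1)]
    best_k = min(range(len(costs)), key=costs.__getitem__)
    return best_k, costs[best_k]


if __name__ == "__main__":
    print(choose_k([0.6, 0.2, 0.1], cost_binary=1.0, cost_manual=10.0))
```

For instance, `choose_k([0.6, 0.2, 0.1], 1.0, 10.0)` selects k = 3 with an expected cost of about 2.6, whereas for a single low-confidence candidate such as `[0.05]` with `cost_manual=2.0`, manual annotation (k = 0) is the cheaper choice. This fixed rule only illustrates why binary feedback can undercut manual labeling; it is not a learned policy.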
Anthology ID:
2020.emnlp-main.246
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3051–3062
URL:
https://aclanthology.org/2020.emnlp-main.246
DOI:
10.18653/v1/2020.emnlp-main.246
Cite (ACL):
Bernhard Kratzwald, Stefan Feuerriegel, and Huan Sun. 2020. Learning a Cost-Effective Annotation Policy for Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3051–3062, Online. Association for Computational Linguistics.
Cite (Informal):
Learning a Cost-Effective Annotation Policy for Question Answering (Kratzwald et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.246.pdf
Code:
bernhard2202/qa-annotation
Data:
Natural Questions, SQuAD