Which Examples Should be Multiply Annotated? Active Learning When Annotators May Disagree

Connor Baumler, Anna Sotnikova, Hal Daumé III


Abstract
Linguistic annotations, especially for controversial topics like hate speech detection, are frequently contested due to annotator backgrounds and positionalities. In such situations, preserving this disagreement through the machine learning pipeline can be important for downstream use cases. However, capturing disagreement can increase annotation time and expense. Fortunately, for many tasks, not all examples are equally controversial; we develop an active learning approach, Disagreement Aware Active Learning (DAAL), that concentrates annotations on examples where model entropy and annotator entropy are the most different. Because we cannot know the true entropy of annotations on unlabeled examples, we train a model to predict annotator entropy using very few multiply-labeled examples. We find that traditional uncertainty-based active learning underperforms simple passive learning on tasks with high levels of disagreement, but that our active learning approach successfully improves on passive and active baselines, reducing the number of annotations required by at least 24% on average across several datasets.
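The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`entropy`, `select_by_entropy_gap`), the use of an absolute difference as the measure of "most different", and the toy numbers are all assumptions made for illustration. The predicted annotator entropies are supplied directly here; in the paper's setting they would come from a separate model trained on a few multiply-labeled examples.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_by_entropy_gap(model_probs, predicted_annotator_entropy, k):
    """Return indices of the k pool examples whose model entropy differs
    most from the predicted annotator entropy.

    Assumption: the gap is measured as an absolute difference.
    """
    gaps = [
        abs(entropy(p) - h)
        for p, h in zip(model_probs, predicted_annotator_entropy)
    ]
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return ranked[:k]


# Hypothetical unlabeled pool: the task model's class probabilities per example.
pool_model_probs = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01]]
# Hypothetical outputs of an annotator-entropy predictor for the same examples.
pool_annotator_entropy = [0.69, 0.65, 0.05]

chosen = select_by_entropy_gap(pool_model_probs, pool_annotator_entropy, k=1)
print(chosen)
```

In this toy pool, the second example is selected: the model is fairly confident about it while annotators are predicted to disagree, so an additional annotation there is expected to be the most informative under this rule.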
Anthology ID:
2023.findings-acl.658
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10352–10371
URL:
https://aclanthology.org/2023.findings-acl.658
DOI:
10.18653/v1/2023.findings-acl.658
Cite (ACL):
Connor Baumler, Anna Sotnikova, and Hal Daumé III. 2023. Which Examples Should be Multiply Annotated? Active Learning When Annotators May Disagree. In Findings of the Association for Computational Linguistics: ACL 2023, pages 10352–10371, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Which Examples Should be Multiply Annotated? Active Learning When Annotators May Disagree (Baumler et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.658.pdf
Video:
https://aclanthology.org/2023.findings-acl.658.mp4