Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution

David Q. Sun, Hadas Kotek, Christopher Klein, Mayank Gupta, William Li, Jason D. Williams


Abstract
This paper develops and implements a scalable methodology for (a) estimating the noisiness of labels produced by a typical crowdsourcing semantic annotation task, and (b) reducing the resulting error of the labeling process by as much as 20-30% in comparison to other common labeling strategies. Importantly, this new approach to the labeling process, which we name Dynamic Automatic Conflict Resolution (DACR), does not require a ground truth dataset and is instead based on inter-project annotation inconsistencies. This makes DACR not only more accurate but also available to a broad range of labeling tasks. In what follows we present results from a text classification task performed at scale for a commercial personal assistant, and evaluate the inherent ambiguity uncovered by this annotation strategy as compared to other common labeling strategies.
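The abstract does not spell out the resolution procedure itself; as a rough illustration only, the sketch below shows one way a dynamic conflict-resolution annotation loop might be organized: annotations are requested one at a time, collection stops early once the votes reach a consensus threshold, and the residual disagreement is reported as a noise/ambiguity estimate. The function names (dacr_label, request_annotation) and all thresholds are hypothetical and not taken from the paper.

    from collections import Counter

    def dacr_label(item, request_annotation,
                   min_annotations=3, max_annotations=7, consensus=0.7):
        """Collect annotations for `item` until a consensus fraction is reached
        or the annotation budget is exhausted. Returns the winning label and a
        disagreement score that can serve as a per-item noise estimate.
        NOTE: hypothetical sketch; not the paper's published algorithm."""
        votes = []
        while len(votes) < max_annotations:
            votes.append(request_annotation(item))   # ask one more annotator
            if len(votes) < min_annotations:
                continue                              # always gather a minimum batch first
            label, count = Counter(votes).most_common(1)[0]
            if count / len(votes) >= consensus:       # early stop on agreement
                return label, 1.0 - count / len(votes)
        # budget exhausted: fall back to plurality, report residual disagreement
        label, count = Counter(votes).most_common(1)[0]
        return label, 1.0 - count / len(votes)

Under these assumptions, unambiguous items resolve after the minimum batch, while contested items draw extra annotations and surface a higher disagreement score instead of being silently majority-voted.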
Anthology ID: 2020.coling-main.316
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 3547–3557
URL: https://aclanthology.org/2020.coling-main.316
DOI: 10.18653/v1/2020.coling-main.316
Cite (ACL): David Q. Sun, Hadas Kotek, Christopher Klein, Mayank Gupta, William Li, and Jason D. Williams. 2020. Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3547–3557, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution (Sun et al., COLING 2020)
PDF: https://aclanthology.org/2020.coling-main.316.pdf