Midas Loop: A Prioritized Human-in-the-Loop Annotation for Large Scale Multilayer Data

Luke Gessler, Lauren Levine, Amir Zeldes


Abstract
Large-scale annotation of rich multilayer corpus data is expensive and time-consuming, motivating approaches that integrate high-quality automatic tools with active learning in order to prioritize human labeling of hard cases. A related challenge in such scenarios is the concurrent management of automatically annotated and human-annotated data, particularly where different subsets of the data have been corrected for different types of annotation and with different levels of confidence. In this paper we present [REDACTED], a collaborative, version-controlled online annotation environment for multilayer corpus data which includes integrated provenance and confidence metadata for each piece of information at the document, sentence, token, and annotation levels. We present a case study on improving annotation quality in an existing multilayer parse bank of English called AMALGUM, focusing on active learning in corpus preprocessing, at the surprisingly challenging level of sentence segmentation. Our results show improvements to state-of-the-art sentence segmentation and a promising workflow for getting “silver” data to approach gold-standard quality.
Anthology ID:
2022.law-1.13
Volume:
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Sameer Pradhan, Sandra Kuebler
Venue:
LAW
SIG:
SIGANN
Publisher:
European Language Resources Association
Pages:
103–110
URL:
https://aclanthology.org/2022.law-1.13
Cite (ACL):
Luke Gessler, Lauren Levine, and Amir Zeldes. 2022. Midas Loop: A Prioritized Human-in-the-Loop Annotation for Large Scale Multilayer Data. In Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022, pages 103–110, Marseille, France. European Language Resources Association.
Cite (Informal):
Midas Loop: A Prioritized Human-in-the-Loop Annotation for Large Scale Multilayer Data (Gessler et al., LAW 2022)
PDF:
https://aclanthology.org/2022.law-1.13.pdf
Data
AMALGUM