Effort Estimation in Named Entity Tagging Tasks

Inês Gomes, Rui Correia, Jorge Ribeiro, João Freitas


Abstract
Named Entity Recognition (NER) is an essential component of many Natural Language Processing pipelines. However, building these language-dependent models requires large amounts of annotated data. Crowdsourcing has emerged as a scalable way to collect and enrich data in a more time-efficient manner. To manage these annotations at scale, it is important to predict completion timelines and compute fair worker pricing in advance. Both goals require knowing how much effort each task will take. In this paper, we investigate which variables influence the time a human spends on a named entity annotation task. Our results are two-fold: first, an understanding of the effort-impacting factors, which we divide into cognitive load and input length; and second, the performance of the prediction itself. On the latter, through model adaptation and feature engineering, we attained a Root Mean Squared Error (RMSE) of 25.68 words per minute with a Nearest Neighbors model.
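The abstract's approach of predicting annotation speed with a Nearest Neighbors model can be sketched in a few lines. The features (token count, entity density), the training values, and the query point below are all hypothetical illustrations, not data from the paper; the paper's actual features and preprocessing are not reproduced here.

```python
import math

def knn_predict(train_X, train_y, x, k=3):
    """Predict a target value by averaging the targets of the k
    training points nearest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical task features: (token count, entity density).
# In practice, features on different scales should be normalized
# so that no single dimension dominates the distance.
train_X = [(10, 0.1), (20, 0.2), (30, 0.3), (40, 0.1), (50, 0.4)]
# Hypothetical annotation speeds, in words per minute.
train_y = [60.0, 50.0, 40.0, 55.0, 30.0]

# Estimated speed for an unseen task.
speed = knn_predict(train_X, train_y, (25, 0.25), k=3)
```

Given a predicted speed and a task's length, one can estimate completion time and derive pricing, which is the use case the abstract motivates.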
Anthology ID:
2020.lrec-1.37
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Note:
Pages:
298–306
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.37
Cite (ACL):
Inês Gomes, Rui Correia, Jorge Ribeiro, and João Freitas. 2020. Effort Estimation in Named Entity Tagging Tasks. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 298–306, Marseille, France. European Language Resources Association.
Cite (Informal):
Effort Estimation in Named Entity Tagging Tasks (Gomes et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.37.pdf