Proactive Learning for Named Entity Recognition

Maolin Li, Nhung Nguyen, Sophia Ananiadou


Abstract
The goal of active learning is to minimise the cost of producing an annotated dataset, and it assumes that annotators are perfect, i.e., that they always choose the correct labels. In practice, however, annotators are fallible and are likely to assign incorrect labels to some instances. Proactive learning is a generalisation of active learning that can model different kinds of annotators. Although proactive learning has been applied to certain labelling tasks, such as text classification, there is little work on its application to named entity (NE) tagging. In this paper, we propose a proactive learning method for producing NE-annotated corpora using two annotators who have different levels of expertise and who charge different amounts according to their experience. To optimise both cost and annotation quality, we also propose a mechanism that presents multiple sentences to annotators at each iteration. Experimental results on several corpora show that our method facilitates the construction of high-quality NE-labelled datasets at minimal cost.
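The abstract does not say how sentences are scored or how an annotator is chosen for each one, so the following is only a hypothetical sketch of the general proactive-learning idea it describes, not the authors' method. It assumes that model uncertainty serves as a proxy for sentence difficulty, that each annotator's accuracy falls linearly with that difficulty, and that each sentence goes to whichever annotator maximises `uncertainty * P(correct) - cost_weight * cost` within a budget. The `Annotator` class, `plan_batch`, `cost_weight` and every numeric value are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotator:
    """Hypothetical annotator model; all numbers used below are illustrative."""
    name: str
    cost: float                # assumed price per sentence
    base_accuracy: float       # assumed accuracy on easy sentences
    difficulty_penalty: float  # assumed accuracy drop as sentences get harder

    def p_correct(self, uncertainty: float) -> float:
        # Assumption: accuracy falls linearly with the model's uncertainty,
        # which stands in for sentence difficulty.
        return max(0.0, self.base_accuracy - self.difficulty_penalty * uncertainty)


def plan_batch(uncertainty, annotators, batch_size, budget, cost_weight=0.1):
    """Pick up to `batch_size` sentences (most uncertain first) and assign each
    to the annotator with the highest expected utility
    uncertainty * P(correct) - cost_weight * cost, without exceeding `budget`."""
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i], reverse=True)
    plan, spent = [], 0.0
    for i in order:
        if len(plan) == batch_size:
            break
        # Only annotators we can still afford are candidates.
        affordable = [a for a in annotators if spent + a.cost <= budget]
        if not affordable:
            break
        u = uncertainty[i]
        best = max(affordable, key=lambda a: u * a.p_correct(u) - cost_weight * a.cost)
        plan.append((i, best.name))
        spent += best.cost
    return plan, spent


if __name__ == "__main__":
    expert = Annotator("expert", cost=3.0, base_accuracy=0.95, difficulty_penalty=0.05)
    novice = Annotator("non-expert", cost=1.0, base_accuracy=0.90, difficulty_penalty=0.5)
    scores = [0.9, 0.2, 0.6, 0.05]  # hypothetical model uncertainty per sentence
    plan, spent = plan_batch(scores, [expert, novice], batch_size=3, budget=10.0)
    print(plan, spent)
```

With these made-up numbers, the most uncertain sentence goes to the expensive expert and the easier ones go to the cheaper annotator, which illustrates the cost/quality trade-off that the abstract describes; `batch_size` plays the role of presenting several sentences per iteration.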
Anthology ID:
W17-2314
Volume:
BioNLP 2017
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Kevin Bretonnel Cohen, Dina Demner-Fushman, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
117–125
URL:
https://aclanthology.org/W17-2314
DOI:
10.18653/v1/W17-2314
Cite (ACL):
Maolin Li, Nhung Nguyen, and Sophia Ananiadou. 2017. Proactive Learning for Named Entity Recognition. In BioNLP 2017, pages 117–125, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Proactive Learning for Named Entity Recognition (Li et al., BioNLP 2017)
PDF:
https://aclanthology.org/W17-2314.pdf
Data
GENIA