Evaluating Ensemble Based Pre-annotation on Named Entity Corpus Construction in English and Chinese

Tingming Lu, Man Zhu, Zhiqiang Gao, Yaocheng Gui


Abstract
Annotated corpora are crucial language resources, and pre-annotation is a common way to reduce the cost of corpus construction. The ensemble-based pre-annotation approach combines multiple existing named entity taggers and divides their output into normal annotations (high confidence) and candidate annotations (low confidence) in order to reduce human annotation time. In this paper, we manually annotate three English datasets under various pre-annotation conditions, report the effects of ensemble-based pre-annotation, and analyze the experimental results. To verify that ensemble-based pre-annotation is also effective in other languages, we test three Chinese datasets as well. The experimental results show that, on both the English and the Chinese datasets, the ensemble-based pre-annotation approach significantly reduces the number of annotations that human annotators have to add, and outperforms the baseline approaches in reducing human annotation time without loss in annotation performance (in terms of F1-measure).
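The split into normal and candidate annotations described in the abstract can be illustrated with a minimal sketch. The abstract does not say how confidence is computed, so the vote-counting rule, the `min_votes` threshold, the `(start, end, label)` span format, and the function name `ensemble_preannotate` are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def ensemble_preannotate(tagger_outputs, min_votes):
    """Split tagger output into normal and candidate annotations.

    tagger_outputs: one collection of (start, end, label) spans per tagger.
    min_votes: hypothetical agreement threshold (an assumption, not taken
               from the paper); spans proposed by at least this many
               taggers are treated as high confidence.
    """
    votes = Counter()
    for spans in tagger_outputs:
        # set() ensures each tagger votes at most once per span.
        votes.update(set(spans))

    normal = sorted(span for span, n in votes.items() if n >= min_votes)
    candidate = sorted(span for span, n in votes.items() if n < min_votes)
    return normal, candidate


# Illustrative input: three hypothetical taggers run on the same sentence.
tagger_a = {(0, 2, "PER"), (5, 6, "LOC")}
tagger_b = {(0, 2, "PER")}
tagger_c = {(0, 2, "PER"), (5, 6, "ORG")}

normal, candidate = ensemble_preannotate([tagger_a, tagger_b, tagger_c], min_votes=2)
```

In this sketch the span on which all three taggers agree becomes a normal annotation, while the two conflicting labels for the second span are kept as candidates for a human annotator to accept or reject.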
Anthology ID:
W16-5208
Volume:
Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yohei Murakami, Donghui Lin, Nancy Ide, James Pustejovsky
Venue:
OIAF4HLT
Publisher:
The COLING 2016 Organizing Committee
Pages:
56–60
URL:
https://aclanthology.org/W16-5208
Cite (ACL):
Tingming Lu, Man Zhu, Zhiqiang Gao, and Yaocheng Gui. 2016. Evaluating Ensemble Based Pre-annotation on Named Entity Corpus Construction in English and Chinese. In Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016), pages 56–60, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Evaluating Ensemble Based Pre-annotation on Named Entity Corpus Construction in English and Chinese (Lu et al., OIAF4HLT 2016)
PDF:
https://aclanthology.org/W16-5208.pdf