Zhiqiang Gao


pdf bib
Evaluating Ensemble Based Pre-annotation on Named Entity Corpus Construction in English and Chinese
Tingming Lu | Man Zhu | Zhiqiang Gao | Yaocheng Gui
Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)

Annotated corpora are crucial language resources, and pre-annotation is an usual way to reduce the cost of corpus construction. Ensemble based pre-annotation approach combines multiple existing named entity taggers and categorizes annotations into normal annotations with high confidence and candidate annotations with low confidence, to reduce the human annotation time. In this paper, we manually annotate three English datasets under various pre-annotation conditions, report the effects of ensemble based pre-annotation, and analyze the experimental results. In order to verify the effectiveness of ensemble based pre-annotation in other languages, such as Chinese, three Chinese datasets are also tested. The experimental results show that the ensemble based pre-annotation approach significantly reduces the number of annotations which human annotators have to add, and outperforms the baseline approaches in reduction of human annotation time without loss in annotation performance (in terms of F1-measure), on both English and Chinese datasets.