Lyndon J.B. Nixon


2019

Improving Named Entity Linking Corpora Quality
Albert Weichselbraun | Adrian M.P. Brasoveanu | Philipp Kuntschik | Lyndon J.B. Nixon
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems. The quality of the used corpora and the used evaluation metrics are crucial in this process. We, therefore, assess the quality of three popular evaluation corpora, identifying four major issues which affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) Knowledge Base evolution, and (iv) differences in annotating co-occurrences. This paper addresses these issues by formalizing NEL annotations and corpus versioning, which allows standardizing corpus creation, supports corpus evolution, and paves the way for the use of lenses to automatically transform between different corpus configurations. In addition, the use of clearly defined scoring rules and evaluation metrics ensures a better comparability of evaluation results.

2018

Framing Named Entity Linking Error Types
Adrian Braşoveanu | Giuseppe Rizzo | Philipp Kuntschik | Albert Weichselbraun | Lyndon J.B. Nixon
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

A Regional News Corpora for Contextualized Entity Discovery and Linking
Adrian Braşoveanu | Lyndon J.B. Nixon | Albert Weichselbraun | Arno Scharl
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a German corpus for Named Entity Linking (NEL) and Knowledge Base Population (KBP) tasks. We describe the annotation guideline, the annotation process, NIL clustering techniques and conversion to popular NEL formats such as NIF and TAC that have been used to construct this corpus based on news transcripts from the German regional broadcaster RBB (Rundfunk Berlin Brandenburg). Since creating such language resources requires significant effort, the paper also discusses how to derive additional evaluation resources for tasks like named entity contextualization or ontology enrichment by exploiting the links between named entities from the annotated corpus. The paper concludes with an evaluation that shows how several well-known NEL tools perform on the corpus, a discussion of the evaluation results, and with suggestions on how to keep evaluation corpora and datasets up to date.