A Regional News Corpora for Contextualized Entity Discovery and Linking

Adrian Braşoveanu, Lyndon J.B. Nixon, Albert Weichselbraun, Arno Scharl


Abstract
This paper presents a German corpus for Named Entity Linking (NEL) and Knowledge Base Population (KBP) tasks. We describe the annotation guideline, the annotation process, the NIL clustering techniques, and the conversion to popular NEL formats such as NIF and TAC that were used to construct this corpus from news transcripts of the German regional broadcaster RBB (Rundfunk Berlin Brandenburg). Since creating such language resources requires significant effort, the paper also discusses how to derive additional evaluation resources for tasks like named entity contextualization or ontology enrichment by exploiting the links between named entities in the annotated corpus. The paper concludes with an evaluation of how several well-known NEL tools perform on the corpus, a discussion of the evaluation results, and suggestions on how to keep evaluation corpora and datasets up to date.
Anthology ID:
L16-1531
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3333–3338
URL:
https://aclanthology.org/L16-1531
Cite (ACL):
Adrian Braşoveanu, Lyndon J.B. Nixon, Albert Weichselbraun, and Arno Scharl. 2016. A Regional News Corpora for Contextualized Entity Discovery and Linking. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3333–3338, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Regional News Corpora for Contextualized Entity Discovery and Linking (Braşoveanu et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1531.pdf