Adrian M. P. Brasoveanu
Also published as: Adrian M.P. Brasoveanu
2023
Orbis Annotator: An Open Source Toolkit for the Efficient Annotation and Refinement of Text
Norman Süsstrunk, Andreas Fraefel, Albert Weichselbraun, Adrian M. P. Brasoveanu
Proceedings of the 4th Conference on Language, Data and Knowledge
2020
In Media Res: A Corpus for Evaluating Named Entity Linking with Creative Works
Adrian M.P. Brasoveanu, Albert Weichselbraun, Lyndon Nixon
Proceedings of the 24th Conference on Computational Natural Language Learning
Annotation styles express guidelines that tell human annotators which rules to follow when creating gold standard annotations of text corpora. These guidelines not only shape the gold standards they help create, but also influence the training and evaluation of Named Entity Linking (NEL) tools, since different annotation styles correspond to divergent views on the entities present in the same texts. Such divergence is particularly pronounced in media-domain texts that contain references to creative works. In this work we present a corpus of 1000 annotated documents selected from the media domain. Each document is provided with multiple gold standard annotations representing different annotation styles. This corpus is used to evaluate a series of NEL tools in order to understand how differences in annotation styles affect the reported accuracy when processing highly ambiguous entities such as names of creative works. Relaxed annotation guidelines that include overlap styles lead to better results across all tools.
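To make the difference between strict and overlap-based scoring concrete, here is a minimal sketch. It assumes that gold and predicted annotations are plain (start, end, uri) triples with character offsets; the names `Annotation`, `strict_match`, `overlap_match`, and `score`, as well as the example sentence and identifier, are hypothetical and are not taken from the paper or from any particular evaluation framework.

```python
from typing import Callable, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    start: int  # character offset of the mention, inclusive
    end: int    # character offset of the mention, exclusive
    uri: str    # knowledge-base identifier the mention is linked to


def strict_match(a: Annotation, b: Annotation) -> bool:
    # Strict style: identical span boundaries and identical link target.
    return a.start == b.start and a.end == b.end and a.uri == b.uri


def overlap_match(a: Annotation, b: Annotation) -> bool:
    # Relaxed style: spans only need to overlap, but the link target must agree.
    return a.start < b.end and b.start < a.end and a.uri == b.uri


def score(
    gold: List[Annotation],
    predicted: List[Annotation],
    match: Callable[[Annotation, Annotation], bool],
) -> Tuple[float, float, float]:
    # Precision counts predictions that match some gold annotation;
    # recall counts gold annotations matched by some prediction.
    tp_pred = sum(1 for p in predicted if any(match(p, g) for g in gold))
    tp_gold = sum(1 for g in gold if any(match(p, g) for p in predicted))
    precision = tp_pred / len(predicted) if predicted else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    text = "The film Star Wars: A New Hope premiered in 1977."
    # Gold marks the full title; the tool only finds "Star Wars" (same entity).
    gold = [Annotation(9, 30, "dbr:Star_Wars_(film)")]
    predicted = [Annotation(9, 18, "dbr:Star_Wars_(film)")]
    print("strict: ", score(gold, predicted, strict_match))
    print("overlap:", score(gold, predicted, overlap_match))
```

Under these assumptions, the partially matching mention counts as a miss with strict matching but as a hit with overlap matching, which illustrates why relaxed guidelines can raise reported scores for tools that detect only part of a creative work's title.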
2019
Improving Named Entity Linking Corpora Quality
Albert Weichselbraun, Adrian M.P. Brasoveanu, Philipp Kuntschik, Lyndon J.B. Nixon
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems. The quality of the corpora and evaluation metrics used is crucial in this process. We therefore assess the quality of three popular evaluation corpora, identifying four major issues that affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) Knowledge Base evolution, and (iv) differences in annotating co-occurrences. This paper addresses these issues by formalizing NEL annotations and corpus versioning, which allows standardizing corpus creation, supports corpus evolution, and paves the way for the use of lenses to automatically transform between different corpus configurations. In addition, clearly defined scoring rules and evaluation metrics improve the comparability of evaluation results.
Co-authors
- Albert Weichselbraun 3
- Andreas Fraefel 1
- Philipp Kuntschik 1
- Lyndon Nixon 1
- Lyndon J.B. Nixon 1