ZELDA: A Comprehensive Benchmark for Supervised Entity Disambiguation

Marcel Milich, Alan Akbik


Abstract
Entity disambiguation (ED) is the task of disambiguating named entity mentions in text to unique entries in a knowledge base. Due to its industrial relevance, as well as current progress in leveraging pre-trained language models, a multitude of ED approaches have been proposed in recent years. However, we observe a severe lack of uniformity across experimental setups in current ED work, rendering a direct comparison of approaches based solely on reported numbers impossible: current approaches differ widely in the data set used for training, the size of the covered entity vocabulary, and the use of additional signals such as candidate lists. To address this issue, we present ZELDA, a novel entity disambiguation benchmark that includes a unified training data set, entity vocabulary, and candidate lists, as well as challenging evaluation splits covering 8 different domains. We illustrate its design and construction, and present experiments in which we train and compare current state-of-the-art approaches on our benchmark. To encourage greater direct comparability in the entity disambiguation domain, we make our benchmark publicly available to the research community.
Anthology ID: 2023.eacl-main.151
Volume: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month: May
Year: 2023
Address: Dubrovnik, Croatia
Editors: Andreas Vlachos, Isabelle Augenstein
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 2061–2072
URL: https://aclanthology.org/2023.eacl-main.151
DOI: 10.18653/v1/2023.eacl-main.151
Award: EACL Outstanding Paper
Cite (ACL): Marcel Milich and Alan Akbik. 2023. ZELDA: A Comprehensive Benchmark for Supervised Entity Disambiguation. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2061–2072, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal): ZELDA: A Comprehensive Benchmark for Supervised Entity Disambiguation (Milich & Akbik, EACL 2023)
PDF: https://aclanthology.org/2023.eacl-main.151.pdf
Video: https://aclanthology.org/2023.eacl-main.151.mp4