Strong Heuristics for Named Entity Linking

Marko Čuljak, Andreas Spitz, Robert West, Akhil Arora


Abstract
Named entity linking (NEL) in news is a challenging endeavour due to the frequency of unseen and emerging entities, which necessitates the use of unsupervised or zero-shot methods. However, such methods tend to come with caveats, such as no integration of suitable knowledge bases (like Wikidata) for emerging entities, a lack of scalability, and poor interpretability. Here, we consider person disambiguation in Quotebank, a massive corpus of speaker-attributed quotations from the news, and investigate the suitability of intuitive, lightweight, and scalable heuristics for NEL in web-scale corpora. Our best-performing heuristic disambiguates 94% and 63% of the mentions on Quotebank and the AIDA-CoNLL benchmark, respectively. Additionally, the proposed heuristics compare favourably to the state-of-the-art unsupervised and zero-shot methods, Eigenthemes and mGENRE, respectively, thereby serving as strong baselines for unsupervised and zero-shot entity linking.
Anthology ID:
2022.naacl-srw.30
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
Month:
July
Year:
2022
Address:
Hybrid: Seattle, Washington + Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
235–246
URL:
https://aclanthology.org/2022.naacl-srw.30
DOI:
10.18653/v1/2022.naacl-srw.30
Cite (ACL):
Marko Čuljak, Andreas Spitz, Robert West, and Akhil Arora. 2022. Strong Heuristics for Named Entity Linking. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, pages 235–246, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.
Cite (Informal):
Strong Heuristics for Named Entity Linking (Čuljak et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-srw.30.pdf
Code:
epfl-dlab/nelight
Data:
AIDA CoNLL-YAGO, CoNLL-2003