Enriching the Metadata of Community-Generated Digital Content through Entity Linking: An Evaluative Comparison of State-of-the-Art Models

Youcef Benkhedda, Adrians Skapars, Viktor Schlegel, Goran Nenadic, Riza Batista-Navarro


Abstract
Digital archive collections that have been contributed by communities, known as community-generated digital content (CGDC), are important sources of historical and cultural knowledge. However, CGDC items are not easily searchable due to semantic information being obscured within their textual metadata. In this paper, we investigate the extent to which state-of-the-art, general-domain entity linking (EL) models (i.e., BLINK, EPGEL and mGENRE) can map named entities mentioned in CGDC textual metadata, to Wikidata entities. We evaluate and compare their performance on an annotated dataset of CGDC textual metadata and provide some error analysis, in the way of informing future studies aimed at enriching CGDC metadata using entity linking methods.
Anthology ID:
2024.latechclfl-1.20
Volume:
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Stan Szpakowicz
Venues:
LaTeCHCLfL | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
213–220
Language:
URL:
https://aclanthology.org/2024.latechclfl-1.20
DOI:
Bibkey:
Cite (ACL):
Youcef Benkhedda, Adrians Skapars, Viktor Schlegel, Goran Nenadic, and Riza Batista-Navarro. 2024. Enriching the Metadata of Community-Generated Digital Content through Entity Linking: An Evaluative Comparison of State-of-the-Art Models. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), pages 213–220, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Enriching the Metadata of Community-Generated Digital Content through Entity Linking: An Evaluative Comparison of State-of-the-Art Models (Benkhedda et al., LaTeCHCLfL-WS 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.latechclfl-1.20.pdf
Video:
 https://aclanthology.org/2024.latechclfl-1.20.mp4