The dbpedia R Package: An Integrated Workflow for Entity Linking (for ParlaMint Corpora)

Christoph Leonhardt, Andreas Blaette


Abstract
Entity Linking is a powerful approach for linking textual data to established structured data such as survey data or administrative data. However, in the realm of social science, the approach is not widely adopted. We argue that this is, at least in part, due to specific setup requirements that constitute high barriers to usage, and to workflows that are not well integrated into the analytical scenarios commonly deployed in social science research. We introduce the dbpedia R package to make the approach more accessible. It focuses on functionality that is easily adaptable to the needs of social scientists working with textual data, including support for different input formats, limited setup costs, and various output formats. Using a ParlaMint corpus, we show the applicability and flexibility of the approach for parliamentary debates.
Anthology ID:
2024.parlaclarin-1.20
Volume:
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Darja Fiser, Maria Eskevich, David Bordon
Venues:
ParlaCLARIN | WS
Publisher:
ELRA and ICCL
Pages:
133–144
URL:
https://aclanthology.org/2024.parlaclarin-1.20
Cite (ACL):
Christoph Leonhardt and Andreas Blaette. 2024. The dbpedia R Package: An Integrated Workflow for Entity Linking (for ParlaMint Corpora). In Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024, pages 133–144, Torino, Italia. ELRA and ICCL.
Cite (Informal):
The dbpedia R Package: An Integrated Workflow for Entity Linking (for ParlaMint Corpora) (Leonhardt & Blaette, ParlaCLARIN-WS 2024)
PDF:
https://aclanthology.org/2024.parlaclarin-1.20.pdf