AffilGood: Building reliable institution name disambiguation tools to improve scientific literature analysis

Nicolau Duran-Silva, Pablo Accuosto, Piotr Przybyła, Horacio Saggion


Abstract
The accurate attribution of scientific works to research organizations is hindered by the lack of openly available manually annotated data, in particular when multilingual and complex affiliation strings are considered. The AffilGood framework introduced in this paper addresses this gap. We identify three sub-tasks relevant for institution name disambiguation and make available annotated datasets and tools aimed at each of them, including i) a dataset annotated with affiliation spans in noisy, automatically extracted strings; ii) a dataset annotated with named entities for the identification of organizations and their locations; and iii) seven datasets annotated with Research Organization Registry (ROR) identifiers for the evaluation of entity-linking systems. In addition, we describe, evaluate and make available newly developed tools that use these datasets to provide solutions for each of the identified sub-tasks. Our results confirm the value of the developed resources and methods in addressing key challenges in institution name disambiguation.
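To make the chaining of the three sub-tasks concrete, here is a minimal Python sketch of the data flow from a raw affiliation string to linked identifiers. It is not the AffilGood implementation: the semicolon splitting, comma-based tagging, and exact-match lookup are naive stand-ins for the tools the paper describes, and `TOY_REGISTRY`, its `toy-id-*` placeholder identifiers (not real ROR IDs), `Affiliation`, and all function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy registry standing in for the Research Organization Registry (ROR).
# The identifiers are placeholders, NOT real ROR IDs.
TOY_REGISTRY = {
    "universitat pompeu fabra": "toy-id-001",
    "university of warsaw": "toy-id-002",
}


@dataclass
class Affiliation:
    span: str
    organization: Optional[str] = None
    location: Optional[str] = None
    registry_id: Optional[str] = None


def split_spans(raw: str) -> List[str]:
    """Sub-task i) span identification, naive stand-in: split on semicolons."""
    return [p.strip() for p in re.split(r";\s*", raw) if p.strip()]


def tag_entities(span: str) -> Tuple[str, Optional[str]]:
    """Sub-task ii) entity tagging, naive stand-in: first comma-separated
    chunk is the organization, last chunk (if any) is its location."""
    chunks = [c.strip() for c in span.split(",") if c.strip()]
    organization = chunks[0]
    location = chunks[-1] if len(chunks) > 1 else None
    return organization, location


def link(organization: str) -> Optional[str]:
    """Sub-task iii) entity linking, naive stand-in: exact lookup on a
    lowercased, whitespace-normalized name; None when there is no match."""
    key = " ".join(organization.lower().split())
    return TOY_REGISTRY.get(key)


def process(raw: str) -> List[Affiliation]:
    """Run the three hypothetical stages in sequence on one raw string."""
    results = []
    for span in split_spans(raw):
        organization, location = tag_entities(span)
        results.append(Affiliation(span, organization, location, link(organization)))
    return results


if __name__ == "__main__":
    raw = "Universitat Pompeu Fabra, Barcelona, Spain; University of Warsaw, Warsaw, Poland"
    for aff in process(raw):
        print(aff)
```

The sketch keeps the stages as separate functions so that any one of them could be swapped for a stronger component (for example, a learned span detector or a fuzzy matcher) without changing the others. Real affiliation strings are multilingual and noisy, which is exactly where rules like these break down.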
Anthology ID: 2024.sdp-1.13
Volume: Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Tirthankar Ghosal, Amanpreet Singh, Anita Waard, Philipp Mayr, Aakanksha Naik, Orion Weller, Yoonjoo Lee, Shannon Shen, Yanxia Qin
Venues: sdp | WS
Publisher: Association for Computational Linguistics
Pages: 135–144
URL: https://aclanthology.org/2024.sdp-1.13
Cite (ACL): Nicolau Duran-Silva, Pablo Accuosto, Piotr Przybyła, and Horacio Saggion. 2024. AffilGood: Building reliable institution name disambiguation tools to improve scientific literature analysis. In Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024), pages 135–144, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): AffilGood: Building reliable institution name disambiguation tools to improve scientific literature analysis (Duran-Silva et al., sdp-WS 2024)
PDF: https://aclanthology.org/2024.sdp-1.13.pdf