Pablo Accuosto
2024
AffilGood: Building reliable institution name disambiguation tools to improve scientific literature analysis
Nicolau Duran-Silva
|
Pablo Accuosto
|
Piotr Przybyła
|
Horacio Saggion
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
The accurate attribution of scientific works to research organizations is hindered by the lack of openly available manually annotated data–in particular when multilingual and complex affiliation strings are considered. The AffilGood framework introduced in this paper addresses this gap. We identify three sub-tasks relevant for institution name disambiguation and make available annotated datasets and tools aimed at each of them, including i) a dataset annotated with affiliation spans in noisy automatically-extracted strings; ii) a dataset annotated with named entities for the identification of organizations and their locations; iii) seven datasets annotated with the Research Organization Registry (ROR) identifiers for the evaluation of entity-linking systems. In addition, we describe, evaluate and make available newly developed tools that use these datasets to provide solutions for each of the identified sub-tasks. Our results confirm the value of the developed resources and methods in addressing key challenges in institution name disambiguation.
2019
Transferring Knowledge from Discourse to Arguments: A Case Study with Scientific Abstracts
Pablo Accuosto
|
Horacio Saggion
Proceedings of the 6th Workshop on Argument Mining
In this work we propose to leverage resources available with discourse-level annotations to facilitate the identification of argumentative components and relations in scientific texts, which has been recognized as a particularly challenging task. In particular, we implement and evaluate a transfer learning approach in which contextualized representations learned from discourse parsing tasks are used as input of argument mining models. As a pilot application, we explore the feasibility of using automatically identified argumentative components and relations to predict the acceptance of papers in computer science venues. In order to conduct our experiments, we propose an annotation scheme for argumentative units and relations and use it to enrich an existing corpus with an argumentation layer.
Search