Metadata Generation for Research Data from URL Citation Contexts in Scholarly Papers: Task Definition and Dataset Construction

Yu Watanabe; Koichiro Ito; Shigeki Matsubara

Abstract

This paper proposes a new research task aimed at automatically generating metadata for research data, such as datasets and code, to accelerate open science. From the perspective of ‘Findable’ in the FAIR data principles, research data is required to be assigned a globally unique identifier and described with rich metadata. The proposed task is defined as extracting information about research data (specifically, name, generic mention, and in-text citation) from the text surrounding URLs that serve as identifiers for research data references in scholarly papers. To support this task, we constructed a dataset containing approximately 600 manually annotated citation contexts with URLs of research data from conference papers. To evaluate the task, we conducted a preliminary experiment on the constructed dataset, using in-context learning with LLMs as a baseline. The results showed that the performance of LLMs matched that of humans in some cases, demonstrating the feasibility of the task.

Anthology ID: 2025.wasp-main.8
Volume: Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications
Month: December
Year: 2025
Address: Mumbai, India and virtual
Editors: Alberto Accomazzi, Tirthankar Ghosal, Felix Grezes, Kelly Lockhart
Venues: WASP | WS
Publisher: Association for Computational Linguistics
Pages: 72–79
URL: https://aclanthology.org/2025.wasp-main.8/
Cite (ACL): Yu Watanabe, Koichiro Ito, and Shigeki Matsubara. 2025. Metadata Generation for Research Data from URL Citation Contexts in Scholarly Papers: Task Definition and Dataset Construction. In Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications, pages 72–79, Mumbai, India and virtual. Association for Computational Linguistics.
Cite (Informal): Metadata Generation for Research Data from URL Citation Contexts in Scholarly Papers: Task Definition and Dataset Construction (Watanabe et al., WASP 2025)
PDF: https://aclanthology.org/2025.wasp-main.8.pdf
