Building a Synthetic Biomedical Research Article Citation Linkage Corpus

Sudipta Singha Roy, Robert E. Mercer


Abstract
Citations are frequently used in publications to support the presented results and to acknowledge previous discoveries, and they help the reader follow the chronological progression of information through publications. In scientific publications, a citation refers to the referenced document but does not identify the exact span of text being referred to. Connecting a citation to this span of text is called citation linkage. To support finding citation linkages in biomedical research publications with deep learning, this paper provides a synthetic silver standard corpus together with the method used to build it. The corpus is intended as a training set for deep learning models that, given a citing statement, locate the corresponding text spans in the referenced article based on semantic similarity. It is composed of sentence pairs, in which one sentence is the citing statement and the other is a candidate cited statement from the referenced paper. The corpus is annotated using an unsupervised sentence embedding method. The effectiveness of this silver standard corpus for training citation linkage models is validated against a human-annotated gold standard corpus.
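
As a rough illustration of the sentence-pair idea described in the abstract, the sketch below scores a citing statement against candidate sentences by cosine similarity and attaches a silver label. It is not the authors' method: the bag-of-words `embed` function is a toy stand-in for the unsupervised sentence embedding model (which the abstract does not name), and the function names and the `threshold` value are hypothetical choices made for this example.

```python
import math
from collections import Counter


def embed(sentence):
    # Toy stand-in for a sentence embedding: lowercase bag-of-words counts.
    # A real pipeline would use an unsupervised sentence embedding model here.
    return Counter(sentence.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_pairs(citing, candidates, threshold=0.5):
    # Build (citing, candidate, score, silver_label) rows.
    # The threshold is an assumed value for illustration only.
    citing_vec = embed(citing)
    rows = []
    for candidate in candidates:
        score = cosine(citing_vec, embed(candidate))
        rows.append((citing, candidate, score, score >= threshold))
    return rows


if __name__ == "__main__":
    citing = "protein x binds dna"
    candidates = ["protein x binds dna", "mice were housed in cages"]
    for row in label_pairs(citing, candidates):
        print(row)
```

Each row pairs the citing statement with one candidate sentence from the referenced paper, which mirrors the pair structure of the corpus; only the scoring function would change if a learned embedding were substituted.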
Anthology ID:
2022.lrec-1.608
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5665–5672
URL:
https://aclanthology.org/2022.lrec-1.608
Cite (ACL):
Sudipta Singha Roy and Robert E. Mercer. 2022. Building a Synthetic Biomedical Research Article Citation Linkage Corpus. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5665–5672, Marseille, France. European Language Resources Association.
Cite (Informal):
Building a Synthetic Biomedical Research Article Citation Linkage Corpus (Singha Roy & Mercer, LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.608.pdf