Automatic Approach for Building Dataset of Citation Functions for COVID-19 Academic Papers

Setio Basuki, Masatoshi Tsuchiya


Abstract
This paper develops a new dataset of citation functions for COVID-19-related academic papers. Because preparing new citation function labels and building a new dataset is time-consuming and requires much human effort, this paper reuses our previous citation function scheme, which was built for the Computer Science (CS) domain and consists of five coarse-grained labels and 21 fine-grained labels. This paper uses the COVID-19 Open Research Dataset (CORD-19) and extracts 99.6k random citing sentences from 10.1k papers. These citing sentences are categorized using classification models built on the CS domain. A manual check of 475 random samples yielded accuracies of 76.6% and 70.2% on the coarse-grained and fine-grained labels, respectively. The evaluation reveals three findings. First, two fine-grained labels underwent a shift in meaning while retaining the same underlying idea. Second, the COVID-19 domain is dominated by statements highlighting the importance, cruciality, usefulness, benefit, consideration, etc. of certain topics for making sensible argumentation. Third, discussing the state of the art (SOTA) in terms of outperforming previous work is less common in the COVID-19 domain than in the CS domain. Our results will be used for further dataset development by classifying the citing sentences in all papers from CORD-19.
Anthology ID:
2022.law-1.1
Volume:
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
LAW
SIG:
SIGANN
Publisher:
European Language Resources Association
Pages:
1–7
URL:
https://aclanthology.org/2022.law-1.1
Cite (ACL):
Setio Basuki and Masatoshi Tsuchiya. 2022. Automatic Approach for Building Dataset of Citation Functions for COVID-19 Academic Papers. In Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022, pages 1–7, Marseille, France. European Language Resources Association.
Cite (Informal):
Automatic Approach for Building Dataset of Citation Functions for COVID-19 Academic Papers (Basuki & Tsuchiya, LAW 2022)
PDF:
https://aclanthology.org/2022.law-1.1.pdf
Data
CORD-19