SmartCiteCon: Implicit Citation Context Extraction from Academic Literature Using Supervised Learning

Chenrui Guo, Haoran Cui, Li Zhang, Jiamin Wang, Wei Lu, Jian Wu


Abstract
We introduce SmartCiteCon (SCC), a Java API for extracting both explicit and implicit citation context from English-language academic literature. The tool is built on a Support Vector Machine (SVM) model trained on 7,058 manually annotated citation context sentences, curated from 34,000 papers in the ACL Anthology. The model, which uses 19 features, achieves an F1 score of 85.6%. SCC supports PDF, XML, and JSON files out of the box, provided that they conform to certain schemas. The API supports both single-document processing and parallel batch processing. Processing a document takes about 12–45 seconds on average, depending on the format, on a dedicated server with 6 multithreaded cores. Using SCC, we extracted 11.8 million citation context sentences from ~33.3k PMC papers in the CORD-19 dataset, released on June 13, 2020. We will continue to contribute supplementary data to CORD-19 and other datasets. The source code is released at https://gitee.com/irlab/SmartCiteCon.
Anthology ID:
2020.wosp-1.3
Volume:
Proceedings of the 8th International Workshop on Mining Scientific Publications
Month:
05 August
Year:
2020
Address:
Wuhan, China
Venue:
WOSP
Publisher:
Association for Computational Linguistics
Pages:
21–26
URL:
https://aclanthology.org/2020.wosp-1.3
Cite (ACL):
Chenrui Guo, Haoran Cui, Li Zhang, Jiamin Wang, Wei Lu, and Jian Wu. 2020. SmartCiteCon: Implicit Citation Context Extraction from Academic Literature Using Supervised Learning. In Proceedings of the 8th International Workshop on Mining Scientific Publications, pages 21–26, Wuhan, China. Association for Computational Linguistics.
Cite (Informal):
SmartCiteCon: Implicit Citation Context Extraction from Academic Literature Using Supervised Learning (Guo et al., WOSP 2020)
PDF:
https://aclanthology.org/2020.wosp-1.3.pdf
Data
S2ORC