Predicting the presence of inline citations in academic text using binary classification

Peter Vajdecka, Elena Callegari, Desara Xhura, Atli Ásmundsson


Abstract
Properly citing sources is a crucial component of any good-quality academic paper. The goal of this study was to determine what kind of accuracy we could reach in predicting whether or not a sentence should contain an inline citation using a simple binary classification model. To that end, we fine-tuned SciBERT on both an imbalanced and a balanced dataset containing sentences with and without inline citations. We achieved an overall accuracy of over 0.92, suggesting that language patterns alone could be used to predict where inline citations should appear with some degree of accuracy.
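The abstract mentions training on both an imbalanced and a balanced dataset. The short sketch below illustrates why that distinction matters when reading an accuracy figure. It is not the authors' pipeline: the toy sentences, the 20/80 class split, the `undersample` helper, and the choice of random undersampling as the balancing method are all assumptions made for illustration.

```python
import random


def undersample(examples, seed=0):
    """Balance a binary dataset by randomly dropping majority-class examples.

    `examples` is a list of (sentence, label) pairs with label 0 or 1.
    Random undersampling is an assumed balancing strategy, not one taken
    from the paper.
    """
    rng = random.Random(seed)
    pos = [ex for ex in examples if ex[1] == 1]
    neg = [ex for ex in examples if ex[1] == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced


def accuracy(gold, predicted):
    """Fraction of labels predicted correctly."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical toy data: label 1 = sentence carries an inline citation.
imbalanced = [(f"cited sentence {i}", 1) for i in range(20)]
imbalanced += [(f"plain sentence {i}", 0) for i in range(80)]
balanced = undersample(imbalanced)

# A trivial baseline that always predicts "no citation".
for name, data in [("imbalanced", imbalanced), ("balanced", balanced)]:
    gold = [label for _, label in data]
    print(name, len(data), accuracy(gold, [0] * len(gold)))
```

On this toy data the always-negative baseline scores 0.8 on the imbalanced set but only 0.5 on the balanced one, so an accuracy figure is easier to interpret when the class distribution it was measured on is known.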
Anthology ID:
2023.nodalida-1.72
Volume:
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May
Year:
2023
Address:
Tórshavn, Faroe Islands
Editors:
Tanel Alumäe, Mark Fishel
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
717–722
URL:
https://aclanthology.org/2023.nodalida-1.72
Cite (ACL):
Peter Vajdecka, Elena Callegari, Desara Xhura, and Atli Ásmundsson. 2023. Predicting the presence of inline citations in academic text using binary classification. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 717–722, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal):
Predicting the presence of inline citations in academic text using binary classification (Vajdecka et al., NoDaLiDa 2023)
PDF:
https://aclanthology.org/2023.nodalida-1.72.pdf