Atli Ásmundsson


2023

Predicting the presence of inline citations in academic text using binary classification
Peter Vajdecka | Elena Callegari | Desara Xhura | Atli Ásmundsson
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Properly citing sources is a crucial component of any good-quality academic paper. The goal of this study was to determine what kind of accuracy we could reach in predicting whether or not a sentence should contain an inline citation using a simple binary classification model. To that end, we fine-tuned SciBERT on both an imbalanced and a balanced dataset containing sentences with and without inline citations. We achieved an overall accuracy of over 0.92, suggesting that language patterns alone could be used to predict where inline citations should appear with some degree of accuracy.
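As a rough illustration of the setup the abstract describes, the sketch below shows one way a balanced dataset could be derived from an imbalanced one, and how overall accuracy is computed for a binary classifier. The abstract does not say how the balanced dataset was built, so downsampling the majority class is an assumption here, and the function names `balance_by_downsampling` and `accuracy` are hypothetical. No SciBERT fine-tuning is shown; labels use 1 for a sentence with an inline citation and 0 for one without.

```python
import random


def balance_by_downsampling(examples, seed=0):
    """Return a class-balanced copy of `examples`.

    `examples` is a list of (sentence, label) pairs with label in {0, 1}.
    Assumption: balance is reached by randomly downsampling the larger
    class; the source does not state which method was actually used.
    """
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for sentence, label in examples:
        by_label[label].append((sentence, label))
    # Keep as many examples per class as the smaller class contains.
    n = min(len(by_label[0]), len(by_label[1]))
    balanced = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(balanced)
    return balanced


def accuracy(gold, predicted):
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)
```

On an imbalanced dataset, overall accuracy can be dominated by the majority class, which is one plausible reason to evaluate on a balanced dataset as well.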