Citations Beyond Self Citations: Identifying Authors, Affiliations, and Nationalities in Scientific Papers

Yoshitomo Matsubara, Sameer Singh


Abstract
The question of the utility of the blind peer-review system is fundamental to scientific research. Some studies investigate exactly how “blind” the papers are in the double-blind review system by manually or automatically identifying the true authors, mainly suggesting the number of self-citations in the submitted manuscripts as the primary signal for identity. However, related work on automated approaches is limited by the sizes of its datasets and by restricted experimental setups, and thus lacks practical insights into the blind review process. In this work, we train models that identify the authors, their affiliations, and their nationalities through real-world, large-scale experiments on the Microsoft Academic Graph, including the cold start scenario. Our models are accurate; we identify at least one of the authors, affiliations, and nationalities of held-out papers with 40.3%, 47.9%, and 86.0% accuracy respectively, from the top-10 guesses of our models. However, through insights from the model, we demonstrate that these entities are identifiable with a small number of guesses primarily by using a combination of self-citations, social citations, and common citations. Moreover, further analysis of the results leads to interesting findings, such as that prominent affiliations are easily identifiable (e.g., 93.8% of test papers written by Microsoft are identified with top-10 guesses). The experimental results show, against conventional belief, that self-citations are no more informative than common citations, thus suggesting that removing self-citations is not sufficient for authors to maintain their anonymity.
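The abstract reports accuracy as the share of held-out papers for which at least one true entity appears among a model's top-10 guesses. The following is a minimal sketch of how such a metric could be computed, not the authors' implementation: the function names, the dictionary-based data layout, and the toy identifiers are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def hit_at_k(ranked_guesses: List[str], true_entities: Set[str], k: int = 10) -> bool:
    """Return True if any of the top-k guesses is a true entity of the paper."""
    return any(guess in true_entities for guess in ranked_guesses[:k])


def hits_at_least_one_accuracy(
    predictions: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Fraction of papers with at least one true entity in the top-k guesses."""
    if not ground_truth:
        return 0.0
    hits = sum(
        hit_at_k(predictions.get(paper_id, []), entities, k)
        for paper_id, entities in ground_truth.items()
    )
    return hits / len(ground_truth)


# Toy data invented for illustration only (hypothetical paper and author IDs).
predictions = {
    "paper_a": ["author_3", "author_1", "author_7"],  # ranked best guess first
    "paper_b": ["author_2", "author_9", "author_4"],
}
ground_truth = {
    "paper_a": {"author_1", "author_5"},
    "paper_b": {"author_8"},
}

accuracy_top2 = hits_at_least_one_accuracy(predictions, ground_truth, k=2)
```

Under this definition a paper counts as identified as soon as a single true author (or affiliation, or nationality) is ranked within the first k guesses, so the score does not require recovering every entity attached to a paper.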
Anthology ID:
2020.wosp-1.2
Volume:
Proceedings of the 8th International Workshop on Mining Scientific Publications
Month:
05 August
Year:
2020
Address:
Wuhan, China
Editors:
Petr Knoth, Christopher Stahl, Bikash Gyawali, David Pride, Suchetha N. Kunnath, Drahomira Herrmannova
Venue:
WOSP
Publisher:
Association for Computational Linguistics
Pages:
9–20
URL:
https://aclanthology.org/2020.wosp-1.2
Cite (ACL):
Yoshitomo Matsubara and Sameer Singh. 2020. Citations Beyond Self Citations: Identifying Authors, Affiliations, and Nationalities in Scientific Papers. In Proceedings of the 8th International Workshop on Mining Scientific Publications, pages 9–20, Wuhan, China. Association for Computational Linguistics.
Cite (Informal):
Citations Beyond Self Citations: Identifying Authors, Affiliations, and Nationalities in Scientific Papers (Matsubara & Singh, WOSP 2020)
PDF:
https://aclanthology.org/2020.wosp-1.2.pdf
Code
 yoshitomo-matsubara/guess-blind-entities
Data
Microsoft Academic Graph