The impact of preprint servers in the formation of novel ideas

Swarup Satish, Zonghai Yao, Andrew Drozdov, Boris Veytsman


Abstract
We study whether novel ideas in biomedical literature appear first in preprints or in traditional journals. We develop a Bayesian method to estimate the time of appearance of a phrase in the literature, and apply it to a number of phrases, both automatically extracted and suggested by experts. We find that at present most phrases appear first in traditional journals, but a number of phrases first appear on preprint servers. A comparison of the general composition of texts from bioRxiv and traditional journals shows that bioRxiv is increasingly predictive of traditional journals. We discuss applications of the method to related problems.
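The abstract does not spell out the estimator, so the following is only a rough illustration of how a Bayesian estimate of a phrase's time of appearance could be framed. It is a minimal sketch under our own assumptions, not necessarily the authors' model. We assume that after an unknown onset time `t0`, occurrences of a phrase in one source arrive as a homogeneous Poisson process with a Gamma-distributed rate. The rate is integrated out, and the posterior over `t0` is evaluated on a discrete grid. The functions `onset_posterior` and `prob_earlier`, the prior parameters, and the toy inputs are all hypothetical.

```python
import math
from typing import Dict, Sequence


def onset_posterior(
    occurrence_times: Sequence[float],
    grid: Sequence[float],
    end_time: float,
    alpha: float = 1.0,
    beta: float = 1.0,
) -> Dict[float, float]:
    """Posterior over a phrase's onset time t0, evaluated on a discrete grid.

    Hypothetical model (an assumption, not the paper's): after t0, occurrences
    form a homogeneous Poisson process observed until end_time, with unknown
    rate lam ~ Gamma(alpha, beta) (shape, rate). Integrating lam out leaves a
    likelihood proportional to (beta + end_time - t0) ** -(n + alpha) for
    t0 <= the first observed occurrence, and zero after it.
    The prior on t0 is uniform over the grid.
    """
    times = sorted(occurrence_times)
    if not times:
        raise ValueError("need at least one observed occurrence")
    n, first = len(times), times[0]
    # Log-weights only for grid points that do not come after the first sighting.
    log_w = {
        t0: -(n + alpha) * math.log(beta + end_time - t0)
        for t0 in grid
        if t0 <= first
    }
    if not log_w:
        raise ValueError("grid has no point at or before the first occurrence")
    # Normalise in log space for numerical stability.
    top = max(log_w.values())
    unnorm = {t0: math.exp(w - top) for t0, w in log_w.items()}
    total = sum(unnorm.values())
    return {t0: w / total for t0, w in unnorm.items()}


def prob_earlier(post_a: Dict[float, float], post_b: Dict[float, float]) -> float:
    """P(onset in source A strictly precedes onset in source B).

    Treats the two onsets as independent; ties count as "not earlier".
    """
    return sum(
        pa * pb
        for ta, pa in post_a.items()
        for tb, pb in post_b.items()
        if ta < tb
    )


# Toy inputs (months since an arbitrary origin), not data from the paper.
grid = list(range(61))
preprints = onset_posterior([30, 37, 40], grid, end_time=60)
journals = onset_posterior([36, 41, 50, 58], grid, end_time=60)
print(f"P(preprint onset first) = {prob_earlier(preprints, journals):.2f}")
```

Under this assumed model the posterior mode always sits at the first observed occurrence (or the last grid point before it). The mass spread before that point reflects how sparsely the phrase has been sampled. Computing a separate posterior for preprints and for journals and comparing them with `prob_earlier` is one simple way to ask which venue a phrase reaches first. The independence assumption between the two sources is a simplification.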
Anthology ID: 2020.sdp-1.6
Volume: Proceedings of the First Workshop on Scholarly Document Processing
Month: November
Year: 2020
Address: Online
Editors: Muthu Kumar Chandrasekaran, Anita de Waard, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Eduard Hovy, Petr Knoth, David Konopnicki, Philipp Mayr, Robert M. Patton, Michal Shmueli-Scheuer
Venue: sdp
Publisher: Association for Computational Linguistics
Pages: 42–55
URL: https://aclanthology.org/2020.sdp-1.6
DOI: 10.18653/v1/2020.sdp-1.6
Cite (ACL): Swarup Satish, Zonghai Yao, Andrew Drozdov, and Boris Veytsman. 2020. The impact of preprint servers in the formation of novel ideas. In Proceedings of the First Workshop on Scholarly Document Processing, pages 42–55, Online. Association for Computational Linguistics.
Cite (Informal): The impact of preprint servers in the formation of novel ideas (Satish et al., sdp 2020)
PDF: https://aclanthology.org/2020.sdp-1.6.pdf
Video: https://slideslive.com/38940715
Code: seasonyao/biorxivimpact