Analyzing the Evolution of Scientific Misconduct Based on the Language of Retracted Papers

Christof Bless; Andreas Waldis; Angelina Parfenova; Maria A. Rodriguez; Andreas Marfurt

doi:10.18653/v1/2025.sdp-1.6

Abstract

Amid rising numbers of organizations producing counterfeit scholarly articles, it is important to quantify the prevalence of scientific misconduct. We assess the feasibility of automated text-based methods to determine the rate of scientific misconduct by analyzing linguistic differences between retracted and non-retracted papers. We find that retracted works show distinct phrase patterns and higher word repetition. Motivated by this, we evaluate two misconduct detection methods: a mixture distribution approach and a Transformer-based one. The best models achieve high performance (F1 > 0.9) on the detection of paper mill articles and automatically generated content, making them viable tools for flagging papers for closer review. We apply the classifiers to more than 300,000 paper abstracts to quantify misconduct over time and find that our estimation methods accurately reproduce trends observed in the real data.
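The abstract reports higher word repetition in retracted works but does not say how repetition is measured. As a rough illustration only, the sketch below uses one simple proxy: one minus the type-token ratio of an abstract. The function name `repetition_rate`, the regex tokenizer, and the choice of metric are assumptions made for this example and are not taken from the paper.

```python
import re


def repetition_rate(text: str) -> float:
    """Return the share of tokens that repeat an earlier token.

    Hypothetical proxy for "word repetition": 1 - (unique tokens / total tokens).
    The paper may use a different definition.
    """
    # Lowercase and keep alphanumeric runs as tokens (a simplifying assumption).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    return 1.0 - len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = "The cells were treated and the cells were washed."
    print(f"repetition rate: {repetition_rate(sample):.3f}")
```

A value near 0 means almost every word is distinct, while higher values indicate more repeated vocabulary. Comparing such scores between retracted and non-retracted abstracts would require length-matched samples, since longer texts repeat words more often.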

Anthology ID: 2025.sdp-1.6
Volume: Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Tirthankar Ghosal, Philipp Mayr, Amanpreet Singh, Aakanksha Naik, Georg Rehm, Dayne Freitag, Dan Li, Sonja Schimmler, Anita De Waard
Venues: sdp | WS
Publisher: Association for Computational Linguistics
Pages: 57–71
URL: https://aclanthology.org/2025.sdp-1.6/
DOI: 10.18653/v1/2025.sdp-1.6
Cite (ACL): Christof Bless, Andreas Waldis, Angelina Parfenova, Maria A. Rodriguez, and Andreas Marfurt. 2025. Analyzing the Evolution of Scientific Misconduct Based on the Language of Retracted Papers. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025), pages 57–71, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Analyzing the Evolution of Scientific Misconduct Based on the Language of Retracted Papers (Bless et al., SDP 2025)
PDF: https://aclanthology.org/2025.sdp-1.6.pdf
