Christof Bless
2025
Analyzing the Evolution of Scientific Misconduct Based on the Language of Retracted Papers
Christof Bless | Andreas Waldis | Angelina Parfenova | Maria A. Rodriguez | Andreas Marfurt
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Amid rising numbers of organizations producing counterfeit scholarly articles, it is important to quantify the prevalence of scientific misconduct. We assess the feasibility of automated text-based methods to determine the rate of scientific misconduct by analyzing linguistic differences between retracted and non-retracted papers. We find that retracted works show distinct phrase patterns and higher word repetition. Motivated by this, we evaluate two misconduct detection methods, a mixture distribution approach and a Transformer-based one. The best models achieve high accuracy (>0.9 F1) on detection of paper mill articles and automatically generated content, making them viable tools for flagging papers for closer review. We apply the classifiers to more than 300,000 paper abstracts to quantify misconduct over time, and find that our estimation methods accurately reproduce trends observed in the real data.