Frederic Kirstein


2022

How Large Language Models are Transforming Machine-Paraphrase Plagiarism
Jan Philip Wahle | Terry Ruas | Frederic Kirstein | Bela Gipp
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

The recent success of large language models for text generation poses a severe threat to academic integrity, as plagiarists can generate realistic paraphrases indistinguishable from original work. However, research on the role of large autoregressive models in generating machine-paraphrased plagiarism, and on detecting it, is still incipient in the literature. This work explores T5 and GPT-3 for machine-paraphrase generation on scientific articles from arXiv, student theses, and Wikipedia. We evaluate the detection performance of six automated solutions and one commercial plagiarism detection tool, and perform a human study with 105 participants regarding their detection performance and the quality of generated examples. Our results suggest that large language models can rewrite text in ways that humans have difficulty identifying as machine-paraphrased (53% mean accuracy). Human experts rate the quality of paraphrases generated by GPT-3 as highly as that of original texts (clarity 4.0/5, fluency 4.2/5, coherence 3.8/5). The best-performing detection model (GPT-3) achieves a 66% F1-score in detecting paraphrases. We make our code, data, and findings publicly available to facilitate the development of detection solutions.