Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors

Andrea Pedrotti; Michele Papucci; Cristiano Ciaccio; Alessio Miaschi; Giovanni Puccetti; Felice Dell’Orletta; Andrea Esuli

doi:10.18653/v1/2025.findings-acl.156

Abstract

Recent advances in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, raising concerns about malicious uses such as misinformation and manipulation. Moreover, detecting Machine-Generated Text (MGT) remains challenging due to the lack of robust benchmarks that assess generalization to real-world scenarios. In this work, we evaluate the resilience of state-of-the-art MGT detectors (e.g., Mage, Radar, LLM-DetectAIve) to linguistically informed adversarial attacks. We develop a pipeline that fine-tunes language models using Direct Preference Optimization (DPO) to shift the MGT style toward human-written text (HWT), yielding generations that are harder for current detectors to identify. Additionally, we analyze the linguistic shifts induced by the alignment and how detectors rely on “linguistic shortcuts” to detect texts. Our results show that detectors can be easily fooled with relatively few examples, resulting in a significant drop in detection performance. This highlights the importance of improving detection methods and making them robust to unseen in-domain texts. We release code, models, and data to support future research on more robust MGT detection benchmarks.
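For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-pair DPO objective, not the authors' released code. It assumes, as an illustration only, that the human-written text plays the role of the preferred ("chosen") response and the machine-generated text the dispreferred ("rejected") one. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are hypothetical.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or the frozen reference model.
    Assumption: "chosen" is a human-written text, "rejected" is a
    machine-generated one.
    """
    # How much more the policy favours each response than the reference does.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(x)) computed in a numerically stable way.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    # Policy identical to the reference: no preference has been learned yet.
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy has moved toward the chosen (human-like) text: lower loss.
    print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

Minimizing this quantity pushes the fine-tuned model to assign relatively more probability to the preferred text than the reference model does, which is the mechanism by which a style shift toward HWT could be induced under the assumption above.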

Anthology ID: 2025.findings-acl.156
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3010–3031
URL: https://aclanthology.org/2025.findings-acl.156/
DOI: 10.18653/v1/2025.findings-acl.156
Cite (ACL): Andrea Pedrotti, Michele Papucci, Cristiano Ciaccio, Alessio Miaschi, Giovanni Puccetti, Felice Dell’Orletta, and Andrea Esuli. 2025. Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors. In Findings of the Association for Computational Linguistics: ACL 2025, pages 3010–3031, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors (Pedrotti et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.156.pdf
