@inproceedings{pudasaini-etal-2025-benchmarking,
title = "Benchmarking {AI} Text Detection: Assessing Detectors Against New Datasets, Evasion Tactics, and Enhanced {LLM}s",
author = "Pudasaini, Shushanta and
Miralles, Luis and
Lillis, David and
Salvador, Marisa Llorens",
editor = "Alam, Firoj and
Nakov, Preslav and
Habash, Nizar and
Gurevych, Iryna and
Chowdhury, Shammur and
Shelmanov, Artem and
Wang, Yuxia and
Artemova, Ekaterina and
Kutlu, Mucahid and
Mikros, George",
booktitle = "Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)",
month = jan,
year = "2025",
address = "Abu Dhabi, UAE",
publisher = "International Conference on Computational Linguistics",
url = "https://aclanthology.org/2025.genaidetect-1.4/",
pages = "68--77",
abstract = "The rapid advancement of Large Language Models (LLMs), such as GPT-4, has sparked concerns regarding academic misconduct, misinformation, and the erosion of originality. Despite the growing number of AI detection tools, their effectiveness is often undermined by sophisticated evasion tactics and the continuous evolution of LLMs. This research benchmarks the performance of leading AI detectors, including OpenAI Detector, RADAR, and ArguGPT, across a variety of text domains, evaded content, and text generated by cutting-edge LLMs. Our experiments reveal that current detection models show considerable unreliability in real-world scenarios, particularly when tested against diverse data domains and novel evasion strategies. The study underscores the need for enhanced robustness in detection systems and provides valuable insights into areas of improvement for these models. Additionally, this work lays the groundwork for future research by offering a comprehensive evaluation of existing detectors under challenging conditions, fostering a deeper understanding of their limitations. The experimental code and datasets are publicly available for further benchmarking on Github."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="pudasaini-etal-2025-benchmarking">
<titleInfo>
<title>Benchmarking AI Text Detection: Assessing Detectors Against New Datasets, Evasion Tactics, and Enhanced LLMs</title>
</titleInfo>
<name type="personal">
<namePart type="given">Shushanta</namePart>
<namePart type="family">Pudasaini</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Luis</namePart>
<namePart type="family">Miralles</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="family">Lillis</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marisa</namePart>
<namePart type="given">Llorens</namePart>
<namePart type="family">Salvador</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-01</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Firoj</namePart>
<namePart type="family">Alam</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Preslav</namePart>
<namePart type="family">Nakov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nizar</namePart>
<namePart type="family">Habash</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Iryna</namePart>
<namePart type="family">Gurevych</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shammur</namePart>
<namePart type="family">Chowdhury</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Artem</namePart>
<namePart type="family">Shelmanov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yuxia</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Artemova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mucahid</namePart>
<namePart type="family">Kutlu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">George</namePart>
<namePart type="family">Mikros</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>International Conference on Computational Linguistics</publisher>
<place>
<placeTerm type="text">Abu Dhabi, UAE</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>The rapid advancement of Large Language Models (LLMs), such as GPT-4, has sparked concerns regarding academic misconduct, misinformation, and the erosion of originality. Despite the growing number of AI detection tools, their effectiveness is often undermined by sophisticated evasion tactics and the continuous evolution of LLMs. This research benchmarks the performance of leading AI detectors, including OpenAI Detector, RADAR, and ArguGPT, across a variety of text domains, evaded content, and text generated by cutting-edge LLMs. Our experiments reveal that current detection models show considerable unreliability in real-world scenarios, particularly when tested against diverse data domains and novel evasion strategies. The study underscores the need for enhanced robustness in detection systems and provides valuable insights into areas of improvement for these models. Additionally, this work lays the groundwork for future research by offering a comprehensive evaluation of existing detectors under challenging conditions, fostering a deeper understanding of their limitations. The experimental code and datasets are publicly available for further benchmarking on GitHub.</abstract>
<identifier type="citekey">pudasaini-etal-2025-benchmarking</identifier>
<location>
<url>https://aclanthology.org/2025.genaidetect-1.4/</url>
</location>
<part>
<date>2025-01</date>
<extent unit="page">
<start>68</start>
<end>77</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Benchmarking AI Text Detection: Assessing Detectors Against New Datasets, Evasion Tactics, and Enhanced LLMs
%A Pudasaini, Shushanta
%A Miralles, Luis
%A Lillis, David
%A Salvador, Marisa Llorens
%Y Alam, Firoj
%Y Nakov, Preslav
%Y Habash, Nizar
%Y Gurevych, Iryna
%Y Chowdhury, Shammur
%Y Shelmanov, Artem
%Y Wang, Yuxia
%Y Artemova, Ekaterina
%Y Kutlu, Mucahid
%Y Mikros, George
%S Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)
%D 2025
%8 January
%I International Conference on Computational Linguistics
%C Abu Dhabi, UAE
%F pudasaini-etal-2025-benchmarking
%X The rapid advancement of Large Language Models (LLMs), such as GPT-4, has sparked concerns regarding academic misconduct, misinformation, and the erosion of originality. Despite the growing number of AI detection tools, their effectiveness is often undermined by sophisticated evasion tactics and the continuous evolution of LLMs. This research benchmarks the performance of leading AI detectors, including OpenAI Detector, RADAR, and ArguGPT, across a variety of text domains, evaded content, and text generated by cutting-edge LLMs. Our experiments reveal that current detection models show considerable unreliability in real-world scenarios, particularly when tested against diverse data domains and novel evasion strategies. The study underscores the need for enhanced robustness in detection systems and provides valuable insights into areas of improvement for these models. Additionally, this work lays the groundwork for future research by offering a comprehensive evaluation of existing detectors under challenging conditions, fostering a deeper understanding of their limitations. The experimental code and datasets are publicly available for further benchmarking on GitHub.
%U https://aclanthology.org/2025.genaidetect-1.4/
%P 68-77
Markdown (Informal)
[Benchmarking AI Text Detection: Assessing Detectors Against New Datasets, Evasion Tactics, and Enhanced LLMs](https://aclanthology.org/2025.genaidetect-1.4/) (Pudasaini et al., GenAIDetect 2025)
ACL
Shushanta Pudasaini, Luis Miralles, David Lillis, and Marisa Llorens Salvador. 2025. Benchmarking AI Text Detection: Assessing Detectors Against New Datasets, Evasion Tactics, and Enhanced LLMs. In Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect), pages 68–77, Abu Dhabi, UAE. International Conference on Computational Linguistics.