Luis Miralles


2025

Benchmarking AI Text Detection: Assessing Detectors Against New Datasets, Evasion Tactics, and Enhanced LLMs
Shushanta Pudasaini | Luis Miralles | David Lillis | Marisa Llorens Salvador
Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)

The rapid advancement of Large Language Models (LLMs), such as GPT-4, has sparked concerns regarding academic misconduct, misinformation, and the erosion of originality. Despite the growing number of AI detection tools, their effectiveness is often undermined by sophisticated evasion tactics and the continuous evolution of LLMs. This research benchmarks the performance of leading AI detectors, including OpenAI Detector, RADAR, and ArguGPT, across a variety of text domains, evaded content, and text generated by cutting-edge LLMs. Our experiments reveal that current detection models are considerably unreliable in real-world scenarios, particularly when tested against diverse data domains and novel evasion strategies. The study underscores the need for greater robustness in detection systems and provides insights into areas for improvement in these models. Additionally, this work lays the groundwork for future research by offering a comprehensive evaluation of existing detectors under challenging conditions, fostering a deeper understanding of their limitations. The experimental code and datasets are publicly available for further benchmarking on GitHub.
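
As an illustrative aside, below is a minimal sketch of the kind of detector call such a benchmark exercises, assuming the publicly available HuggingFace checkpoint `openai-community/roberta-base-openai-detector` (a release of the OpenAI Detector named in the abstract). This is not the paper's evaluation harness; the actual experimental code is in the linked GitHub repository.

```python
# Minimal sketch: scoring texts with a public AI-text detector via HuggingFace.
# The checkpoint choice is an assumption for illustration, not the paper's setup.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

texts = [
    "The committee met on Tuesday to review the draft proposal.",   # human-written sample
    "As an AI language model, I can summarise the key points below.",  # machine-flavoured sample
]

for text in texts:
    result = detector(text, truncation=True)[0]
    # This checkpoint labels outputs "Real" (human) or "Fake" (AI-generated),
    # with a confidence score in [0, 1].
    print(f"{result['label']:>4}  {result['score']:.3f}  {text[:50]}")
```

A benchmark like the one described would run such calls over labelled human and machine-generated corpora (including evaded and newer-LLM text) and compare the predicted labels against ground truth.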