Human vs. AI: A Novel Benchmark and a Comparative Study on the Detection of Generated Images and the Impact of Prompts

Philipp Moeßner, Heike Adel


Abstract
With the advent of publicly available AI-based text-to-image systems, the process of creating photorealistic but fully synthetic images has been largely democratized. This can pose a threat to the public by making disinformation easier to spread. Machine detectors and human media expertise can help to differentiate between AI-generated (fake) and real images and counteract this danger. Although AI generation models are highly prompt-dependent, the impact of the prompt on fake detection performance has rarely been investigated. This work therefore examines the influence of the prompt's level of detail on the detectability of fake images, both with an AI detector and in a user study. For this purpose, we create a novel dataset, COCOXGEN, which consists of real photos from the COCO dataset as well as images generated with SDXL and Fooocus using prompts of two standardized lengths. Our user study with 200 participants shows that images generated with longer, more detailed prompts are detected significantly more easily than those generated with short prompts. Similarly, an AI-based detection model achieves better performance on images generated with longer prompts. However, humans and AI models seem to pay attention to different details, as we show in a heat map analysis.
Anthology ID:
2025.genaidetect-1.2
Volume:
Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Firoj Alam, Preslav Nakov, Nizar Habash, Iryna Gurevych, Shammur Chowdhury, Artem Shelmanov, Yuxia Wang, Ekaterina Artemova, Mucahid Kutlu, George Mikros
Venues:
GenAIDetect | WS
Publisher:
International Conference on Computational Linguistics
Pages:
47–58
URL:
https://aclanthology.org/2025.genaidetect-1.2/
Cite (ACL):
Philipp Moeßner and Heike Adel. 2025. Human vs. AI: A Novel Benchmark and a Comparative Study on the Detection of Generated Images and the Impact of Prompts. In Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect), pages 47–58, Abu Dhabi, UAE. International Conference on Computational Linguistics.
Cite (Informal):
Human vs. AI: A Novel Benchmark and a Comparative Study on the Detection of Generated Images and the Impact of Prompts (Moeßner & Adel, GenAIDetect 2025)
PDF:
https://aclanthology.org/2025.genaidetect-1.2.pdf