@inproceedings{imanpour-etal-2025-visual,
title = "The Visual Counter {T}uring Test ({VCT}{\texttwosuperior}): A Benchmark for Evaluating {AI}-Generated Image Detection and the Visual {AI} Index ({V}{\_}{AI})",
author = "Imanpour, Nasrin and
Borah, Abhilekh and
Bajpai, Shashwat and
Ghosh, Subhankar and
Sankepally, Sainath Reddy and
Abdullah, Hasnat Md and
Kosaraju, Nishoak and
Dixit, Shreyas and
Aziz, Ashhar and
Biswas, Shwetangshu and
Jain, Vinija and
Chadha, Aman and
Wang, Song and
Sheth, Amit and
Das, Amitava",
editor = "Inui, Kentaro and
Sakti, Sakriani and
Wang, Haofen and
Wong, Derek F. and
Bhattacharyya, Pushpak and
Banerjee, Biplab and
Ekbal, Asif and
Chakraborty, Tanmoy and
Singh, Dhirendra Pratap",
booktitle = "Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics",
month = dec,
year = "2025",
address = "Mumbai, India",
publisher = "The Asian Federation of Natural Language Processing and The Association for Computational Linguistics",
url = "https://aclanthology.org/2025.ijcnlp-long.100/",
pages = "1847--1862",
ISBN = "979-8-89176-298-5",
abstract = "The rapid progress and widespread availability of text-to-image (T2I) generation models have heightened concerns about the misuse of AI-generated visuals, particularly in the context of misinformation campaigns. Existing AI-generated image detection (AGID) methods often overfit to known generators and falter on outputs from newer or unseen models. To systematically address this generalization gap, we introduce the Visual Counter Turing Test (VCT{\textasciicircum}2), a comprehensive benchmark of 166,000 images, comprising both real and synthetic prompt-image pairs produced by six state-of-the-art (SoTA) T2I systems: Stable Diffusion 2.1, SDXL, SD3 Medium, SD3.5 Large, DALL{\textperiodcentered}E 3, and Midjourney 6. We curate two distinct subsets: COCO{\_}AI, featuring structured captions from MS COCO, and Twitter{\_}AI, containing narrative-style tweets from The New York Times. Under a unified zero-shot evaluation, we benchmark 17 leading AGID models and observe alarmingly low detection accuracy, 58{\%} on COCO{\_}AI and 58.34{\%} on Twitter{\_}AI. To transcend binary classification, we propose the Visual AI Index (V{\_}AI), an interpretable, prompt-agnostic realism metric based on twelve low-level visual features, enabling us to quantify and rank the perceptual quality of generated outputs with greater nuance. Correlation analysis reveals a moderate inverse relationship between V{\_}AI and detection accuracy: Pearson rho of -0.532 on COCO{\_}AI and rho of -0.503 on Twitter{\_}AI; suggesting that more visually realistic images tend to be harder to detect, a trend observed consistently across generators. We release COCO{\_}AI and Twitter{\_}AI to catalyze future advances in robust AGID and perceptual realism assessment."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="imanpour-etal-2025-visual">
<titleInfo>
<title>The Visual Counter Turing Test (VCT²): A Benchmark for Evaluating AI-Generated Image Detection and the Visual AI Index (V_AI)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Nasrin</namePart>
<namePart type="family">Imanpour</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Abhilekh</namePart>
<namePart type="family">Borah</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shashwat</namePart>
<namePart type="family">Bajpai</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Subhankar</namePart>
<namePart type="family">Ghosh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sainath</namePart>
<namePart type="given">Reddy</namePart>
<namePart type="family">Sankepally</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hasnat</namePart>
<namePart type="given">Md</namePart>
<namePart type="family">Abdullah</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nishoak</namePart>
<namePart type="family">Kosaraju</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shreyas</namePart>
<namePart type="family">Dixit</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ashhar</namePart>
<namePart type="family">Aziz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shwetangshu</namePart>
<namePart type="family">Biswas</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vinija</namePart>
<namePart type="family">Jain</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aman</namePart>
<namePart type="family">Chadha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Song</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Amit</namePart>
<namePart type="family">Sheth</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Amitava</namePart>
<namePart type="family">Das</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics</title>
</titleInfo>
<name type="personal">
<namePart type="given">Kentaro</namePart>
<namePart type="family">Inui</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sakriani</namePart>
<namePart type="family">Sakti</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Haofen</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Derek</namePart>
<namePart type="given">F</namePart>
<namePart type="family">Wong</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Pushpak</namePart>
<namePart type="family">Bhattacharyya</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Biplab</namePart>
<namePart type="family">Banerjee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Asif</namePart>
<namePart type="family">Ekbal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dhirendra</namePart>
<namePart type="given">Pratap</namePart>
<namePart type="family">Singh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>The Asian Federation of Natural Language Processing and The Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Mumbai, India</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-298-5</identifier>
</relatedItem>
<abstract>The rapid progress and widespread availability of text-to-image (T2I) generation models have heightened concerns about the misuse of AI-generated visuals, particularly in the context of misinformation campaigns. Existing AI-generated image detection (AGID) methods often overfit to known generators and falter on outputs from newer or unseen models. To systematically address this generalization gap, we introduce the Visual Counter Turing Test (VCT²), a comprehensive benchmark of 166,000 images, comprising both real and synthetic prompt-image pairs produced by six state-of-the-art (SoTA) T2I systems: Stable Diffusion 2.1, SDXL, SD3 Medium, SD3.5 Large, DALL·E 3, and Midjourney 6. We curate two distinct subsets: COCO_AI, featuring structured captions from MS COCO, and Twitter_AI, containing narrative-style tweets from The New York Times. Under a unified zero-shot evaluation, we benchmark 17 leading AGID models and observe alarmingly low detection accuracy, 58% on COCO_AI and 58.34% on Twitter_AI. To transcend binary classification, we propose the Visual AI Index (V_AI), an interpretable, prompt-agnostic realism metric based on twelve low-level visual features, enabling us to quantify and rank the perceptual quality of generated outputs with greater nuance. Correlation analysis reveals a moderate inverse relationship between V_AI and detection accuracy: Pearson rho of -0.532 on COCO_AI and rho of -0.503 on Twitter_AI; suggesting that more visually realistic images tend to be harder to detect, a trend observed consistently across generators. We release COCO_AI and Twitter_AI to catalyze future advances in robust AGID and perceptual realism assessment.</abstract>
<identifier type="citekey">imanpour-etal-2025-visual</identifier>
<location>
<url>https://aclanthology.org/2025.ijcnlp-long.100/</url>
</location>
<part>
<date>2025-12</date>
<extent unit="page">
<start>1847</start>
<end>1862</end>
</extent>
</part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T The Visual Counter Turing Test (VCT²): A Benchmark for Evaluating AI-Generated Image Detection and the Visual AI Index (V_AI)
%A Imanpour, Nasrin
%A Borah, Abhilekh
%A Bajpai, Shashwat
%A Ghosh, Subhankar
%A Sankepally, Sainath Reddy
%A Abdullah, Hasnat Md
%A Kosaraju, Nishoak
%A Dixit, Shreyas
%A Aziz, Ashhar
%A Biswas, Shwetangshu
%A Jain, Vinija
%A Chadha, Aman
%A Wang, Song
%A Sheth, Amit
%A Das, Amitava
%Y Inui, Kentaro
%Y Sakti, Sakriani
%Y Wang, Haofen
%Y Wong, Derek F.
%Y Bhattacharyya, Pushpak
%Y Banerjee, Biplab
%Y Ekbal, Asif
%Y Chakraborty, Tanmoy
%Y Singh, Dhirendra Pratap
%S Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
%D 2025
%8 December
%I The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
%C Mumbai, India
%@ 979-8-89176-298-5
%F imanpour-etal-2025-visual
%X The rapid progress and widespread availability of text-to-image (T2I) generation models have heightened concerns about the misuse of AI-generated visuals, particularly in the context of misinformation campaigns. Existing AI-generated image detection (AGID) methods often overfit to known generators and falter on outputs from newer or unseen models. To systematically address this generalization gap, we introduce the Visual Counter Turing Test (VCT²), a comprehensive benchmark of 166,000 images, comprising both real and synthetic prompt-image pairs produced by six state-of-the-art (SoTA) T2I systems: Stable Diffusion 2.1, SDXL, SD3 Medium, SD3.5 Large, DALL·E 3, and Midjourney 6. We curate two distinct subsets: COCO_AI, featuring structured captions from MS COCO, and Twitter_AI, containing narrative-style tweets from The New York Times. Under a unified zero-shot evaluation, we benchmark 17 leading AGID models and observe alarmingly low detection accuracy, 58% on COCO_AI and 58.34% on Twitter_AI. To transcend binary classification, we propose the Visual AI Index (V_AI), an interpretable, prompt-agnostic realism metric based on twelve low-level visual features, enabling us to quantify and rank the perceptual quality of generated outputs with greater nuance. Correlation analysis reveals a moderate inverse relationship between V_AI and detection accuracy: Pearson rho of -0.532 on COCO_AI and rho of -0.503 on Twitter_AI; suggesting that more visually realistic images tend to be harder to detect, a trend observed consistently across generators. We release COCO_AI and Twitter_AI to catalyze future advances in robust AGID and perceptual realism assessment.
%U https://aclanthology.org/2025.ijcnlp-long.100/
%P 1847-1862

Markdown (Informal)
[The Visual Counter Turing Test (VCT²): A Benchmark for Evaluating AI-Generated Image Detection and the Visual AI Index (V_AI)](https://aclanthology.org/2025.ijcnlp-long.100/) (Imanpour et al., IJCNLP-AACL 2025)

ACL
Nasrin Imanpour, Abhilekh Borah, Shashwat Bajpai, Subhankar Ghosh, Sainath Reddy Sankepally, Hasnat Md Abdullah, Nishoak Kosaraju, Shreyas Dixit, Ashhar Aziz, Shwetangshu Biswas, Vinija Jain, Aman Chadha, Song Wang, Amit Sheth, and Amitava Das. 2025. The Visual Counter Turing Test (VCT²): A Benchmark for Evaluating AI-Generated Image Detection and the Visual AI Index (V_AI). In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 1847–1862, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.