GPT-4 is Judged More Human than Humans in Displaced and Inverted Turing Tests

Ishika M. Rathi; Sydney Taylor; Benjamin Bergen; Cameron Jones

Abstract

Everyday AI detection requires differentiating between humans and AI in informal, online conversations. At present, human users most often do not interact directly with bots but instead read their conversations with other humans. We measured how well humans and large language models can discriminate between humans and AI using two modified versions of the Turing test: inverted and displaced. GPT-3.5, GPT-4, and displaced human adjudicators judged whether an agent was human or AI on the basis of a Turing test transcript. We found that both AI and displaced human judges were less accurate than interactive interrogators, with below-chance accuracy overall. Moreover, all three adjudicator types judged the best-performing GPT-4 witness to be human more often than human witnesses. This suggests that both humans and current LLMs struggle to distinguish between the two when they are not actively interrogating the witness, underscoring an urgent need for more accurate tools to detect AI in conversations.

Anthology ID:: 2025.genaidetect-1.7
Volume:: Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)
Month:: January
Year:: 2025
Address:: Abu Dhabi, UAE
Editors:: Firoj Alam, Preslav Nakov, Nizar Habash, Iryna Gurevych, Shammur Chowdhury, Artem Shelmanov, Yuxia Wang, Ekaterina Artemova, Mucahid Kutlu, George Mikros
Venues:: GenAIDetect | WS
Publisher:: International Conference on Computational Linguistics
Pages:: 96–110
URL:: https://aclanthology.org/2025.genaidetect-1.7/
Cite (ACL):: Ishika M. Rathi, Sydney Taylor, Benjamin Bergen, and Cameron Jones. 2025. GPT-4 is Judged More Human than Humans in Displaced and Inverted Turing Tests. In Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect), pages 96–110, Abu Dhabi, UAE. International Conference on Computational Linguistics.
Cite (Informal):: GPT-4 is Judged More Human than Humans in Displaced and Inverted Turing Tests (Rathi et al., GenAIDetect 2025)
PDF:: https://aclanthology.org/2025.genaidetect-1.7.pdf
