Does GPT-4 pass the Turing test?

Cameron Jones, Ben Bergen


Abstract
We evaluated GPT-4 in a public online Turing test. The best-performing GPT-4 prompt passed in 49.7% of games, outperforming ELIZA (22%) and GPT-3.5 (20%), but falling short of the baseline set by human participants (66%). Participants’ decisions were based mainly on linguistic style (35%) and socioemotional traits (27%), supporting the idea that intelligence, narrowly conceived, is not sufficient to pass the Turing test. Participant knowledge about LLMs and number of games played positively correlated with accuracy in detecting AI, suggesting learning and practice as possible strategies to mitigate deception. Despite known limitations as a test of intelligence, we argue that the Turing test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.
Anthology ID:
2024.naacl-long.290
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5183–5210
URL:
https://aclanthology.org/2024.naacl-long.290
DOI:
10.18653/v1/2024.naacl-long.290
Cite (ACL):
Cameron Jones and Ben Bergen. 2024. Does GPT-4 pass the Turing test?. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5183–5210, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Does GPT-4 pass the Turing test? (Jones & Bergen, NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.290.pdf