GPT-who: An Information Density-based Machine-Generated Text Detector

Saranya Venkatraman, Adaku Uchendu, Dongwon Lee


Abstract
The Uniform Information Density (UID) principle posits that humans prefer to spread information evenly during language production. We examine whether this UID principle can help capture differences between texts generated by Large Language Models (LLMs) and texts written by humans. We propose GPT-who, the first psycholinguistically-inspired domain-agnostic statistical detector. This detector employs UID-based features to model the unique statistical signature of each LLM and human author for accurate detection. We evaluate our method using 4 large-scale benchmark datasets and find that GPT-who outperforms state-of-the-art detectors (both statistical and non-statistical) such as GLTR, GPTZero, DetectGPT, OpenAI detector, and ZeroGPT by over 20% across domains. In addition to better performance, it is computationally inexpensive and utilizes an interpretable representation of text articles. We find that GPT-who can distinguish texts generated by very sophisticated LLMs, even when the overlying text is indiscernible. UID-based measures for all datasets and code are available at https://github.com/saranya-venkatraman/gpt-who.
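As a rough illustration of what a UID-based feature can look like, the sketch below computes per-token surprisal and two common uniformity measures from a list of token probabilities. It is only a sketch under stated assumptions: the function names (surprisals, uid_features) and the feature names are hypothetical, the token probabilities are assumed to come from some language model that is not shown here, and the abstract does not specify the exact feature set GPT-who uses.

```python
import math
from typing import Dict, List


def surprisals(token_probs: List[float]) -> List[float]:
    """Surprisal of each token in bits: -log2 p(token | context)."""
    return [-math.log2(p) for p in token_probs]


def uid_features(token_probs: List[float]) -> Dict[str, float]:
    """Hypothetical UID-style features for one text.

    token_probs holds the probability a language model assigned to each
    token given its preceding context (assumed to be computed elsewhere).
    Lower variance values mean information is spread more evenly.
    """
    s = surprisals(token_probs)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two tokens")

    mean = sum(s) / n
    # Global uniformity: variance of surprisal around the text-level mean.
    variance = sum((x - mean) ** 2 for x in s) / n
    # Local uniformity: mean squared change between consecutive tokens.
    local_sq_diff = sum((s[i] - s[i - 1]) ** 2 for i in range(1, n)) / (n - 1)

    return {
        "mean_surprisal": mean,
        "surprisal_variance": variance,
        "local_sq_diff": local_sq_diff,
    }


if __name__ == "__main__":
    # Perfectly even information rate versus an alternating one.
    print(uid_features([0.5, 0.5, 0.5, 0.5]))
    print(uid_features([0.5, 0.25, 0.5, 0.25]))
```

A feature vector of this kind could then be passed to any standard classifier; which classifier and which features the authors actually use is described in the paper and the linked repository, not in this abstract.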
Anthology ID:
2024.findings-naacl.8
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
103–115
URL:
https://aclanthology.org/2024.findings-naacl.8
Cite (ACL):
Saranya Venkatraman, Adaku Uchendu, and Dongwon Lee. 2024. GPT-who: An Information Density-based Machine-Generated Text Detector. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 103–115, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
GPT-who: An Information Density-based Machine-Generated Text Detector (Venkatraman et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.8.pdf
Copyright:
 2024.findings-naacl.8.copyright.pdf