Detecting Machine-Generated Long-Form Content with Latent-Space Variables

Yufei Tian, Zeyu Pan, Nanyun Peng


Abstract
The increasing capability of large language models (LLMs) to generate fluent long-form texts presents new challenges in distinguishing these outputs from those of humans. Existing zero-shot detectors that primarily focus on token-level distributions are vulnerable to real-world domain shifts, including different decoding strategies, variations in prompts, and attacks. We propose a more robust method that incorporates abstract elements—such as topic or event transitions—as key deciding factors, by training a latent-space model on sequences of events or topics derived from human-written texts. On three different domains, machine generations that are originally inseparable from humans' at the token level can be better distinguished with our latent-space model, leading to a 31% improvement over strong baselines such as DetectGPT. Our analysis further reveals that, unlike humans, modern LLMs such as GPT-4 select event triggers and transitions differently, an inherent disparity that persists regardless of the generation configurations adopted at inference time.
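The abstract's core idea — scoring a text by how plausible its sequence of abstract events or topics looks under a model trained on human-written sequences — can be illustrated in very simplified form. The sketch below uses a smoothed bigram transition model over discrete event labels; the function names, bigram parameterization, and smoothing scheme are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch only: a smoothed bigram model over event labels,
# standing in for the paper's latent-space model (assumption, not the
# authors' method). Human-typical event transitions score higher.
from collections import Counter, defaultdict
import math


def train_transition_model(sequences, smoothing=1.0):
    """Estimate add-one-smoothed bigram transition probabilities
    over event labels seen in human-written training sequences."""
    vocab = {label for seq in sequences for label in seq}
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in vocab:
        total = sum(counts[prev].values()) + smoothing * len(vocab)
        probs[prev] = {
            nxt: (counts[prev][nxt] + smoothing) / total for nxt in vocab
        }
    return probs


def avg_log_likelihood(seq, probs, floor=1e-8):
    """Average log-probability of a sequence's transitions; low scores
    suggest event transitions atypical of human writing."""
    log_ps = []
    for prev, nxt in zip(seq, seq[1:]):
        p = probs.get(prev, {}).get(nxt, floor)
        log_ps.append(math.log(p))
    return sum(log_ps) / max(len(log_ps), 1)
```

In use, event sequences would first be extracted from documents (e.g. by an event-trigger tagger, a step omitted here); a document whose transition score falls below a threshold calibrated on human text would be flagged as machine-generated.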
Anthology ID:
2024.findings-emnlp.608
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10394–10408
URL:
https://aclanthology.org/2024.findings-emnlp.608
Cite (ACL):
Yufei Tian, Zeyu Pan, and Nanyun Peng. 2024. Detecting Machine-Generated Long-Form Content with Latent-Space Variables. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10394–10408, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Detecting Machine-Generated Long-Form Content with Latent-Space Variables (Tian et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.608.pdf