MAGE: Machine-generated Text Detection in the Wild

Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang


Abstract
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective deepfake text detection to mitigate risks like the spread of fake news and plagiarism. Existing research has been constrained by evaluating detection methods o specific domains or particular language models. In practical scenarios, however, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a comprehensive testbed by gathering texts from diverse human writings and deepfake texts generated by different LLMs. Empirical results on mainstream detection methods demonstrate the difficulties associated with detecting deepfake text in a wide-ranging testbed, particularly in out-of-distribution scenarios. Such difficulties align with the diminishing linguistic differences between the two text sources. Despite challenges, the top-performing detector can identify 84.12% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
Anthology ID:
2024.luhme-long.3
Original:
2024.acl-long.3v1
Version 2:
2024.acl-long.3v2
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
36–53
Language:
URL:
https://aclanthology.org/2024.luhme-long.3/
DOI:
10.18653/v1/2024.acl-long.3
Bibkey:
Cite (ACL):
Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, and Yue Zhang. 2024. MAGE: Machine-generated Text Detection in the Wild. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 36–53, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
MAGE: Machine-generated Text Detection in the Wild (Li et al., ACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.acl-long.3.pdf