The Influence of the Perplexity Score in the Detection of Machine-generated Texts

Alberto José Gutiérrez Megías; L. Alfonso Ureña-López; Eugenio Martínez Cámara

Abstract

The high performance of large language models (LLMs) in generating natural language represents a real threat, since they can be leveraged to generate any kind of deceptive content. Because there are still disparities between machine-generated and human language, we claim that perplexity may be used as a classification signal to discern between machine and human text. We propose a classification model based on XLM-RoBERTa, and we evaluate it on the M4 dataset. The results show that the perplexity score is useful for the identification of machine-generated text, but its usefulness is constrained by the differences among the LLMs used in the training and test sets.
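
For readers unfamiliar with the perplexity score, the following is a minimal sketch of its standard definition: the exponential of the mean negative log-probability that a language model assigns to each token of a text. It is not the authors' implementation. The abstract does not state which model scores the text or how the score is combined with XLM-RoBERTa, and the function name and the log-probability values below are assumptions made up for illustration.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token log-probabilities, invented for illustration only.
predictable = [math.log(0.5)] * 4  # each token assigned probability 0.5
surprising = [math.log(0.1)] * 4   # each token assigned probability 0.1

print(perplexity(predictable))  # close to 2
print(perplexity(surprising))   # close to 10
```

A text whose tokens the scoring model finds more predictable receives a lower perplexity, which is the kind of signal the abstract proposes to exploit.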

Anthology ID: 2024.nlpaics-1.10
Volume: Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Month: July
Year: 2024
Address: Lancaster, UK
Editors: Ruslan Mitkov, Saad Ezzini, Tharindu Ranasinghe, Ignatius Ezeani, Nouran Khallaf, Cengiz Acarturk, Matthew Bradbury, Mo El-Haj, Paul Rayson
Venue: NLPAICS
Publisher: International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Pages: 80–85
URL: https://aclanthology.org/2024.nlpaics-1.10/
Cite (ACL): Alberto José Gutiérrez Megías, L. Alfonso Ureña-López, and Eugenio Martínez Cámara. 2024. The Influence of the Perplexity Score in the Detection of Machine-generated Texts. In Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, pages 80–85, Lancaster, UK. International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security.
Cite (Informal): The Influence of the Perplexity Score in the Detection of Machine-generated Texts (Gutiérrez Megías et al., NLPAICS 2024)
PDF: https://aclanthology.org/2024.nlpaics-1.10.pdf
