Detecting AI-Generated Text with Pre-Trained Models Using Linguistic Features

Annepaka Yadagiri, Lavanya Shree, Suraiya Parween, Anushka Raj, Shreya Maurya, Partha Pakray


Abstract
The advent of sophisticated large language models, such as ChatGPT and other AI-driven platforms, has produced text that closely mimics human writing, making it increasingly difficult to tell whether content is human-written or AI-generated. This poses significant challenges for content verification, academic integrity, and the detection of misleading information. To address these issues, we developed a classification system that differentiates between human-written and AI-generated texts using the diverse HC3-English dataset. We extracted linguistic and structural features from this dataset, including part-of-speech tags, vocabulary size, word density, active and passive voice usage, readability metrics such as Flesch Reading Ease, and perplexity and burstiness. We employed deep-learning and transformer-based models for the classification task: CNN_BiLSTM, RNN, BERT, GPT-2, and RoBERTa. Among these, RoBERTa performed best, achieving an accuracy of 99.73%. These results show how modern deep learning methods can help maintain information integrity in the digital realm.
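To make a few of the features named in the abstract concrete, the following is a minimal sketch of how vocabulary size, Flesch Reading Ease, and a burstiness score might be computed for a single text. It is not the authors' implementation. The helper names (`split_sentences`, `tokenize`, `count_syllables`, `extract_features`), the vowel-group syllable heuristic, and the definition of burstiness as (σ − μ)/(σ + μ) over sentence lengths are all assumptions made for illustration; the abstract does not specify how these features were computed.

```python
import re
import statistics


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    # Lowercased alphabetic tokens; apostrophes are kept inside words.
    return re.findall(r"[A-Za-z']+", text.lower())


def count_syllables(word):
    # Heuristic (assumption): one syllable per run of vowels, at least one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    # Standard formula: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = split_sentences(text)
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def burstiness(text):
    # Assumed proxy: (sigma - mu) / (sigma + mu) over sentence lengths in words.
    # Values near -1 mean very uniform sentence lengths.
    lengths = [len(tokenize(s)) for s in split_sentences(text)]
    if len(lengths) < 2:
        return 0.0
    mu = statistics.mean(lengths)
    sigma = statistics.pstdev(lengths)
    if mu + sigma == 0:
        return 0.0
    return (sigma - mu) / (sigma + mu)


def extract_features(text):
    # Collect the illustrative features into one dictionary per text.
    words = tokenize(text)
    return {
        "vocabulary_size": len(set(words)),
        "flesch_reading_ease": flesch_reading_ease(text),
        "burstiness": burstiness(text),
    }


if __name__ == "__main__":
    print(extract_features("The cat sat. The cat ran fast."))
```

Feature dictionaries of this kind could be fed to a classifier alongside, or instead of, raw text. Perplexity is omitted here because it requires a language model to score the text.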
Anthology ID:
2024.icon-1.21
Volume:
Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2024
Address:
AU-KBC Research Centre, Chennai, India
Editors:
Sobha Lalitha Devi, Karunesh Arora
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
188–196
URL:
https://aclanthology.org/2024.icon-1.21/
Cite (ACL):
Annepaka Yadagiri, Lavanya Shree, Suraiya Parween, Anushka Raj, Shreya Maurya, and Partha Pakray. 2024. Detecting AI-Generated Text with Pre-Trained Models Using Linguistic Features. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 188–196, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
Cite (Informal):
Detecting AI-Generated Text with Pre-Trained Models Using Linguistic Features (Yadagiri et al., ICON 2024)
PDF:
https://aclanthology.org/2024.icon-1.21.pdf