Human vs Machine: An Automated Machine-Generated Text Detection Approach

Urwah Jawaid, Rudra Roy, Pritam Pal, Srijani Debnath, Dipankar Das, Sivaji Bandyopadhyay


Abstract
With the advancement of natural language processing (NLP) and increasingly sophisticated Large Language Models (LLMs), distinguishing human-written text from machine-generated text has become difficult. This paper presents a systematic approach to separating machine-generated text from human-written text by combining a transformer-based model with a textual feature-based post-processing technique. We extracted five textual features (readability score, stop word score, spelling and grammatical error count, unique word score, and human phrase count) from human-written and machine-generated texts separately, and trained three machine learning models (SVM, Random Forest, and XGBoost) on these scores. In addition to these traditional machine learning models, we explored BiLSTM and transformer-based DistilBERT models to improve classification performance. Trained and evaluated on a large dataset containing both human-written and machine-generated text, our best-performing framework achieves an accuracy of 87.5%.
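To make the feature-extraction step more concrete, the following is a minimal sketch of how three of the five features (stop word score, unique word score, and human phrase count) might be computed. The abstract does not define these features, so every definition here is an assumption: the stop word score is taken as the fraction of tokens that are stop words, the unique word score as the ratio of distinct tokens to total tokens, and the human phrase count as the number of occurrences of phrases from a list. The names `STOP_WORDS`, `HUMAN_PHRASES`, and all function names are hypothetical and are not taken from the paper.

```python
import re

# Assumption: a tiny illustrative stop word list; the paper's actual list is not given.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it"}

# Assumption: hypothetical examples of "human phrases"; the abstract does not list any.
HUMAN_PHRASES = ("to be honest", "i think", "in my opinion")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def stop_word_score(text):
    """Fraction of tokens that are stop words (assumed definition)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in STOP_WORDS for t in tokens) / len(tokens)


def unique_word_score(text):
    """Ratio of distinct tokens to total tokens (assumed definition)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def human_phrase_count(text):
    """Number of occurrences of the listed phrases (assumed definition)."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in HUMAN_PHRASES)


def extract_features(text):
    """Collect the three sketched features into one feature vector."""
    return [
        stop_word_score(text),
        unique_word_score(text),
        human_phrase_count(text),
    ]


if __name__ == "__main__":
    print(extract_features("I think the cat is on the mat"))
```

Feature vectors of this kind, extended with a readability score and an error count, could then be passed to a classifier such as an SVM, Random Forest, or XGBoost model, which is the general setup the abstract describes.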
Anthology ID:
2024.icon-1.24
Volume:
Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2024
Address:
AU-KBC Research Centre, Chennai, India
Editors:
Sobha Lalitha Devi, Karunesh Arora
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
215–223
URL:
https://aclanthology.org/2024.icon-1.24/
Cite (ACL):
Urwah Jawaid, Rudra Roy, Pritam Pal, Srijani Debnath, Dipankar Das, and Sivaji Bandyopadhyay. 2024. Human vs Machine: An Automated Machine-Generated Text Detection Approach. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 215–223, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
Cite (Informal):
Human vs Machine: An Automated Machine-Generated Text Detection Approach (Jawaid et al., ICON 2024)
PDF:
https://aclanthology.org/2024.icon-1.24.pdf