CNLP-NITS-PP at GenAI Detection Task 2: Leveraging DistilBERT and XLM-RoBERTa for Multilingual AI-Generated Text Detection

Annepaka Yadagiri, Reddi Mohana Krishna, Partha Pakray


Abstract
In today’s digital landscape, distinguishing between human-authored essays and content generated by advanced Large Language Models such as ChatGPT, GPT-4, Gemini, and LLaMa has become increasingly complex. This differentiation is essential across sectors like academia, cybersecurity, social media, and education, where the authenticity of written material is often crucial. Addressing this challenge, the COLING 2025 competition introduced Task 2, a binary classification task to separate AI-generated text from human-authored content. Using a benchmark dataset for English and Arabic, developing a methodology that fine-tuned various transformer-based neural networks, including CNN-LSTM, RNN, Bi-GRU, BERT, DistilBERT, GPT-2, and RoBERTa. Our Team CNLP-NITS-PP achieved competitive performance through meticulous hyperparameter optimization, reaching a Recall score of 0.825. Specifically, we ranked 18th in the English sub-task A with an accuracy of 0.77 and 20th in the Arabic sub-task B with an accuracy of 0.59. These results underscore the potential of transformer-based models in academic settings to detect AI-generated content effectively, laying a foundation for more advanced methods in essay authenticity verification.
Anthology ID:
2025.genaidetect-1.34
Volume:
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Firoj Alam, Preslav Nakov, Nizar Habash, Iryna Gurevych, Shammur Chowdhury, Artem Shelmanov, Yuxia Wang, Ekaterina Artemova, Mucahid Kutlu, George Mikros
Venues:
GenAIDetect | WS
SIG:
Publisher:
International Conference on Computational Linguistics
Note:
Pages:
307–311
Language:
URL:
https://aclanthology.org/2025.genaidetect-1.34/
DOI:
Bibkey:
Cite (ACL):
Annepaka Yadagiri, Reddi Mohana Krishna, and Partha Pakray. 2025. CNLP-NITS-PP at GenAI Detection Task 2: Leveraging DistilBERT and XLM-RoBERTa for Multilingual AI-Generated Text Detection. In Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect), pages 307–311, Abu Dhabi, UAE. International Conference on Computational Linguistics.
Cite (Informal):
CNLP-NITS-PP at GenAI Detection Task 2: Leveraging DistilBERT and XLM-RoBERTa for Multilingual AI-Generated Text Detection (Yadagiri et al., GenAIDetect 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.genaidetect-1.34.pdf