CUET_Absolute_Zero@DravidianLangTech 2025: Detecting AI-Generated Product Reviews in Malayalam and Tamil Language Using Transformer Models

Anindo Barua; Sidratul Muntaha; Momtazul Arefin Labib; Samia Rahman; Udoy Das; Hasan Murad

doi:10.18653/v1/2025.dravidianlangtech-1.71

CUET_Absolute_Zero@DravidianLangTech 2025: Detecting AI-Generated Product Reviews in Malayalam and Tamil Language Using Transformer Models

Anindo Barua, Sidratul Muntaha, Momtazul Arefin Labib, Samia Rahman, Udoy Das, Hasan Murad

Abstract

Artificial Intelligence (AI) is opening new doors of learning and interaction. However, it has its share of problems. One major issue is the ability of AI to generate text that resembles human-written text. So, how can we tell apart human-written text from AI-generated text?With this in mind, we have worked on detecting AI-generated product reviews in Dravidian languages, mainly in Malayalam and Tamil. The “Shared Task on Detecting AI-Generated Product Reviews in Dravidian Languages,” held as part of the DravidianLangTech Workshop at NAACL 2025 has provided a dataset categorized into two categories, human-written review and AI-generated review. We have implemented four machine learning models (Random Forest, Support Vector Machine, Decision Tree, and XGBoost), four deep learning models (Long Short-Term Memory, Bidirectional Long Short-Term Memory, Gated Recurrent Unit, and Recurrent Neural Network), and three transformer-based models (AI-Human-Detector, Detect-AI-Text, and E5-Small-Lora-AI-Generated-Detector). We have conducted a comparative study among all the models by training and evaluating each model on the dataset. We have discovered that the transformer, E5-Small-Lora-AI-Generated-Detector, has provided the best result with an F1 score of 0.8994 on the test set ranking 7th position in the Malayalam language. Tamil has a higher token overlap and richer morphology than Malayalam. Thus, we obtained a worse F1 score of 0.5877 ranking 28th position in the Tamil language among all participants in the shared task.

Anthology ID:: 2025.dravidianlangtech-1.71
Volume:: Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month:: May
Year:: 2025
Address:: Acoma, The Albuquerque Convention Center, Albuquerque, New Mexico
Editors:: Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Elizabeth Sherly, Saranya Rajiakodi, Balasubramanian Palani, Malliga Subramanian, Subalalitha Cn, Dhivya Chinnappa
Venues:: DravidianLangTech | WS
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 398–403
Language:
URL:: https://aclanthology.org/2025.dravidianlangtech-1.71/
DOI:: 10.18653/v1/2025.dravidianlangtech-1.71
Bibkey:
Cite (ACL):: Anindo Barua, Sidratul Muntaha, Momtazul Arefin Labib, Samia Rahman, Udoy Das, and Hasan Murad. 2025. CUET_Absolute_Zero@DravidianLangTech 2025: Detecting AI-Generated Product Reviews in Malayalam and Tamil Language Using Transformer Models. In Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 398–403, Acoma, The Albuquerque Convention Center, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):: CUET_Absolute_Zero@DravidianLangTech 2025: Detecting AI-Generated Product Reviews in Malayalam and Tamil Language Using Transformer Models (Barua et al., DravidianLangTech 2025)
Copy Citation:
PDF:: https://aclanthology.org/2025.dravidianlangtech-1.71.pdf

PDF Cite Search Fix data