Ali Zain
2026
LoRAD: Low-Resource AI-Generated Text Detection with XLM-RoBERTa
Ali Zain
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
This paper describes our system submitted to the AbjadGenEval Shared Task at ArabicNLP 2026, which focuses on binary classification of human-written versus machine-generated text in low-resource languages. We participated in two independent subtasks targeting Arabic and Urdu news and literary texts. Our approach relies exclusively on fine-tuning XLM-RoBERTa, a multilingual Transformer-based model, under carefully controlled training and preprocessing settings. While the same model architecture was used for both subtasks, language-specific data handling strategies were applied based on empirical observations. The proposed system achieved first place in the Urdu subtask and third place in the Arabic subtask according to the official evaluation. These results demonstrate that multilingual pretrained models can serve as strong and reliable systems for AI-generated text detection across diverse languages.
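The approach the abstract describes, fine-tuning XLM-RoBERTa as a binary human-vs-machine text classifier, could be sketched with the Hugging Face `transformers` Trainer API roughly as below. This is not the authors' released code: the label mapping, hyperparameters, and dataset wrapper are illustrative assumptions.

```python
# Hypothetical sketch of binary AI-generated-text detection via XLM-RoBERTa
# fine-tuning. Label convention and hyperparameters are assumptions, not the
# paper's exact settings.

LABELS = {"human": 0, "machine": 1}  # assumed label convention

def encode_labels(raw_labels):
    """Map string labels to the integer ids a 2-way classifier head expects."""
    return [LABELS[label] for label in raw_labels]

def build_trainer(texts, labels, model_name="xlm-roberta-base"):
    """Assemble a Hugging Face Trainer for sequence classification.

    Imports are deferred so the lightweight helper above stays usable
    without torch/transformers installed.
    """
    import torch
    from transformers import (AutoTokenizer,
                              AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2)
    encodings = tokenizer(texts, truncation=True, padding=True, max_length=512)

    class DetectionDataset(torch.utils.data.Dataset):
        def __len__(self):
            return len(labels)

        def __getitem__(self, idx):
            item = {k: torch.tensor(v[idx]) for k, v in encodings.items()}
            item["labels"] = torch.tensor(labels[idx])
            return item

    args = TrainingArguments(
        output_dir="xlmr-detector",      # checkpoint directory (assumed name)
        num_train_epochs=3,              # illustrative, not the paper's value
        per_device_train_batch_size=16,
        learning_rate=2e-5,
    )
    return Trainer(model=model, args=args, train_dataset=DetectionDataset())
```

The same pipeline serves both subtasks by swapping in the Arabic or Urdu training split; any language-specific preprocessing would happen before `build_trainer` is called.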
2025
BUSTED at ARATECT Shared Task: A Comparative Study of Transformer-Based Models for Arabic AI-Generated Text Detection
Ali Zain | Sareem Farooqui | Muhammad Rafi
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
Shared Task on Multi-Domain Detection of AI-Generated Text (M-DAIGT)
Sareem Farooqui | Ali Zain | Muhammad Rafi
Proceedings of the Shared Task on Multi-Domain Detection of AI-Generated Text
We participated in two subtasks: Subtask 1, focusing on news articles, and Subtask 2, focusing on academic abstracts. Our submission is based on three distinct architectural approaches: (1) fine-tuning a RoBERTa-base model, (2) a TF-IDF based system with a Linear Support Vector Machine (SVM) classifier, and (3) an experimental system named Candace, which leverages probabilistic features extracted from multiple Llama-3.2 models (1B and 3B variants) fed into a Transformer Encoder-based classifier. Our RoBERTa-based system demonstrated strong performance on the development and test sets for both subtasks and was chosen as our primary submission for both.
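As a concrete illustration of system (2) above, a TF-IDF + Linear SVM detector can be assembled in a few lines of scikit-learn. The toy corpus and n-gram settings below are assumptions for illustration, not the shared-task data or the authors' exact configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Toy stand-in corpus (the shared task used news articles and academic
# abstracts); labels: 0 = human-written, 1 = machine-generated.
texts = [
    "The committee met on Tuesday to discuss the annual budget.",
    "Local volunteers repaired the storm-damaged bridge last week.",
    "As an AI language model, I can certainly summarize that topic.",
    "In conclusion, the aforementioned considerations merit exploration.",
]
labels = [0, 0, 1, 1]

# Word n-gram TF-IDF features feeding a linear SVM, as in system (2).
# ngram_range and sublinear_tf are illustrative choices.
detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LinearSVC(),
)
detector.fit(texts, labels)
preds = detector.predict(["Volunteers repaired the bridge on Tuesday."])
```

Such a bag-of-words baseline is fast to train and needs no GPU, which is one reason TF-IDF + SVM remains a common point of comparison against fine-tuned Transformer systems like the RoBERTa-based primary submission.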