Ali Zain


2026

This paper describes our system submitted to the AbjadGenEval Shared Task at ArabicNLP 2026, which focuses on binary classification of human-written versus machine-generated text in low-resource languages. We participated in two independent subtasks targeting Arabic and Urdu news and literary texts. Our approach relies exclusively on fine-tuning XLM-RoBERTa, a multilingual Transformer-based model, under carefully controlled training and preprocessing settings. While the same model architecture was used for both subtasks, language-specific data handling strategies were applied based on empirical observations. The proposed system achieved first place in the Urdu subtask and third place in the Arabic subtask according to the official evaluation. These results demonstrate that multilingual pretrained models can serve as strong and reliable systems for AI-generated text detection across diverse languages.
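The abstract gives no implementation details beyond the model choice, but the core idea — fine-tuning a pretrained encoder with a binary classification head — can be sketched in miniature. The PyTorch snippet below trains only a linear head over fixed-size text embeddings, used here as a stand-in for XLM-RoBERTa's pooled output; the data, hyperparameters, and dimensions are all illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for pooled XLM-RoBERTa embeddings (hidden size 768).
# In the real system these representations would come from the
# encoder itself, fine-tuned end to end.
HIDDEN = 768
X = torch.randn(64, HIDDEN)        # 64 synthetic "documents"
y = (X[:, 0] > 0).float()          # synthetic human/machine labels

head = nn.Linear(HIDDEN, 1)        # binary classification head
opt = torch.optim.AdamW(head.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()   # binary cross-entropy on logits

for step in range(300):
    opt.zero_grad()
    logits = head(X).squeeze(-1)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()

preds = (head(X).squeeze(-1) > 0).float()
accuracy = (preds == y).float().mean().item()
print(f"training accuracy: {accuracy:.2f}")
```

In the full system, the head and all encoder layers would be updated jointly, with the language-specific preprocessing the abstract alludes to applied before tokenization.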

2025

We participated in two subtasks: Subtask 1, focusing on news articles, and Subtask 2, focusing on academic abstracts. Our submission is based on three distinct architectural approaches: (1) fine-tuning a RoBERTa-base model; (2) a TF-IDF-based system with a Linear Support Vector Machine (SVM) classifier; and (3) an experimental system named Candace, which feeds probabilistic features extracted from multiple Llama-3.2 models (1B and 3B variants) into a Transformer encoder-based classifier. Our RoBERTa-based system demonstrated strong performance on the development and test sets of both subtasks and was chosen as our primary submission for both shared subtasks.
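Of the three approaches, the TF-IDF-plus-SVM system is the most compact to illustrate. A minimal scikit-learn version might look like the following; the toy corpus, n-gram range, and default SVM settings are illustrative assumptions, not the paper's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy human-vs-machine corpus; label 0 = human, 1 = machine-generated.
texts = [
    "The senator spoke briefly before leaving the chamber.",
    "Rain delayed the match, frustrating fans in the stands.",
    "As an AI language model, I can provide a summary below.",
    "In conclusion, the aforementioned points delineate the topic at hand.",
]
labels = [0, 0, 1, 1]

# Word unigrams and bigrams as TF-IDF features, fed to a linear SVM.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LinearSVC(),
)
clf.fit(texts, labels)

print(clf.predict(["As an AI language model, I summarize the points below."]))
```

A pipeline like this is a common strong baseline for generated-text detection: it is cheap to train and often competitive with fine-tuned transformers on in-domain data, which makes it a useful comparison point against the RoBERTa-based primary system.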