Mohammad AL-Smadi

2025

A Hybrid Transformer-Based Model for Sentiment Analysis of Arabic Dialect Hotel Reviews
Rawand Alfugaha | Mohammad AL-Smadi
Proceedings of the Shared Task on Sentiment Analysis for Arabic Dialects

This paper describes the AraNLP system developed for the “Ahasis” shared task on sentiment detection in Arabic dialects for hotel reviews. The task involved classifying the overall sentiment of hotel reviews (Positive, Negative, or Neutral) written in Arabic dialects, specifically Saudi and Darija. Our proposed model, AraNLP, is a hybrid deep learning classifier that leverages the strengths of a transformer-based Arabic model (AraELECTRA)augmented with classical bag-of-words style features (TF-IDF). Our system achieved an F1-score of 76%, securing the 5th rank in the shared task, significantly outperforming the baseline system’s F1-score of 56%.

pdf bib abs

IntegrityAI at GenAI Detection Task 2: Detecting Machine-Generated Academic Essays in English and Arabic Using ELECTRA and Stylometry
Mohammad AL-Smadi
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)

We present a robust system for detecting machine-generated academic essays, leveraging pre-trained, transformer-based models specifically tailored for both English and Arabic texts. Our primary approach utilizes ELECTRA-Small for English and AraELECTRA-Base for Arabic, fine-tuned to deliver high performance while balancing computational efficiency. By incorporating stylometric features, such as word count, sentence length, and vocabulary richness, our models excel at distinguishing between human-written and AI-generated content. Proposed models achieved excellent results with an F1- score of 99.7%, ranking second among of 26 teams in the English subtask, and 98.4%, finishing first out of 23 teams in the Arabic one. Main Contributions include: (1) We develop lightweight and efficient models using ELECTRA-Small and AraELECTRA-Base, achieving an impressive F1-score of 98.5% on the English dataset and 98.4% on the Arabic dataset. This demonstrates the power of combining transformer-based architectures with stylometric analysis. (2) We optimize our system to maintain high performance while being computationally efficient, making it suitable for deployment on GPUs with moderate memory capacity. (3) Additionally, we tested larger models, such as ELECTRA-Large, achieving an even higher F1-score of 99.7% on the English dataset, highlighting the potential for further accuracy gains when using more computationally intensive models.

pdf bib

QU-NLP at QIAS 2025 Shared Task: A Two-Phase LLM Fine-Tuning and Retrieval-Augmented Generation Approach for Islamic Inheritance Reasoning
Mohammad AL-Smadi
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

pdf bib

Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset
Rawand Alfugaha | Mohammad AL-Smadi
Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP-2025)

pdf bib abs

A Multimodal Transformer-based Approach for Cross-Domain Detection of Machine-Generated Text
Mohammad AL-Smadi
Proceedings of the Shared Task on Multi-Domain Detection of AI-Generated Text

The rapid advancement of large language models (LLMs) has made it increasingly challenging to distinguish between human-written and machine-generated content. This paper presents IntegrityAI, a multimodal ELECTRA-based model for the detection of AI-generated text across multiple domains. Our approach combines textual features processed through a pre-trained ELECTRA model with handcrafted stylometric features to create a robust classifier. We evaluate our system on the Multi-Domain Detection of AI-Generated Text (M-DAIGT) shared task, which focuses on identifying AI-generated content in news articles and academic writing. IntegrityAI achieves exceptional performance and ranked 1st in both subtasks, with F1-scores of 99.6% and 99.9% on the news article detection and academic writing detection subtasks, respectively. Our results demonstrate the effectiveness of combining transformer-based models with stylometric analysis for detecting AI-generated content across diverse domains and writing styles.

2020

pdf bib abs

SAJA at TRAC 2020 Shared Task: Transfer Learning for Aggressive Identification with XGBoost
Saja Tawalbeh | Mahmoud Hammad | Mohammad AL-Smadi
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying

we have developed a system based on transfer learning technique depending on universal sentence encoder (USE) embedding that will be trained in our developed model using xgboost classifier to identify the aggressive text data from English content. A reference dataset has been provided from TRAC 2020 to evaluate the developed approach. The developed approach achieved in sub-task EN-A 60.75% F1 (weighted) which ranked fourteenth out of sixteen teams and achieved 85.66% F1 (weighted) in sub-task EN-B which ranked six out of fifteen teams.

pdf bib abs

HR@JUST Team at SemEval-2020 Task 4: The Impact of RoBERTa Transformer for Evaluation Common Sense Understanding
Heba Al-Jarrah | Rahaf Al-Hamouri | Mohammad AL-Smadi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the results of our team HR@JUST participation at SemEval-2020 Task 4 - Commonsense Validation and Explanation (ComVE) for POST evaluation period. The provided task consists of three sub-tasks, we participate in task A. We considered a state-of-the-art approach for solving this task by performing RoBERTa model with no Next Sentences Prediction (NSP), dynamic masking, larger training data, and larger batch size. The achieved results show that we got the 11th rank on the final test set leaderboard with an accuracy of 91.3%.

pdf bib abs

NLP@JUST at SemEval-2020 Task 4: Ensemble Technique for BERT and Roberta to Evaluate Commonsense Validation
Emran Al-Bashabsheh | Ayah Abu Aqouleh | Mohammad AL-Smadi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents the work of the NLP@JUST team at SemEval-2020 Task 4 competition that related to commonsense validation and explanation (ComVE) task. The team participates in sub-taskA (Validation) which related to validation that checks if the text is against common sense or not. Several models have trained (i.e. Bert, XLNet, and Roberta), however, the main models used are the RoBERTa-large and BERT Whole word masking. As well as, we utilized the results from both models to generate final prediction by using the average Ensemble technique, that used to improve the overall performance. The evaluation result shows that the implemented model achieved an accuracy of 93.9% obtained and published at the post-evaluation result on the leaderboard.

pdf bib abs

KEIS@JUST at SemEval-2020 Task 12: Identifying Multilingual Offensive Tweets Using Weighted Ensemble and Fine-Tuned BERT
Saja Tawalbeh | Mahmoud Hammad | Mohammad AL-Smadi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This research presents our team KEIS@JUST participation at SemEval-2020 Task 12 which represents shared task on multilingual offensive language. We participated in all the provided languages for all subtasks except sub-task-A for the English language. Two main approaches have been developed the first is performed to tackle both languages Arabic and English, a weighted ensemble consists of Bi-GRU and CNN followed by Gaussian noise and global pooling layer multiplied by weights to improve the overall performance. The second is performed for other languages, a transfer learning from BERT beside the recurrent neural networks such as Bi-LSTM and Bi-GRU followed by a global average pooling layer. Word embedding and contextual embedding have been used as features, moreover, data augmentation has been used only for the Arabic language.

pdf bib abs

Team Alexa at NADI Shared Task
Mutaz Younes | Nour Al-khdour | Mohammad AL-Smadi
Proceedings of the Fifth Arabic Natural Language Processing Workshop

In this paper, we discuss our team’s work on the NADI Shared Task. The task requires classifying Arabic tweets among 21 dialects. We tested out different approaches, and the best one was the simplest one. Our best submission was using Multinational Naive Bayes (MNB) classifier (Small and Hsiao, 1985) with n-grams as features. Despite its simplicity, this classifier shows better results than complicated models such as BERT. Our best submitted score was 17% F1-score and 35% accuracy.

pdf bib abs

JUSTMasters at SemEval-2020 Task 3: Multilingual Deep Learning Model to Predict the Effect of Context in Word Similarity
Nour Al-khdour | Mutaz Bni Younes | Malak Abdullah | Mohammad AL-Smadi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

There is a growing research interest in studying word similarity. Without a doubt, two similar words in a context may considered different in another context. Therefore, this paper investigates the effect of the context in word similarity. The SemEval-2020 workshop has provided a shared task (Task 3: Predicting the (Graded) Effect of Context in Word Similarity). In this task, the organizers provided unlabeled datasets for four languages, English, Croatian, Finnish and Slovenian. Our team, JUSTMasters, has participated in this competition in the two subtasks: A and B. Our approach has used a weighted average ensembling method for different pretrained embeddings techniques for each of the four languages. Our proposed model outperformed the baseline models in both subtasks and acheived the best result for subtask 2 in English and Finnish, with score 0.725 and 0.68 respectively. We have been ranked the sixth for subtask 1, with scores for English, Croatian, Finnish, and Slovenian as follows: 0.738, 0.44, 0.546, 0.512.

2019

pdf bib abs

In this paper, we describe our team’s effort on the MADAR Shared Task on Arabic Fine-Grained Dialect Identification. The task requires building a system capable of differentiating between 25 different Arabic dialects in addition to MSA. Our approach is simple. After preprocessing the data, we use Data Augmentation (DA) to enlarge the training data six times. We then build a language model and extract n-gram word-level and character-level TF-IDF features and feed them into an MNB classifier. Despite its simplicity, the resulting model performs really well producing the 4th highest F-measure and region-level accuracy and the 5th highest precision, recall, city-level accuracy and country-level accuracy among the participating teams.