Mourad Abbas


2025

2024

2023

2022

2021

In this paper, we analyze the impact of the weighted concatenation of TF-IDF features for the Arabic Dialect Identification task while we participated in the NADI2021 shared task. This study is performed for two subtasks: subtask 1.1 (country-level MSA) and subtask 1.2 (country-level DA) identification. The classifiers supporting our comparative study are Linear Support Vector Classification (LSVC), Linear Regression (LR), Perceptron, Stochastic Gradient Descent (SGD), Passive Aggressive (PA), Complement Naive Bayes (CNB), MutliLayer Perceptron (MLP), and RidgeClassifier. In the evaluation phase, our system gives F1 scores of 14.87% and 21.49%, for country-level MSA and DA identification respectively, which is very close to the average F1 scores achieved by the submitted systems and recorded for both subtasks (18.70% and 24.23%).
This paper describes our approach to detecting Sentiment and Sarcasm for Arabic in the ArSarcasm 2021 shared task. Data preprocessing is a crucial task for a successful learning, that is why we applied a set of preprocessing steps to the dataset before training two classifiers, namely Linear Support Vector Classifier (LSVC) and Bidirectional Long Short Term Memory (BiLSTM). The findings show that despite the simplicity of the proposed approach, using the LSVC model with a normalizing Arabic (NA) preprocessing and the BiLSTM architecture with an Embedding layer as input have yielded an encouraging F1score of 33.71% and 57.80% for sarcasm and sentiment detection, respectively.

2020

This paper describes our system developed for automatically classifying tweets that mention medications. We used the Decision Tree classifier for this task. We have shown that using some elementary preprocessing steps and TF-IDF n-grams led to acceptable classifier performance. Indeed, the F1-score recorded was 74.58% in the development phase and 63.70% in the test phase.
In this paper, we present a description of our experiments on country-level Arabic dialect identification. A comparison study between a set of classifiers has been carried out. The best results were achieved using the Linear Support Vector Classification (LSVC) model by applying a Random Over Sampling (ROS) process yielding an F1-score of 18.74% in the post-evaluation phase. In the evaluation phase, our best submitted system has achieved an F1-score of 18.27%, very close to the average F1-score (18.80%) obtained for all the submitted systems.

2019

This paper describes the solution that we propose on MADAR 2019 Arabic Fine-Grained Dialect Identification task. The proposed solution utilized a set of classifiers that we trained on character and word features. These classifiers are: Support Vector Machines (SVM), Bernoulli Naive Bayes (BNB), Multinomial Naive Bayes (MNB), Logistic Regression (LR), Stochastic Gradient Descent (SGD), Passive Aggressive(PA) and Perceptron (PC). The system achieved competitive results, with a performance of 62.87 % and 62.12 % for both development and test sets.

2015