2023
pdf
bib
abs
USTHB at ArAIEval’23 Shared Task: Disinformation Detection System based on Linguistic Feature Concatenation
Mohamed Lichouri
|
Khaled Lounnas
|
Aicha Zitouni
|
Houda Latrache
|
Rachida Djeradi
Proceedings of ArabicNLP 2023
In this research paper, we undertake a comprehensive examination of several pivotal factors that impact the performance of Arabic Disinformation Detection in the ArAIEval’2023 shared task. Our exploration encompasses the influence of surface preprocessing, morphological preprocessing, the FastText vector model, and the weighted fusion of TF-IDF features. To carry out classification tasks, we employ the Linear Support Vector Classification (LSVC) model. In the evaluation phase, our system showcases significant results, achieving an F1 micro score of 76.70% and 50.46% for binary and multiple classification scenarios, respectively. These accomplishments closely correspond to the average F1 micro scores achieved by other systems submitted for the second subtask, standing at 77.96% and 64.85% for binary and multiple classification scenarios, respectively.
pdf
bib
abs
USTHB at NADI 2023 shared task: Exploring Preprocessing and Feature Engineering Strategies for Arabic Dialect Identification
Mohamed Lichouri
|
Khaled Lounnas
|
Aicha Zitouni
|
Houda Latrache
|
Rachida Djeradi
Proceedings of ArabicNLP 2023
In this paper, we conduct an in-depth analysis of several key factors influencing the performance of Arabic Dialect Identification NADI’2023, with a specific focus on the first subtask involving country-level dialect identification. Our investigation encompasses the effects of surface preprocessing, morphological preprocessing, FastText vector model, and the weighted concatenation of TF-IDF features. For classification purposes, we employ the Linear Support Vector Classification (LSVC) model. During the evaluation phase, our system demonstrates noteworthy results, achieving an F1 score of 62.51%. This achievement closely aligns with the average F1 scores attained by other systems submitted for the first subtask, which stands at 72.91%.
2021
pdf
bib
abs
Arabic Dialect Identification based on a Weighted Concatenation of TF-IDF Features
Mohamed Lichouri
|
Mourad Abbas
|
Khaled Lounnas
|
Besma Benaziz
|
Aicha Zitouni
Proceedings of the Sixth Arabic Natural Language Processing Workshop
In this paper, we analyze the impact of the weighted concatenation of TF-IDF features for the Arabic Dialect Identification task while we participated in the NADI2021 shared task. This study is performed for two subtasks: subtask 1.1 (country-level MSA) and subtask 1.2 (country-level DA) identification. The classifiers supporting our comparative study are Linear Support Vector Classification (LSVC), Linear Regression (LR), Perceptron, Stochastic Gradient Descent (SGD), Passive Aggressive (PA), Complement Naive Bayes (CNB), MutliLayer Perceptron (MLP), and RidgeClassifier. In the evaluation phase, our system gives F1 scores of 14.87% and 21.49%, for country-level MSA and DA identification respectively, which is very close to the average F1 scores achieved by the submitted systems and recorded for both subtasks (18.70% and 24.23%).
pdf
bib
abs
Preprocessing Solutions for Detection of Sarcasm and Sentiment for Arabic
Mohamed Lichouri
|
Mourad Abbas
|
Besma Benaziz
|
Aicha Zitouni
|
Khaled Lounnas
Proceedings of the Sixth Arabic Natural Language Processing Workshop
This paper describes our approach to detecting Sentiment and Sarcasm for Arabic in the ArSarcasm 2021 shared task. Data preprocessing is a crucial task for a successful learning, that is why we applied a set of preprocessing steps to the dataset before training two classifiers, namely Linear Support Vector Classifier (LSVC) and Bidirectional Long Short Term Memory (BiLSTM). The findings show that despite the simplicity of the proposed approach, using the LSVC model with a normalizing Arabic (NA) preprocessing and the BiLSTM architecture with an Embedding layer as input have yielded an encouraging F1score of 33.71% and 57.80% for sarcasm and sentiment detection, respectively.