2024
dzFinNlp at AraFinNLP: Improving Intent Detection in Financial Conversational Agents
Mohamed Lichouri, Khaled Lounnas, Amziane Zakaria
Proceedings of The Second Arabic Natural Language Processing Conference
In this paper, we present our dzFinNlp team’s contribution to intent detection in financial conversational agents, as part of the AraFinNLP shared task. We experimented with various models and feature configurations, including traditional machine learning methods such as LinearSVC with TF-IDF and deep learning models such as Long Short-Term Memory (LSTM). We also explored transformer-based models for this task. Our experiments show promising results: our best model achieves micro F1-scores of 93.02% and 67.21% on the ArBanking77 development and test sets, respectively.
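Because each ArBanking77 query carries exactly one intent label, micro-averaged F1 pools true positives, false positives, and false negatives over all classes, and every misclassification counts once as a false positive (for the predicted class) and once as a false negative (for the true class). The minimal stdlib sketch below illustrates this; the `micro_f1` helper and the toy intent labels are hypothetical and not taken from the paper. In this single-label setting the score reduces to accuracy.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multiclass predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    # Pool per-class counts over every label seen in either list
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1  # predicted this label, but the gold label differs
            elif t == label:
                fn += 1  # gold label is this one, but the prediction differs
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical intent labels
y_true = ["balance", "card_lost", "transfer", "balance"]
y_pred = ["balance", "transfer", "transfer", "balance"]
print(micro_f1(y_true, y_pred))  # 0.75, identical to accuracy (3 of 4 correct)
```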
dzNLP at NADI 2024 Shared Task: Multi-Classifier Ensemble with Weighted Voting and TF-IDF Features
Mohamed Lichouri, Khaled Lounnas, Zahaf Nadjib, Rabiai Ayoub
Proceedings of The Second Arabic Natural Language Processing Conference
This paper presents the contribution of our dzNLP team to the NADI 2024 shared task, specifically Subtask 1: Multi-label Country-level Dialect Identification (MLDID), Closed Track. We explored several configurations: in Experiment 1, we used a union of n-gram analyzers (word, character, and character with word boundaries) with different values of n; in Experiment 2, we combined Term Frequency-Inverse Document Frequency (TF-IDF) features in a weighted union with various weights; and in Experiment 3, we implemented a weighted majority voting scheme over three classifiers: Linear Support Vector Classifier (LSVC), Random Forest (RF), and K-Nearest Neighbors (KNN). Despite its simplicity and reliance on traditional machine learning techniques, our approach achieved competitive accuracy and precision. Notably, we obtained the highest precision among the participating teams (63.22%). However, our overall F1 score was approximately 21%, pulled down by a low recall of 12.87%. Our models were therefore highly precise but failed to recall a broad range of dialect labels, which highlights a critical area for improvement in handling diverse dialectal variations.
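Weighted majority voting, as used in Experiment 3, can be sketched in a few lines: each classifier casts a vote for its predicted label, the vote counts with that classifier's weight, and the label with the largest total wins. This is a simplified, single-label stdlib sketch under stated assumptions, not the authors' implementation: the classifier names, the weights (3, 2, 2), the tie-breaking rule, and the country labels are all hypothetical, and the multi-label aspect of MLDID is not modeled.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total vote weight.

    predictions: dict mapping classifier name -> predicted label
    weights: dict mapping classifier name -> non-negative vote weight
    Ties go to the label proposed first (dict insertion order).
    """
    scores = defaultdict(float)
    order = []
    for name, label in predictions.items():
        if label not in scores:
            order.append(label)
        scores[label] += weights.get(name, 0.0)
    # max() keeps the first maximal element, so `order` decides ties
    return max(order, key=lambda label: scores[label])


def weighted_vote_batch(per_classifier, weights):
    """Apply weighted_vote sample by sample to aligned prediction lists."""
    names = list(per_classifier)
    n_samples = len(per_classifier[names[0]])
    return [
        weighted_vote({name: per_classifier[name][i] for name in names}, weights)
        for i in range(n_samples)
    ]


# Hypothetical weights: LSVC trusted most; RF and KNN together can outvote it.
weights = {"lsvc": 3, "rf": 2, "knn": 2}
per_classifier = {
    "lsvc": ["Algeria", "Egypt", "Iraq"],
    "rf": ["Morocco", "Egypt", "Syria"],
    "knn": ["Morocco", "Jordan", "Syria"],
}
print(weighted_vote_batch(per_classifier, weights))  # ['Morocco', 'Egypt', 'Syria']
```

In scikit-learn, `VotingClassifier` with `voting="hard"` and a `weights` list implements the same single-label weighted hard vote, although its tie-breaking differs from the insertion-order rule used here.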
dzStance at StanceEval2024: Arabic Stance Detection based on Sentence Transformers
Mohamed Lichouri, Khaled Lounnas, Ouaras Rafik, Mohamed ABi, Anis Guechtouli
Proceedings of The Second Arabic Natural Language Processing Conference
This study compares Term Frequency-Inverse Document Frequency (TF-IDF) features with Sentence Transformers for detecting writers’ stances—favorable, opposing, or neutral—towards three significant topics: COVID-19 vaccine, digital transformation, and women empowerment. Through empirical evaluation, we demonstrate that Sentence Transformers outperform TF-IDF features across various experimental setups. Our team, dzStance, participated in a stance detection competition, achieving the 13th position (74.91%) among 15 teams in Women Empowerment, 10th (73.43%) in COVID Vaccine, and 12th (66.97%) in Digital Transformation. Overall, our team’s performance ranked 13th (71.77%) among all participants. Notably, our approach achieved promising F1-scores, highlighting its effectiveness in identifying writers’ stances on diverse topics. These results underscore the potential of Sentence Transformers to enhance stance detection models for addressing critical societal issues.
2023
USTHB at ArAIEval’23 Shared Task: Disinformation Detection System based on Linguistic Feature Concatenation
Mohamed Lichouri, Khaled Lounnas, Aicha Zitouni, Houda Latrache, Rachida Djeradi
Proceedings of ArabicNLP 2023
In this paper, we examine several key factors that affect the performance of Arabic disinformation detection in the ArAIEval’2023 shared task. We study the influence of surface preprocessing, morphological preprocessing, the FastText vector model, and the weighted fusion of TF-IDF features. For classification, we use the Linear Support Vector Classification (LSVC) model. In the evaluation phase, our system achieves micro F1-scores of 76.70% and 50.46% for the binary and multiclass scenarios, respectively. The binary result is close to the average micro F1-score of the systems submitted for the second subtask (77.96%), while the multiclass result falls below the corresponding average (64.85%).
USTHB at NADI 2023 shared task: Exploring Preprocessing and Feature Engineering Strategies for Arabic Dialect Identification
Mohamed Lichouri, Khaled Lounnas, Aicha Zitouni, Houda Latrache, Rachida Djeradi
Proceedings of ArabicNLP 2023
In this paper, we analyze several key factors that influence the performance of Arabic dialect identification in NADI’2023, focusing on the first subtask, country-level dialect identification. Our investigation covers the effects of surface preprocessing, morphological preprocessing, the FastText vector model, and the weighted concatenation of TF-IDF features. For classification, we use the Linear Support Vector Classification (LSVC) model. In the evaluation phase, our system achieves an F1 score of 62.51%, compared with an average F1 score of 72.91% for the systems submitted to the first subtask.
2022
Towards an Automatic Dialect Identification System using Algerian Youtube Videos
Khaled Lounnas, Mohamed Lichouri, Mourad Abbas, Thissas Chahboub, Samir Salmi
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)
2021
Arabic Dialect Identification based on a Weighted Concatenation of TF-IDF Features
Mohamed Lichouri, Mourad Abbas, Khaled Lounnas, Besma Benaziz, Aicha Zitouni
Proceedings of the Sixth Arabic Natural Language Processing Workshop
In this paper, we analyze the impact of the weighted concatenation of TF-IDF features on Arabic dialect identification, as part of our participation in the NADI2021 shared task. The study covers two subtasks: subtask 1.1 (country-level MSA identification) and subtask 1.2 (country-level DA identification). The classifiers compared are Linear Support Vector Classification (LSVC), Linear Regression (LR), Perceptron, Stochastic Gradient Descent (SGD), Passive Aggressive (PA), Complement Naive Bayes (CNB), Multilayer Perceptron (MLP), and RidgeClassifier. In the evaluation phase, our system obtains F1 scores of 14.87% and 21.49% for country-level MSA and DA identification, respectively, which are close to, though below, the average F1 scores of the submitted systems for the two subtasks (18.70% and 24.23%).
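The weighted concatenation of TF-IDF features can be sketched as follows: a TF-IDF vector is computed separately for each analyzer (here word unigrams and character trigrams), each block is L2-normalized and scaled by its own weight, and the blocks are concatenated into one sparse feature vector that a classifier such as LSVC would then consume. This is a minimal stdlib sketch, not the authors' implementation; the smoothed IDF formula (the one scikit-learn uses by default), the choice of analyzers, the 0.6/0.4 weights, and the toy sentences are assumptions.

```python
import math
from collections import Counter


def word_ngrams(text, n=1):
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def fit_idf(docs, analyzer):
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1, learned from training docs."""
    df = Counter()
    for doc in docs:
        df.update(set(analyzer(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, analyzer, idf):
    """L2-normalized TF-IDF vector as a sparse dict; unseen terms are dropped."""
    counts = Counter(t for t in analyzer(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def weighted_concat(doc, blocks):
    """Concatenate per-analyzer TF-IDF blocks, each scaled by its weight.

    blocks: list of (name, analyzer, idf, weight). Keys are prefixed with the
    block name so identical strings from different analyzers stay distinct.
    """
    features = {}
    for name, analyzer, idf, weight in blocks:
        for term, value in tfidf(doc, analyzer, idf).items():
            features[f"{name}:{term}"] = weight * value
    return features


# Hypothetical toy training sentences and block weights
train = ["السلام عليكم", "واش راك", "شلونك اليوم"]
blocks = [
    ("word", word_ngrams, fit_idf(train, word_ngrams), 0.6),
    ("char", char_ngrams, fit_idf(train, char_ngrams), 0.4),
]
vec = weighted_concat("واش راك اليوم", blocks)
print(len(vec), "features")
```

Because each block is unit-normalized before scaling, the squared norm of a block equals the square of its weight, so the weights directly control how much each analyzer contributes to the dot products computed by a linear classifier.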
Preprocessing Solutions for Detection of Sarcasm and Sentiment for Arabic
Mohamed Lichouri, Mourad Abbas, Besma Benaziz, Aicha Zitouni, Khaled Lounnas
Proceedings of the Sixth Arabic Natural Language Processing Workshop
This paper describes our approach to Arabic sentiment and sarcasm detection in the ArSarcasm 2021 shared task. Because data preprocessing is crucial for successful learning, we applied a set of preprocessing steps to the dataset before training two classifiers: a Linear Support Vector Classifier (LSVC) and a Bidirectional Long Short-Term Memory network (BiLSTM). Despite the simplicity of the proposed approach, the LSVC model with Normalizing Arabic (NA) preprocessing and the BiLSTM architecture with an embedding input layer yielded encouraging F1-scores of 33.71% and 57.80% for sarcasm and sentiment detection, respectively.
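The abstract does not spell out what the Normalizing Arabic (NA) step does. A common set of rules, assumed here rather than taken from the paper, strips diacritics and tatweel, unifies the alef variants, and maps alef maqsura and ta marbuta to a single form. The `normalize_arabic` function below is a hypothetical stdlib sketch of such a step; other pipelines choose different target letters (for example, mapping ya to alef maqsura instead).

```python
import re

# Hypothetical normalization rules; the paper does not list its exact NA steps.
_DIACRITICS = re.compile("[\u064B-\u0652]")        # tanwin, harakat, shadda, sukun
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
_TATWEEL = "\u0640"                                  # kashida used for elongation


def normalize_arabic(text):
    text = _DIACRITICS.sub("", text)          # drop short vowels and related marks
    text = text.replace(_TATWEEL, "")         # drop elongation characters
    text = _ALEF_VARIANTS.sub("\u0627", text)  # unify alef variants to bare alef
    text = text.replace("\u0649", "\u064A")   # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")   # ta marbuta -> ha
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace


print(normalize_arabic("أَحْمَد   يحبّ الـمـدرسة"))
```

Normalizing in this way merges spelling variants of the same word into one token before TF-IDF weighting, which reduces vocabulary sparsity for a linear model such as LSVC.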
Towards Phone Number Recognition For Code Switched Algerian Dialect
Khaled Lounnas, Mourad Abbas, Mohamed Lichouri
Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021)
2019
Building a Speech Corpus based on Arabic Podcasts for Language and Dialect Identification
Khaled Lounnas, Mourad Abbas, Mohamed Lichouri
Proceedings of the 3rd International Conference on Natural Language and Speech Processing