Dhaou Ghoul

2025

Fine-tuning AraBert model for arabic sentiment detection
Mustapha Jaballah | Dhaou Ghoul | Ammar Mars
Proceedings of the Shared Task on Sentiment Analysis for Arabic Dialects

Arabic exhibits a rich and intricate linguistic landscape, with Modern Standard Arabic (MSA) serving as the formal written and spoken medium, alongside a wide variety of regional dialects used in everyday communication. These dialects vary considerably in syntax, vocabulary, phonology, and meaning, presenting significant challenges for natural language processing (NLP). The complexity is particularly pronounced in sentiment analysis, where emotional expressions and idiomatic phrases differ markedly across regions, hindering consistent and accurate sentiment detection. This paper describes our submission to the Ahasis Shared Task: A Benchmark for Arabic Sentiment Analysis in the hospitality domain, which focuses on advancing sentiment analysis techniques for Arabic dialects in hotel reviews. Our proposed approach achieved an F1 score of 0.88 on the internal test set (split from the original training data) and 79.16% on the official hidden test set of the shared task. This performance secured our team second place in the Ahasis Shared Task.

2024

ISHFMG_TUN at StanceEval: Ensemble Method for Arabic Stance Evaluation System
Ammar Mars | Mustapha Jaballah | Dhaou Ghoul
Proceedings of the Second Arabic Natural Language Processing Conference

Understanding the attitude of individuals towards specific topics in Arabic is essential for tasks like sentiment analysis, opinion mining, and social media monitoring. However, the diversity of the linguistic characteristics of Arabic makes accurate stance evaluation challenging. In this study, we propose an ensemble approach to tackle these challenges. Our method combines different classifiers using the voting method. Through multiple experiments, we demonstrate the effectiveness of our method, which achieves an F1-score of 0.7027. Our findings contribute to advancing NLP and offer valuable insight for applications like sentiment analysis, opinion mining, and social media monitoring.
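The abstract does not say which voting scheme or which classifiers were combined, so the following is only a minimal sketch of hard (majority) voting under that assumption. The function `majority_vote`, the three classifier outputs, and the stance labels are hypothetical and chosen for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per example.

    predictions: list of lists, one inner list of labels per classifier.
    Ties are broken in favour of the earliest classifier's label.
    """
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # Pick the first label (in classifier order) that reaches the top count.
        combined.append(next(lab for lab in labels if counts[lab] == best))
    return combined


# Hypothetical outputs of three stance classifiers on four texts.
clf_a = ["FAVOR", "AGAINST", "NONE", "FAVOR"]
clf_b = ["FAVOR", "NONE", "NONE", "AGAINST"]
clf_c = ["AGAINST", "AGAINST", "FAVOR", "NONE"]

voted = majority_vote([clf_a, clf_b, clf_c])
print(voted)
```

On the last text all three classifiers disagree, so the tie-breaking rule decides the outcome; a soft-voting variant would average class probabilities instead.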

2021

Sarcasm and Sentiment Detection in Arabic: investigating the interest of character-level features
Dhaou Ghoul | Gaël Lejeune
Proceedings of the Sixth Arabic Natural Language Processing Workshop

We present three methods developed for the Shared Task on Sarcasm and Sentiment Detection in Arabic. The first is a baseline that uses character n-gram features. We also propose two more sophisticated methods: a recurrent neural network with a word-level representation, and an ensemble classifier relying on word- and character-level features. We chose to submit results from the ensemble classifier, but it was not very successful compared to the best systems: 22nd/37 on sarcasm detection and 15th/22 on sentiment detection. It eventually turned out that our baseline could have been improved to beat those results.

2020

Calcul de similarité entre phrases : quelles mesures et quels descripteurs ? (Sentence Similarity : a study on similarity metrics with words and character strings )
Davide Buscaldi | Ghazi Felhi | Dhaou Ghoul | Joseph Le Roux | Gaël Lejeune | Xudong Zhang
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Atelier DÉfi Fouille de Textes

This article presents our participation in the 2020 edition of the Défi Fouille de Textes (DEFT 2020), specifically in the two tasks on sentence similarity. Our work addresses two questions: the choice of the similarity measure, and the choice of the operands on which that measure is computed. In particular, we studied whether to use words or character strings (words or non-words). We show that Bray-Curtis similarity can be more effective, and above all more stable, than cosine similarity, and that computing similarity on character strings is more effective than the same computation on words.
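To make the comparison concrete, here is a minimal sketch of both measures computed on character n-gram counts. The abstract does not give the n-gram size or any weighting scheme, so the bigram setting, the helper names, and the example sentences are assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count the character n-grams of a string (no tokenisation)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def bray_curtis_similarity(a, b):
    """1 - sum|a_i - b_i| / sum(a_i + b_i) over the union of n-grams."""
    keys = set(a) | set(b)
    num = sum(abs(a[k] - b[k]) for k in keys)
    den = sum(a[k] + b[k] for k in keys)
    return 1.0 - num / den if den else 1.0


def cosine_similarity(a, b):
    """Dot product of the count vectors divided by the product of their norms."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical sentence pair, compared on character bigrams.
s1 = char_ngrams("le chat dort", 2)
s2 = char_ngrams("le chat mange", 2)
print(bray_curtis_similarity(s1, s2), cosine_similarity(s1, s2))
```

Both functions return 1 for identical inputs and 0 for inputs with no n-gram in common; they differ in how they weight partial overlap.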

2019

MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)
Dhaou Ghoul | Gaël Lejeune
Proceedings of the Fourth Arabic Natural Language Processing Workshop

We present MICHAEL, a simple lightweight method for automatic Arabic Dialect Identification (DID) on the MADAR travel domain task. MICHAEL uses simple character-level features in order to perform a preprocessing-free classification. More precisely, character n-grams extracted from the original sentences are used to train a Multinomial Naive Bayes classifier. This system achieved an official score (accuracy) of 53.25% with 1<=N<=3 but showed a much better result with character 4-grams (62.17% accuracy).
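The pipeline described here (raw character n-grams fed to a Multinomial Naive Bayes classifier) can be sketched in plain Python as follows. This is not the authors' implementation: the class `CharNgramNB`, the add-one smoothing, and the toy training strings with labels "A" and "B" are assumptions for illustration and are not taken from the MADAR data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """All character n-grams of text for n_min <= n <= n_max, no preprocessing."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Multinomial Naive Bayes over character n-gram counts (add-one smoothing)."""

    def __init__(self, n_min=1, n_max=3):
        self.n_min, self.n_max = n_min, n_max

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n_min, self.n_max)
            self.feature_counts[label].update(grams)
        self.vocab = {g for c in self.feature_counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for gram in char_ngrams(text, self.n_min, self.n_max):
                if gram in self.vocab:  # skip n-grams never seen in training
                    score += math.log((counts[gram] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy placeholder data standing in for dialect-labelled sentences.
train_texts = ["aaab", "abaa", "bbba", "babb"]
train_labels = ["A", "A", "B", "B"]
model = CharNgramNB(n_min=1, n_max=3).fit(train_texts, train_labels)
print(model.predict("aab"), model.predict("bba"))
```

Setting `n_min=4, n_max=4` would correspond to the character 4-gram configuration mentioned in the abstract.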

Indexation et appariements de documents cliniques pour le Deft 2019 (Indexing and pairing texts of the medical domain )
Davide Buscaldi | Dhaou Ghoul | Joseph Le Roux | Gaël Lejeune
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Défi Fouille de Textes (atelier TALN-RECITAL)

In this article, we present our methods for the indexing and pairing tasks of the Défi Fouille de Textes (Deft) 2019. For the indexing task, we tested two methods: one based on first pairing the documents of the test set with the documents of the training set, and another based on terminological annotation. Unfortunately, these methods gave fairly weak results. For the pairing task, we developed a training-free method based on character-string similarities, as well as a method using Siamese networks. Here again the results were rather disappointing, although the unsupervised method reaches a fairly respectable score for an unsupervised method: 62%.

2013

Development of resources for training and the use of the tagger TreeTagger on Arabic (Développement de ressources pour l’entrainement et l’utilisation de l’étiqueteur morphosyntaxique TreeTagger sur l’arabe) [in French]
Dhaou Ghoul
Proceedings of RECITAL 2013

Co-authors

Ghazi Felhi 1

Xudong Zhang 1
