Chayma Fourati

2024

PoliTun: Tunisian Political Dataset for Detecting Public Opinions and Categories Orientation
Chayma Fourati | Roua Hammami | Chiraz Latiri | Hatem Haddad
Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024)

2022

iCompass at Arabic Hate Speech 2022: Detect Hate Speech Using QRNN and Transformers
Mohamed Aziz Bennessir | Malek Rhouma | Hatem Haddad | Chayma Fourati
Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

This paper provides a detailed overview of the system we submitted to the OSACT2022 Shared Task on Fine-Grained Hate Speech Detection on Arabic Twitter, along with its outcome and limitations. Our submission is a hard-parameter-sharing multi-task model consisting of a shared layer containing state-of-the-art contextualized text representation models such as MARBERT, AraBERT, and ARBERT, and task-specific layers that were fine-tuned with quasi-recurrent neural networks (QRNN) for each downstream subtask. The results show that MARBERT fine-tuned with QRNN outperforms all of the other models.

iCompass Working Notes for the Nuanced Arabic Dialect Identification Shared task
Abir Messaoudi | Chayma Fourati | Hatem Haddad | Moez BenHajhmida
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

We describe the system we submitted to the Nuanced Arabic Dialect Identification (NADI) shared task. We tackled only the first subtask (Subtask 1). We used state-of-the-art deep learning models and pre-trained contextualized text representation models that we fine-tuned for the downstream task at hand. We first used an Arabic BERT variant, MARBERT, in its two versions (MARBERT v1 and MARBERT v2); we then combined MARBERT embeddings with a CNN classifier; finally, we tested a Quasi-Recurrent Neural Network (QRNN) model. The results show that MARBERT v2 outperforms all of the other models on Subtask 1.

2021

iCompass at NLP4IF-2021–Fighting the COVID-19 Infodemic
Wassim Henia | Oumayma Rjab | Hatem Haddad | Chayma Fourati
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

This paper provides a detailed overview of our system and its outcomes, produced as part of the NLP4IF Shared Task on Fighting the COVID-19 Infodemic at NAACL 2021. We used state-of-the-art contextualized text representation models that were fine-tuned for the downstream task at hand: ARBERT, MARBERT, AraBERT, Arabic ALBERT, and BERT-base-arabic. According to the results, BERT-base-arabic achieved the highest F1 score on the test set, 0.784.
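As a reminder of what the reported metric measures, the following is a minimal sketch of the F1 score for a single binary label. The abstract does not state which averaging scheme the shared task used, so the binary form, the function name `f1_score`, and the toy labels are all assumptions made for illustration only.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration; they are not from the shared task.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

For a task with several labels, a per-label score like this one would typically be averaged across labels; which averaging applies here is not stated in the abstract.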

On social media platforms, people tend to communicate informally, writing posts and comments in their local dialects. In Africa, more than 1500 dialects and languages exist. Tunisians, in particular, talk and write informally using Latin letters and numerals rather than Arabic ones. In this paper, we introduce a large common-crawl-based Tunisian Arabizi dialectal dataset dedicated to sentiment analysis. The dataset consists of a total of 100k comments (about movies, politics, sports, etc.) annotated manually by Tunisian native speakers as positive, negative, or neutral. We evaluate our dataset on the sentiment analysis task using the multilingual version of Bidirectional Encoder Representations from Transformers (mBERT) as a contextual language model, first as an embedding technique and then combined with a Convolutional Neural Network (CNN) as a classifier. The dataset is publicly available.

We describe the system we submitted to the 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic (Abu Farha et al., 2021). We tackled both subtasks, namely Sarcasm Detection (Subtask 1) and Sentiment Analysis (Subtask 2). We used state-of-the-art pretrained contextualized text representation models and fine-tuned them for the downstream task at hand. As a first approach, we used Google’s multilingual BERT, followed by the Arabic variants AraBERT, ARBERT, and MARBERT. The results show that MARBERT outperforms all of the other models on both Subtask 1 and Subtask 2.
