Hadda Cherroun

2025

LMSA at AraGenEval Shared Task: Ensemble-Based Detection of AI-Generated Arabic Text Using Multilingual and Arabic-Specific Models
Kaoutar Zita | Attia Nehar | Abdelkader Khelil | Slimane Bellaouar | Hadda Cherroun
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

2024

MODOS at ArAIEval Shared Task: Multimodal Propagandistic Memes Classification Using Weighted SAM, CLIP and ArabianGPT
Abdelhamid Haouhat | Hadda Cherroun | Slimane Bellaouar | Attia Nehar
Proceedings of the Second Arabic Natural Language Processing Conference

Propaganda is increasingly used on Arabic social media platforms to deceive or influence people. This propaganda is often spread through multimodal content, such as memes. While substantial research has addressed the automatic detection of propaganda in English content, this paper presents the MODOS team’s participation in the Arabic Multimodal Propagandistic Memes Classification shared task. Our system uses the Segment Anything Model (SAM) and CLIP for image representation and ArabianGPT embeddings for text. We then employ LSTM encoders followed by a weighted fusion strategy to perform binary classification. Our system achieved competitive performance in distinguishing between propagandistic and non-propagandistic memes, scoring a macro F1 of 0.7290 and ranking 6th among the participants.

The success of machine learning for automatic speech processing has raised the need for large-scale datasets. However, collecting such data is often challenging, as it requires a significant investment of time and money. In this paper, we devise a recipe for building large-scale speech corpora by harnessing Web resources, namely YouTube, other social media, online radio, and TV. We illustrate our methodology by building KALAM’DZ, an Arabic spoken corpus dedicated to Algerian dialectal varieties. The preliminary version of our dataset covers all major Algerian dialects. In addition, we make sure that this material takes into account numerous aspects that foster its richness; in particular, we have targeted various speech topics. Some automatic and manual annotations are provided. They gather useful information related to the speakers and sub-dialect information at the utterance level. Our corpus encompasses the 8 major Algerian Arabic sub-dialects with 4881 speakers and more than 104.4 hours segmented in utterances of at least 6 s.

Co-authors

Abdelkader Khelil 1

Abdallah Lakhdari 1

Basma Sayah 1

Mohamed Cherif Zeghad 1

Ilyes Zine 1

Kaoutar Zita 1
