Md. Rafiul Biswas

Also published as: Md. Rafiul Biswas

2026

A Multi-Task Learning Framework for Modeling Engagement and Topic-Sensitive Responses in Arabic Women’s Discourse
Mabrouka Bessghaier | Md. Rafiul Biswas | Shimaa Ibrahim | Wajdi Zaghouani
Findings of the Association for Computational Linguistics: EACL 2026

Predicting how audiences react to Arabic social media posts requires reasoning beyond textual sentiment: reactions emerge from collective interpretation moderated by engagement dynamics and topical context. We present a multi-task learning (MTL) framework that jointly learns (i) audience reaction classification (Love, Haha, Angry, Sad, Care, Wow), (ii) engagement magnitude regression (six reactions, comments, shares), and (iii) non-engagement detection. On a corpus of 158k Arabic Facebook posts spanning women’s rights, gender debates, and economic empowerment, our model achieves a test macro-F1 of 72.4 and weighted-F1 of 89.1.

2025

pdf bib abs

We present ImageEval 2025, the first shared task dedicated to Arabic image captioning. The task addresses the critical gap in multimodal Arabic NLP by focusing on two complementary subtasks: (1) creating the first open-source, manually-captioned Arabic image dataset through a collaborative datathon, and (2) developing and evaluating Arabic image captioning models. A total of 44 teams registered, of which eight submitted during the test phase, producing 111 valid submissions. Evaluation was conducted using automatic metrics, LLM-based judgment, and human assessment. In Subtask 1, the best-performing system achieved a cosine similarity of 65.5, while in Subtask 2, the top score was 60.0. Although these results show encouraging progress, they also confirm that Arabic image captioning remains a challenging task, particularly due to cultural grounding requirements, morphological richness, and dialectal variation. All datasets, baseline models, and evaluation tools are released publicly to support future research in Arabic multimodal NLP.

pdf bib abs

This paper presents the MAHED 2025 Shared Task on Multimodal Detection of Hope and Hate Emotions in Arabic Content, comprising three subtasks: (1) text-based classification of Arabic content into hate and hope, (2) multi-task learning for joint prediction of emotions, offensive content, and hate speech, and (3) multimodal detection of hateful content in Arabic memes. We provide three high-quality datasets totaling over 22,000 instances sourced from social media platforms, annotated by native Arabic speakers with Cohen’s Kappa exceeding 0.85. Our evaluation attracted 46 leaderboard submissions from participants, with systems leveraging Arabic-specific pre-trained language models (AraBERT, MARBERT), large language models (GPT-4, Gemini), and multimodal fusion architectures combining CLIP vision encoders with Arabic text models. The best-performing systems achieved macro F1-scores of 0.723 (Task 1), 0.578 (Task 2), and 0.796 (Task 3), with top teams employing ensemble methods, class-weighted training, and OCR-aware multimodal fusion. Analysis reveals persistent challenges in dialectal robustness, minority class detection for hope speech, and highlights key directions for future Arabic content moderation research.

pdf bib

MarsadLab at BAREC Shared Task 2025: Strict-Track Readability Prediction with Specialized AraBERT Models on BAREC
Shimaa Ibrahim | Md. Rafiul Biswas | Mabrouka Bessghaier | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

pdf bib abs

EmoHopeSpeech: An Annotated Dataset of Emotions and Hope Speech in English and Arabic
Wajdi Zaghouani | Md. Rafiul Biswas
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

This research introduces a bilingual dataset comprising 27,456 entries for Arabic and 10,036 entries for English, annotated for emotions and hope speech, addressing the scarcity of multi-emotion (Emotion and hope) datasets. The dataset provides comprehensive annotations capturing emotion intensity, complexity, and causes, alongside detailed classifications and subcategories for hope speech. To ensure annotation reliability, Fleiss’ Kappa was employed, revealing 0.75-0.85 agreement among annotators both for Arabic and English language. The evaluation metrics (micro-F1-Score=0.67) obtained from the baseline model (i.e., transformer-based AraBERT model) validate that the data annotations are worthy.

pdf bib abs

Enhancing Arabic Dialectal Sentiment Analysis through Advanced Data Augmentation Techniques
Md. Rafiul Biswas | Wajdi Zaghouani
Proceedings of the Shared Task on Sentiment Analysis for Arabic Dialects

This work addresses the challenge of Arabic sentiment analysis in the hospitality domain in all dialects by using data augmentation techniques. We created a pipeline with three simple techniques: context-based paraphrasing, pattern-based sentence generation, and domain-specific word replacement. Our method preserves the original dialect features, meanings, and key classification details while adding diversity to the training data. It also includes automatic fallback between methods to handle challenges effectively. We used the Fanar API for dialectal data augmentation in the hospitality domain. The AraBERT-Large-v02 model was fine-tuned on original and augmented data, showing improved performance. This study helps solve the problem of limited dialect data in Arabic NLP and offers an effective framework that is useful for other Arabic text analysis tasks.

pdf bib

MarsadLab at NADI Shared Task: Arabic Dialect Identification and Speech Recognition using ECAPA-TDNN and Whisper
Md. Rafiul Biswas | Kais Attia | Shimaa Ibrahim | Mabrouka Bessghaier | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

pdf bib abs

Evaluation of Pretrained and Instruction-Based Pretrained Models for Emotion Detection in Arabic Social Media Text
Md. Rafiul Biswas | Shimaa Ibrahim | Mabrouka Bessghaier | Wajdi Zaghouani
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

This study evaluates three approaches—instruction prompting of large language models (LLMs), instruction fine-tuning of LLMs, and transformer-based pretrained models on emotion detection in Arabic social media text. We compare pretrained transformer models like AraBERT, CaMelBERT, and XLM-RoBERTa with instruction prompting with advanced LLMs like GPT-4o, Gemini, Deepseek, and Fanar, and instruction fine-tuning approaches with LLMs like Llama 3.1, Mistral, and Phi. With a highly preprocessed dataset of 10,000 labeled Arabic tweets with overlapping emotional labels, our findings reveal that transformer-based pretrained models outperform instruction prompting and instruction fine-tuning approaches. Instruction prompts leverage general linguistic skills with maximum efficiency but fall short in detecting subtle emotional contexts. Instruction fine-tuning is more specific but trails behind pretrained transformer models. Our findings establish the need for optimized instruction-based approaches and underscore the important role played by domain-specific transformer architectures in accurate Arabic emotion detection.

pdf bib

MarsadLab at AraHealthQA: Hybrid Contextual–Lexical Fusion with AraBERT for Question and Answer Categorization
Mabrouka Bessghaier | Shimaa Ibrahim | Md. Rafiul Biswas | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

pdf bib

MarsadLab at TAQEEM 2025: Prompt-Aware Lexicon-Enhanced Transformer for Arabic Automated Essay Scoring
Mabrouka Bessghaier | Md. Rafiul Biswas | Amira Dhouib | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

pdf bib abs

An Annotated Corpus of Arabic Tweets for Hate Speech Analysis
Wajdi Zaghouani | Md. Rafiul Biswas
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Identifying hate speech content in the Arabic language is challenging due to the rich quality of dialectal variations. This study introduces a multilabel hate speech dataset in the Arabic language. We have collected 10,000 Arabic tweets and annotated each tweet, whether it contains offensive content or not. If a text contains offensive content, we further classify it into different hate speech targets such as religion, gender, politics, ethnicity, origin, and others. A text can contain either single or multiple targets. Multiple annotators are involved in the data annotation task. We calculated the inter-annotator agreement, which was reported to be 0.86 for offensive content and 0.71 for multiple hate speech targets. Finally, we evaluated the data annotation task by employing a different transformers-based model in which AraBERTv2 outperformed with a micro-F1 score of 0.7865 and an accuracy of 0.786.

pdf bib

MarsadLab at PalmX Shared Task: An LLM Benchmark for Arabic Culture and Islamic Civilization
Md. Rafiul Biswas | Shimaa Ibrahim | Kais Attia | Firoj Alam | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

pdf bib

MarsadLab at AraGenEval Shared Task: LLM-Based Approaches to Arabic Authorship Style Transfer and Identification
Md. Rafiul Biswas | Mabrouka Bessghaier | Firoj Alam | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

2024

pdf bib abs

We present an overview of the second edition of the ArAIEval shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. In this edition, ArAIEval offers two tasks: (i) detection of propagandistic textual spans with persuasion techniques identification in tweets and news articles, and (ii) distinguishing between propagandistic and non-propagandistic memes. A total of 14 teams participated in the final evaluation phase, with 6 and 9 teams participating in Tasks 1 and 2, respectively. Finally, 11 teams submitted system description papers. Across both tasks, we observed that fine-tuning transformer models such as AraBERT was at the core of the majority of the participating systems. We provide a description of the task setup, including a description of the dataset construction and the evaluation setup. We further provide a brief overview of the participating systems. All datasets and evaluation scripts are released to the research community. We hope this will enable further research on these important tasks in Arabic.

pdf bib abs

So Hateful! Building a Multi-Label Hate Speech Annotated Arabic Dataset
Wajdi Zaghouani | Hamdy Mubarak | Md. Rafiul Biswas
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Social media enables widespread propagation of hate speech targeting groups based on ethnicity, religion, or other characteristics. With manual content moderation being infeasible given the volume, automatic hate speech detection is essential. This paper analyzes 70,000 Arabic tweets, from which 15,965 tweets were selected and annotated, to identify hate speech patterns and train classification models. Annotators labeled the Arabic tweets for offensive content, hate speech, emotion intensity and type, effect on readers, humor, factuality, and spam. Key findings reveal 15% of tweets contain offensive language while 6% have hate speech, mostly targeted towards groups with common ideological or political affiliations. Annotations capture diverse emotions, and sarcasm is more prevalent than humor. Additionally, 10% of tweets provide verifiable factual claims, and 7% are deemed important. For hate speech detection, deep learning models like AraBERT outperform classical machine learning approaches. By providing insights into hate speech characteristics, this work enables improved content moderation and reduced exposure to online hate. The annotated dataset advances Arabic natural language processing research and resources.

pdf bib abs

MemeMind at ArAIEval Shared Task: Spotting Persuasive Spans in Arabic Text with Persuasion Techniques Identification
Md. Rafiul Biswas | Zubair Shah | Wajdi Zaghouani
Proceedings of the Second Arabic Natural Language Processing Conference

This paper focuses on detecting propagandistic spans and persuasion techniques in Arabic text from tweets and news paragraphs. Each entry in the dataset contains a text sample and corresponding labels that indicate the start and end positions of propaganda techniques within the text. Tokens falling within a labeled span were assigned ’B’ (Begin) or ’I’ (Inside) tags, ’O’, corresponding to the specific propaganda technique. Using attention masks, we created uniform lengths for each span and assigned BIO tags to each token based on the provided labels. Then, we used AraBERT-base pre-trained model for Arabic text tokenization and embeddings with a token classification layer to identify propaganda techniques. Our training process involves a two-phase fine-tuning approach. First, we train only the classification layer for a few epochs, followed by full model fine-tuning, updating all parameters. This methodology allows the model to adapt to the specific characteristics of the propaganda detection task while leveraging the knowledge captured by the pretrained AraBERT model. Our approach achieved an F1 score of 0.2774, securing the 3rd position in the leaderboard of Task 1.

pdf bib abs

MemeMind at ArAIEval Shared Task: Generative Augmentation and Feature Fusion for Multimodal Propaganda Detection in Arabic Memes through Advanced Language and Vision Models
Uzair Shah | Md. Rafiul Biswas | Marco Agus | Mowafa Househ | Wajdi Zaghouani
Proceedings of the Second Arabic Natural Language Processing Conference

Detecting propaganda in multimodal content, such as memes, is crucial for combating disinformation on social media. This paper presents a novel approach for the ArAIEval 2024 shared Task 2 on Multimodal Propagandistic Memes Classification, involving text, image, and multimodal classification of Arabic memes. For text classification (Task 2A), we fine-tune state-of-the-art Arabic language models and use ChatGPT4-generated synthetic text for data augmentation. For image classification (Task 2B), we fine-tune ResNet18, EfficientFormerV2, and ConvNeXt-tiny architectures with DALL-E-2-generated synthetic images. For multimodal classification (Task 2C), we combine ConvNeXt-tiny and BERT architectures in a fusion layer to enhance binary classification. Our results show significant performance improvements with data augmentation for text and image classification models and with the fusion layer for multimodal classification. We highlight challenges and opportunities for future research in multimodal propaganda detection in Arabic content, emphasizing the need for robust and adaptable models to combat disinformation.

Co-authors

Venues

LREC1

Fix author