Ahlam Bashiti


2025

ImageEval 2025: The First Arabic Image Captioning Shared Task
Ahlam Bashiti | Alaa Aljabari | Hadi Khaled Hamoud | Md. Rafiul Biswas | Bilal Mohammed Shalash | Mustafa Jarrar | Fadi Zaraket | George Mikros | Ehsaneddin Asgari | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

We present ImageEval 2025, the first shared task dedicated to Arabic image captioning. The task addresses a critical gap in multimodal Arabic NLP through two complementary subtasks: (1) creating the first open-source, manually captioned Arabic image dataset through a collaborative datathon, and (2) developing and evaluating Arabic image captioning models. A total of 44 teams registered, of which eight submitted during the test phase, producing 111 valid submissions. Evaluation was conducted using automatic metrics, LLM-based judgment, and human assessment. In Subtask 1, the best-performing system achieved a cosine similarity of 65.5, while in Subtask 2, the top score was 60.0. Although these results show encouraging progress, they also confirm that Arabic image captioning remains a challenging task, particularly due to cultural grounding requirements, morphological richness, and dialectal variation. All datasets, baseline models, and evaluation tools are released publicly to support future research in Arabic multimodal NLP.
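The abstract reports caption quality as a cosine similarity score scaled to 0-100. As a minimal sketch of how such a score can be computed, the snippet below embeds a reference and a generated Arabic caption with a multilingual sentence-embedding model and takes the cosine of the two vectors. The choice of embedding model and the scoring setup are illustrative assumptions, not the shared task's official evaluation pipeline.

```python
# Sketch: scoring a generated Arabic caption against a reference caption
# with cosine similarity over sentence embeddings. Model choice and scaling
# are assumptions for illustration; the official ImageEval pipeline may differ.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A multilingual embedding model that covers Arabic (hypothetical choice).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

reference = "قطة تجلس على سور حجري قديم"  # "A cat sitting on an old stone wall"
generated = "قطة فوق جدار من الحجر"        # "A cat on a stone wall"

ref_emb, gen_emb = model.encode([reference, generated])
score = cosine_similarity(ref_emb, gen_emb) * 100  # scale to 0-100, as in the reported results
print(f"cosine similarity: {score:.1f}")
```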

Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
Fakhraddin Alwajih | Samar M. Magdy | Abdellah El Mekki | Omer Nacar | Youssef Nafea | Safaa Taher Abdelfadil | Abdulfattah Mohammed Yahya | Hamzah Luqman | Nada Almarwani | Samah Aloufi | Baraah Qawasmeh | Houdaifa Atou | Serry Sibaee | Hamzah A. Alsayadi | Walid Al-Dhabyani | Maged S. Al-shaibani | Aya El aatar | Nour Qandos | Rahaf Alhamouri | Samar Ahmad | Mohammed Anwar AL-Ghrawi | Aminetou Yacoub | Ruwa AbuHweidi | Vatimetou Mohamed Lemin | Reem Abdel-Salam | Ahlam Bashiti | Adel Ammar | Aisha Alansari | Ahmed Ashraf | Nora Alturayeif | Alcides Alcoba Inciarte | AbdelRahim A. Elmadany | Mohamedou Cheikh Tourad | Ismail Berrada | Mustafa Jarrar | Shady Shehata | Muhammad Abdul-Mageed
Findings of the Association for Computational Linguistics: EMNLP 2025

Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 37 annotators from across the Arab world, PEARL comprises over 309K multimodal examples spanning ten culturally significant domains covering all Arab countries. We further provide two robust evaluation benchmarks (PEARL and PEARL-LITE) along with a specialized subset (PEARL-X) developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models’ cultural grounding compared to conventional scaling methods. PEARL establishes a foundational resource for advancing culturally informed multimodal modeling research. All datasets and benchmarks are publicly available.