Mustafa Jarrar - ACL Anthology

Mustafa Jarrar

2026

AbjadAuthorID: Authorship Identification for Arabic-Script Languages at AbjadNLP 2026
Shadi Abudalfa | Saad Ezzini | Ahmed Abdelali | Mustafa Jarrar | Mo El-Haj | Nadir Durrani | Hassan Sajjad | Farah Adeeba | Sina Ahmadi
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script

Authorship identification is a core problem in Natural Language Processing and computational linguistics, with applications spanning digital humanities, literary analysis, and forensic linguistics. While substantial progress has been made for English and other high-resource languages, authorship attribution for languages written in the Arabic (Abjad) script remains underexplored. In this paper, we present an overview of AbjadAuthorID, a shared task organised as part of the AbjadNLP workshop at EACL 2026, which focuses on multiclass authorship identification across Arabic-script languages. The shared task covers Modern Standard Arabic, Urdu, and Kurdish, and is formulated as a closed-set multiclass classification problem over literary text spanning multiple authors and historical periods. We describe the task motivation, dataset construction, evaluation protocol, and participation statistics, and report official results for the Arabic track. The findings highlight both the effectiveness of current approaches in controlled settings and the challenges posed by lower participation and resource availability in some language tracks. AbjadAuthorID establishes a new benchmark for multilingual authorship attribution in morphologically rich, underrepresented languages.

AbjadStyleTransfer: Authorship Style Transfer for Arabic-Script Languages at AbjadNLP 2026
Shadi Abudalfa | Saad Ezzini | Ahmed Abdelali | Mustafa Jarrar | Mo El-Haj | Nadir Durrani | Hassan Sajjad | Farah Adeeba
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script

Authorship style transfer aims to rewrite a given text so that it reflects the distinctive style of a target author while preserving the original meaning. Despite growing interest in text style transfer, most existing work has focused on English and other high-resource languages, with limited attention to languages written in the Arabic script. In this paper, we present an overview of AbjadStyleTransfer, a shared task organised as part of the AbjadNLP workshop at EACL 2026, which targets authorship style transfer for Arabic-script languages with a strong focus on literary text. The shared task covers Modern Standard Arabic and Urdu, and is designed to encourage research on controllable text generation in morphologically rich and stylistically diverse languages. Participants are required to generate text that conforms to the writing style of a specified author, given a semantically equivalent formal input. We describe the task motivation, dataset construction, evaluation protocol, and participation statistics, and provide an initial discussion of the challenges associated with authorship style transfer in Arabic-script languages. AbjadStyleTransfer establishes a new benchmark for literary style transfer beyond Latin-script settings and supports future research on culturally grounded and linguistically informed text generation.

AbjadGenEval: Abjad AI Generated Text Detection Shared Task for Languages Using Arabic Script at AbjadNLP 2026
Saad Ezzini | Irfan Ahmad | Salmane Chafik | Shadi Abudalfa | Mo El-Haj | Ahmed Abdelali | Mustafa Jarrar | Nadir Durrani | Hassan Sajjad | Farah Adeeba
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script

We present the findings of the AbjadGenEval shared task, organized as part of the AbjadNLP workshop at EACL 2026, which benchmarks AI-generated text detection for Arabic-script languages. Extending beyond Arabic to include Urdu, the task serves as a binary classification platform distinguishing human-written from AI-generated news articles produced by varied LLMs (e.g., GPT, Gemini). Twenty teams par- ticipated, with top systems achieving F1 scores of 0.93 for Arabic and 0.89 for Urdu. The re- sults highlight the dominance of multilingual transformers-specifically XLM-RoBERTa and DeBERTa-v3-and reveal significant challenges in cross-domain generalization, where naive data augmentation often yielded diminishing returns. This shared task establishes a robust baseline for authenticating content in the Abjad ecosystem.

Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Mo El-Haj | Paul Rayson | Mustafa Jarrar | Ignatius Ezeani | Saad Ezzini | Sina Ahmadi | Amal Haddad Haddad | Cynthia Amol | Ahmad Abdelali | Shadi Abudalfa
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script

2025

Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
Fakhraddin Alwajih | Samar M. Magdy | Abdellah El Mekki | Omer Nacar | Youssef Nafea | Safaa Taher Abdelfadil | Abdulfattah Mohammed Yahya | Hamzah Luqman | Nada Almarwani | Samah Aloufi | Baraah Qawasmeh | Houdaifa Atou | Serry Sibaee | Hamzah A. Alsayadi | Walid Al-Dhabyani | Maged S. Al-shaibani | Aya El aatar | Nour Qandos | Rahaf Alhamouri | Samar Ahmad | Mohammed Anwar AL-Ghrawi | Aminetou Yacoub | Ruwa AbuHweidi | Vatimetou Mohamed Lemin | Reem Abdel-Salam | Ahlam Bashiti | Adel Ammar | Aisha Alansari | Ahmed Ashraf | Nora Alturayeif | Alcides Alcoba Inciarte | AbdelRahim A. Elmadany | Mohamedou Cheikh Tourad | Ismail Berrada | Mustafa Jarrar | Shady Shehata | Muhammad Abdul-Mageed
Findings of the Association for Computational Linguistics: EMNLP 2025

Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 37 annotators from across the Arab world, PEARL comprises over 309K multimodal examples spanning ten culturally significant domains covering all Arab countries. We further provide two robust evaluation benchmarks (PEARL and PEARL-LITE) along with a specialized subset (PEARL-X) explicitly developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models’ cultural grounding compared to conventional scaling methods. PEARL establishes a foundational resource for advancing culturally-informed multimodal modeling research. All datasets and benchmarks are publicly available.

The AraGenEval Shared Task on Arabic Authorship Style Transfer and AI Generated Text Detection
Shadi Abudalfa | Saad Ezzini | Ahmed Abdelali | Hamza Alami | Abdessamad Benlahbib | Salmane Chafik | Mo El-Haj | Abdelkader El Mahdaouy | Mustafa Jarrar | Salima Lamsiyah | Hamzah Luqman
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

We present an overview of the AraGenEval shared task, organized as part of the ArabicNLP 2025 conference. This task introduced the first benchmark suite for Arabic authorship analysis, featuring three subtasks: Authorship Style Transfer, Authorship Identification, and AI-Generated Text Detection. We curated high-quality datasets, including over 47,000 paragraphs from 21 authors and a balanced corpus of human- and AI-generated texts. The task attracted significant global participation, with 72 registered teams from 16 countries. The results highlight the effectiveness of transformer-based models, with top systems leveraging prompt engineering for style transfer, model ensembling for authorship identification, and a mix of multilingual and Arabic-specific models for AI text detection. This paper details the task design, datasets, participant systems, and key findings, establishing a foundation for future research in Arabic stylistics and trustworthy NLP.

ImageEval 2025: The First Arabic Image Captioning Shared Task
Ahlam Bashiti | Alaa Aljabari | Hadi Khaled Hamoud | Md. Rafiul Biswas | Bilal Mohammed Shalash | Mustafa Jarrar | Fadi Zaraket | George Mikros | Ehsaneddin Asgari | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

We present ImageEval 2025, the first shared task dedicated to Arabic image captioning. The task addresses the critical gap in multimodal Arabic NLP by focusing on two complementary subtasks: (1) creating the first open-source, manually-captioned Arabic image dataset through a collaborative datathon, and (2) developing and evaluating Arabic image captioning models. A total of 44 teams registered, of which eight submitted during the test phase, producing 111 valid submissions. Evaluation was conducted using automatic metrics, LLM-based judgment, and human assessment. In Subtask 1, the best-performing system achieved a cosine similarity of 65.5, while in Subtask 2, the top score was 60.0. Although these results show encouraging progress, they also confirm that Arabic image captioning remains a challenging task, particularly due to cultural grounding requirements, morphological richness, and dialectal variation. All datasets, baseline models, and evaluation tools are released publicly to support future research in Arabic multimodal NLP.

Active Learning for Multidialectal Arabic POS Tagging
Diyam Akra | Mohammed Khalilia | Mustafa Jarrar
Findings of the Association for Computational Linguistics: EMNLP 2025

Multidialectal Arabic POS tagging is challenging due to the morphological richness and high variability among dialects. While POS tagging for MSA has advanced thanks to the availability of annotated datasets, creating similar resources for dialects remains costly and labor-intensive. Increasing the size of annotated datasets does not necessarily result in better performance. Active learning offers a more efficient alternative by prioritizing annotating the most informative samples. This paper proposes an active learning approach for multidialectal Arabic POS tagging. Our experiments revealed that annotating approximately 15,000 tokens is sufficient for high performance. We further demonstrate that using a fine-tuned model from one dialect to guide the selection of initial samples from another dialect accelerates convergence—reducing the annotation requirement by about 2,000 tokens. In conclusion, we propose an active learning pipeline and demonstrate that, upon reaching its defined stopping point of 16,000 annotated tokens, it achieves an accuracy of 97.6% on the Emirati Corpus.

Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4)
Saad Ezzini | Hamza Alami | Ismail Berrada | Abdessamad Benlahbib | Abdelkader El Mahdaouy | Salima Lamsiyah | Hatim Derrouz | Amal Haddad Haddad | Mustafa Jarrar | Mo El-Haj | Ruslan Mitkov | Paul Rayson
Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4)

Proceedings of the first International Workshop on Nakba Narratives as Language Resources
Mustafa Jarrar | Nizar Habash | Mo El-Haj | Amal Haddad Haddad | Zeina Jallad | Camille Mansour | Diana Allan | Paul Rayson | Tymaa Hammouda | Sanad Malaysha
Proceedings of the first International Workshop on Nakba Narratives as Language Resources

NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task
Bashar Talafha | Hawau Olamide Toyin | Peter Sullivan | AbdelRahim A. Elmadany | Abdurrahman Juma | Amirbek Djanibekov | Chiyu Zhang | Hamad Alshehhi | Hanan Aldarmaki | Mustafa Jarrar | Nizar Habash | Muhammad Abdul-Mageed
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

We present the findings of the sixth Nuanced Arabic Dialect Identification (NADI 2025) Shared Task, which focused on Arabic speech dialect processing across three subtasks: spoken dialect identification (Subtask 1), speech recognition (Subtask 2), and diacritic restoration for spoken dialects (Subtask 3). A total of 44 teams registered, and during the testing phase, 100 valid submissions were received from eight unique teams. The distribution was as follows: 34 submissions for Subtask 1 five teams, 47 submissions for Subtask 2 six teams, and 19 submissions for Subtask 3 two teams. The best-performing systems achieved 79.8% accuracy on Subtask 1, 35.68/12.20 WER/CER (overall average) on Subtask 2, and 55/13 WER/CER on Subtask 3. These results highlight the ongoing challenges of Arabic dialect speech processing, particularly in dialect identification, recognition, and diacritic restoration. We also summarize the methods adopted by participating teams and briefly outline directions for future editions of NADI.

Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs
Fakhraddin Alwajih | Abdellah El Mekki | Samar Mohamed Magdy | AbdelRahim A. Elmadany | Omer Nacar | El Moatez Billah Nagoudi | Reem Abdel-Salam | Hanin Atwany | Youssef Nafea | Abdulfattah Mohammed Yahya | Rahaf Alhamouri | Hamzah A. Alsayadi | Hiba Zayed | Sara Shatnawi | Serry Sibaee | Yasir Ech-chammakhy | Walid Al-Dhabyani | Marwa Mohamed Ali | Imen Jarraya | Ahmed Oumar El-Shangiti | Aisha Alraeesi | Mohammed Anwar AL-Ghrawi | Abdulrahman S. Al-Batati | Elgizouli Mohamed | Noha Taha Elgindi | Muhammed Saeed | Houdaifa Atou | Issam Ait Yahia | Abdelhak Bouayad | Mohammed Machrouh | Amal Makouar | Dania Alkawi | Mukhtar Mohamed | Safaa Taher Abdelfadil | Amine Ziad Ounnoughene | Anfel Rouabhia | Rwaa Assi | Ahmed Sorkatti | Mohamedou Cheikh Tourad | Anis Koubaa | Ismail Berrada | Mustafa Jarrar | Shady Shehata | Muhammad Abdul-Mageed
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce PALM, a year-long community-driven project covering all 22 Arab countries. The dataset contains instruction–response pairs in both Modern Standard Arabic (MSA) and dialectal Arabic (DA), spanning 20 diverse topics. Built by a team of 44 researchers across the Arab world—each an author of this paper—PALM offers a broad, inclusive perspective. We use PALM to evaluate the cultural and dialectal capabilities of several frontier LLMs, revealing notable limitations: while closed-source LLMs generally perform strongly, they still exhibit flaws, and smaller open-source models face greater challenges. Furthermore, certain countries (e.g., Egypt, the UAE) appear better represented than others (e.g., Iraq, Mauritania, Yemen). Our annotation guidelines, code, and data are publicly available for reproducibility. More information about PALM is available on our project page: https://github.com/UBC-NLP/palm.

Wojood^Relations: Arabic Relation Extraction Corpus and Modeling
Alaa Aljabari | Mohammed Khalilia | Mustafa Jarrar
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Relation extraction (RE) is a core task in natural language processing, crucial for semantic understanding, knowledge graph construction, and enhancing downstream applications. Existing work on Arabic RE remains limited due to the language’s rich morphology and syntactic complexity, and the lack of large, high-quality datasets. In this paper, we present Wojood^Relations, the largest and most diverse Arabic RE corpus to date, containing over 33K sentences (∼550K tokens) annotated with ∼15K relation triples across 40 relation types. The corpus is built on top of Wojood NER dataset with manual relation annotations carried out by expert annotators, achieving a Cohen’s 𝜅 of 0.92, indicating high reliability. In addition, we propose two methods: NLI-RE, which formulates RE as a binary natural language inference problem using relation-aware templates, and GPT-Joint, a few-shot LLM framework for joint entity and RE via relation-aware retrieval. Finally, we benchmark the dataset using both supervised models and in-context learning with LLMs. Supervised models achieve 92.89% F1 for RE, while LLMs obtain 72.73% F1 for joint entity and RE. These results establish strong baselines, highlight key challenges, and provide a foundation for advancing Arabic RE research.

WojoodOntology: Ontology-Driven LLM Prompting for Unified Information Extraction Tasks
Alaa Aljabari | Nagham Hamad | Mohammed Khalilia | Mustafa Jarrar
Proceedings of The Third Arabic Natural Language Processing Conference

Information Extraction tasks such as Named Entity Recognition and Relation Extraction are often developed using diverse tagsets and annotation guidelines. This presents major challenges for model generalization, cross-dataset evaluation, tool interoperability, and broader industry adoption. To address these issues, we propose an information extraction ontology, , which covers a wide range of named entity types and relations. serves as a semantic mediation framework that facilitates alignment across heterogeneous tagsets and annotation guidelines. We propose two ontology-based mapping methods: (i) as a set of mapping rules for uni-directional tagset alignment; and (ii) as ontology-based prompting, which incorporates the ontology concepts directly into prompts, enabling large language models (LLMs) to perform more effective and bi-directional mappings. Our experiments show a 15% improvement in out-of-domain mapping accuracy when using ontology-based prompting compared to rule-based methods. Furthermore, is aligned with Schema.org and Wikidata, enabling interoperability with knowledge graphs and facilitating broader industry adoption. The is open source and available at https://sina.birzeit.edu/wojood.

Konooz: Multi-domain Multi-dialect Corpus for Named Entity Recognition
Nagham Hamad | Mohammed Khalilia | Mustafa Jarrar
Findings of the Association for Computational Linguistics: ACL 2025

We introduce , a novel multi-dimensional corpus covering 16 Arabic dialects across 10 domains, resulting in 160 distinct corpora. The corpus comprises about 777k tokens, carefully collected and manually annotated with 21 entity types using both nested and flat annotation schemes - using the Wojood guidelines. While is useful for various NLP tasks like domain adaptation and transfer learning, this paper primarily focuses on benchmarking existing Arabic Named Entity Recognition (NER) models, especially cross-domain and cross-dialect model performance. Our benchmarking of four Arabic NER models using reveals a significant drop in performance of up to 38% when compared to the in-distribution data. Furthermore, we present an in-depth analysis of domain and dialect divergence and the impact of resource scarcity. We also measured the overlap between domains and dialects using the Maximum Mean Discrepancy (MMD) metric, and illustrated why certain NER models perform better on specific dialects and domains. is open-source and publicly available at https://sina.birzeit.edu/wojood/#download

2024

WojoodNER 2024: The Second Arabic Named Entity Recognition Shared Task
Mustafa Jarrar | Nagham Hamad | Mohammed Khalilia | Bashar Talafha | AbdelRahim Elmadany | Muhammad Abdul-Mageed
Proceedings of the Second Arabic Natural Language Processing Conference

We present WojoodNER-2024, the second Arabic Named Entity Recognition (NER) Shared Task. In WojoodNER-2024, we focus on fine-grained Arabic NER. We provided participants with a new Arabic fine-grained NER dataset called Wojoodfine, annotated with subtypes of entities. WojoodNER-2024 encompassed three subtasks: (i) Closed-Track Flat Fine-Grained NER, (ii) Closed-Track Nested Fine-Grained NER, and (iii) an Open-Track NER for the Israeli War on Gaza. A total of 43 unique teams registered for this shared task. Five teams participated in the Flat Fine-Grained Subtask, among which two teams tackled the Nested Fine-Grained Subtask and one team participated in the Open-Track NER Subtask. The winning teams achieved F₁ scores of 91% and 92% in the Flat Fine-Grained and Nested Fine-Grained Subtasks, respectively. The sole team in the Open-Track Subtask achieved an F₁ score of 73.7%.

NLU-STR at SemEval-2024 Task 1: Generative-based Augmentation and Encoder-based Scoring for Semantic Textual Relatedness
Sanad Malaysha | Mustafa Jarrar | Mohammed Khalilia
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Semantic textual relatedness is a broader concept of semantic similarity. It measures the extent to which two chunks of text convey similar meaning or topics, or share related concepts or contexts. This notion of relatedness can be applied in various applications, such as document clustering and summarizing. SemRel-2024, a shared task in SemEval-2024, aims at reducing the gap in the semantic relatedness task by providing datasets for fourteen languages and dialects including Arabic. This paper reports on our participation in Track A (Algerian and Moroccan dialects) and Track B (Modern Standard Arabic). A BERT-based model is augmented and fine-tuned for regression scoring in supervised track (A), while BERT-based cosine similarity is employed for unsupervised track (B). Our system ranked 1st in SemRel-2024 for MSA with a Spearman correlation score of 0.49. We ranked 5th for Moroccan and 12th for Algerian with scores of 0.83 and 0.53, respectively.

ArabicNLU 2024: The First Arabic Natural Language Understanding Shared Task
Mohammed Khalilia | Sanad Malaysha | Reem Suwaileh | Mustafa Jarrar | Alaa Aljabari | Tamer Elsayed | Imed Zitouni
Proceedings of the Second Arabic Natural Language Processing Conference

This paper presents an overview of the Arabic Natural Language Understanding (ArabicNLU 2024) shared task, focusing on two subtasks: Word Sense Disambiguation (WSD) and Location Mention Disambiguation (LMD). The task aimed to evaluate the ability of automated systems to resolve word ambiguity and identify locations mentioned in Arabic text. We provided participants with novel datasets, including a sense-annotated corpus for WSD, called SALMA with approximately 34k annotated tokens, and the dataset with 3,893 annotations and 763 unique location mentions. These are challenging tasks. Out of the 38 registered teams, only three teams participated in the final evaluation phase, with the highest accuracy being 77.8% for WSD and 95.0% for LMD. The shared task not only facilitated the evaluation and comparison of different techniques, but also provided valuable insights and resources for the continued advancement of Arabic NLU technologies.

AraFinNLP 2024: The First Arabic Financial NLP Shared Task
Sanad Malaysha | Mo El-Haj | Saad Ezzini | Mohammed Khalilia | Mustafa Jarrar | Sultan Almujaiwel | Ismail Berrada | Houda Bouamor
Proceedings of the Second Arabic Natural Language Processing Conference

The expanding financial markets of the Arab world require sophisticated Arabic NLP tools. To address this need within the banking domain, the Arabic Financial NLP (AraFinNLP) shared task proposes two subtasks: (i) Multi-dialect Intent Detection and (ii) Cross-dialect Translation and Intent Preservation. This shared task uses the updated ArBanking77 dataset, which includes about 39k parallel queries in MSA and four dialects. Each query is labeled with one or more of a common 77 intents in the banking domain. These resources aim to foster the development of robust financial Arabic NLP, particularly in the areas of machine translation and banking chat-bots.A total of 45 unique teams registered for this shared task, with 11 of them actively participated in the test phase. Specifically, 11 teams participated in Subtask 1, while only 1 team participated in Subtask 2. The winning team of Subtask 1 achieved F1 score of 0.8773, and the only team submitted in Subtask 2 achieved a 1.667 BLEU score.

The FIGNEWS Shared Task on News Media Narratives
Wajdi Zaghouani | Mustafa Jarrar | Nizar Habash | Houda Bouamor | Imed Zitouni | Mona Diab | Samhaa El-Beltagy | Muhammed AbuOdeh
Proceedings of the Second Arabic Natural Language Processing Conference

We present an overview of the FIGNEWSshared task, organized as part of the Arabic-NLP 2024 conference co-located with ACL2024. The shared task addresses bias and pro-paganda annotation in multilingual news posts.We focus on the early days of the Israel War onGaza as a case study. The task aims to fostercollaboration in developing annotation guide-lines for subjective tasks by creating frame-works for analyzing diverse narratives high-lighting potential bias and propaganda. In aspirit of fostering and encouraging diversity,we address the problem from a multilingualperspective, namely within five languages: En-glish, French, Arabic, Hebrew, and Hindi. Atotal of 17 teams participated in two annota-tion subtasks: bias (16 teams) and propaganda(6 teams). The teams competed in four evalua-tion tracks: guidelines development, annotationquality, annotation quantity, and consistency.Collectively, the teams produced 129,800 datapoints. Key findings and implications for thefield are discussed.

Event-Arguments Extraction Corpus and Modeling using BERT for Arabic
Alaa Aljabari | Lina Duaibes | Mustafa Jarrar | Mohammed Khalilia
Proceedings of the Second Arabic Natural Language Processing Conference

Event-argument extraction is a challenging task, particularly in Arabic due to sparse linguistic resources. To fill this gap, we introduce the corpus (550k tokens) as an extension of Wojood, enriched with event-argument annotations. We used three types of event arguments: agent, location, and date, which we annotated as relation types. Our inter-annotator agreement evaluation resulted in 82.23% Kappa score and 87.2% F₁-score. Additionally, we propose a novel method for event relation extraction using BERT, in which we treat the task as text entailment. This method achieves an F₁-score of 94.01%.To further evaluate the generalization of our proposed method, we collected and annotated another out-of-domain corpus (about 80k tokens) called and used it as a second test set, on which our approach achieved promising results (83.59% F₁-score). Last but not least, we propose an end-to-end system for event-arguments extraction. This system is implemented as part of SinaTools, and both corpora are publicly available at https://sina.birzeit.edu/wojood

Qabas: An Open-Source Arabic Lexicographic Database
Mustafa Jarrar | Tymaa Hasanain Hammouda
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present Qabas, a novel open-source Arabic lexicon designed for NLP applications. The novelty of Qabas lies in its synthesis of 110 lexicons. Specifically, Qabas lexical entries (lemmas) are assembled by linking lemmas from 110 lexicons. Furthermore, Qabas lemmas are also linked to 12 morphologically annotated corpora (about 2M tokens), making it the first Arabic lexicon to be linked to lexicons and corpora. Qabas was developed semi-automatically, utilizing a mapping framework and a web-based tool. Compared with other lexicons, Qabas stands as the most extensive Arabic lexicon, encompassing about 58K lemmas (45K nominal lemmas, 12.5K verbal lemmas, and 473 functional-word lemmas). Qabas is open-source and accessible online at https://sina.birzeit.edu/qabas

In spite of the recent progress in speech processing, the majority of world languages and dialects remain uncovered. This situation only furthers an already wide technological divide, thereby hindering technological and socioeconomic inclusion. This challenge is largely due to the absence of datasets that can empower diverse speech systems. In this paper, we seek to mitigate this obstacle for a number of Arabic dialects by presenting Casablanca, a large-scale community-driven effort to collect and transcribe a multi-dialectal Arabic dataset. The dataset covers eight dialects: Algerian, Egyptian, Emirati, Jordanian, Mauritanian, Moroccan, Palestinian, and Yemeni, and includes annotations for transcription, gender, dialect, and code-switching. We also develop a number of strong baselines exploiting Casablanca. The project page for Casablanca is accessible at: www.dlnlp.ai/speech/casablanca.

Sina at FigNews 2024: Multilingual Datasets Annotated with Bias and Propaganda.
Lina Duaibes | Areej Jaber | Mustafa Jarrar | Ahmad Qadi | Mais Qandeel
Proceedings of the Second Arabic Natural Language Processing Conference

The proliferation of bias and propaganda onsocial media is an increasingly significant concern,leading to the development of techniquesfor automatic detection. This article presents amultilingual corpus of 12, 000 Facebook postsfully annotated for bias and propaganda. Thecorpus was created as part of the FigNews2024 Shared Task on News Media Narrativesfor framing the Israeli War on Gaza. It coversvarious events during the War from October7, 2023 to January 31, 2024. The corpuscomprises 12, 000 posts in five languages (Arabic,Hebrew, English, French, and Hindi), with2, 400 posts for each language. The annotationprocess involved 10 graduate students specializingin Law. The Inter-Annotator Agreement(IAA) was used to evaluate the annotationsof the corpus, with an average IAA of 80.8%for bias and 70.15% for propaganda annotations.Our team was ranked among the bestperformingteams in both Bias and Propagandasubtasks. The corpus is open-source and availableat https://sina.birzeit.edu/fada

2023

Arabic Fine-Grained Entity Recognition
Haneen Abdallatif Liqreina | Mustafa Jarrar | Mohammed Khalilia | Ahmed Oumar El-Shangiti | Muhammad Abdul-Mageed
Proceedings of ArabicNLP 2023

Traditional NER systems are typically trained to recognize coarse-grained categories of entities, and less attention is given to classifying entities into a hierarchy of fine-grained lower-level sub-types. This article aims to advance Arabic NER with fine-grained entities. We chose to extend Wojood (an open-source Nested Arabic Named Entity Corpus) with sub-types. In particular, four main entity types in Wojood (geopolitical entity (GPE), location (LOC), organization (ORG), and facility (FAC) are extended with 31 sub-types of entities. To do this, we first revised Wojood’s annotations of GPE, LOC, ORG, and FAC to be compatible with the LDC’s ACE guidelines, which yielded 5, 614 changes. Second, all mentions of GPE, LOC, ORG, and FAC (~ 44K) in Wojood are manually annotated with the LDC’s ACE subtypes. This extended version of Wojood is called WojoodFine. To evaluate our annotations, we measured the inter-annotator agreement (IAA) using both Cohen’s Kappa and F1 score, resulting in 0.9861 and 0.9889, respectively. To compute the baselines of WojoodFine, we fine-tune three pre-trained Arabic BERT encoders in three settings: flat NER, nested NER and nested NER with sub-types and achieved F1 score of 0.920, 0.866, and 0.885, respectively. Our corpus and models are open source and available at https://sina.birzeit.edu/wojood/.

ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic
Mustafa Jarrar | Ahmet Birim | Mohammed Khalilia | Mustafa Erden | Sana Ghanem
Proceedings of ArabicNLP 2023

This paper presents the ArBanking77, a large Arabic dataset for intent detection in the banking domain. Our dataset was arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries to ArBanking77 dataset with 31,404 queries in both Modern Standard Arabic (MSA) and Palestinian dialect, with each query classified into one of the 77 classes (intents). Furthermore, we present a neural model, based on AraBERT, fine-tuned on ArBanking77, which achieved an F1-score of 0.9209 and 0.8995 on MSA and Palestinian dialect, respectively. We performed extensive experimentation in which we simulated low-resource settings, where the model is trained on a subset of the data and augmented with noisy queries to simulate colloquial terms, mistakes and misspellings found in real NLP systems, especially live chat queries. The data and the models are publicly available at https://sina.birzeit.edu/arbanking77.

A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms
Sana Ghanem | Mustafa Jarrar | Radi Jarrar | Ibrahim Bounhas
Proceedings of the 12th Global Wordnet Conference

This paper addresses the task of extending a given synset with additional synonyms taking into account synonymy strength as a fuzzy value. Given a mono/multilingual synset and a threshold (a fuzzy value [0−1]), our goal is to extract new synonyms above this threshold from existing lexicons. We present twofold contributions: an algorithm and a benchmark dataset. The dataset consists of 3K candidate synonyms for 500 synsets. Each candidate synonym is annotated with a fuzzy value by four linguists. The dataset is important for (i) understanding how much linguists (dis/)agree on synonymy, in addition to (ii) using the dataset as a baseline to evaluate our algorithm. Our proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate. Our evaluations show that the algorithm behaves like a linguist and its fuzzy values are close to those proposed by linguists (using RMSE and MAE). The dataset and a demo page are publicly available at https://portal.sina.birzeit.edu/synonyms.

Context-Gloss Augmentation for Improving Arabic Target Sense Verification
Sanad Malaysha | Mustafa Jarrar | Mohammed Khalilia
Proceedings of the 12th Global Wordnet Conference

Arabic language lacks semantic datasets and sense inventories. The most common semantically-labeled dataset for Arabic is the ArabGlossBERT, a relatively small dataset that consists of 167K context-gloss pairs (about 60K positive and 107K negative pairs), collected from Arabic dictionaries. This paper presents an enrichment to the ArabGlossBERT dataset, by augmenting it using (Arabic-English-Arabic) machine back-translation. Augmentation increased the dataset size to 352K pairs (149K positive and 203K negative pairs). We measure the impact of augmentation using different data configurations to fine-tune BERT on target sense verification (TSV) task. Overall, the accuracy ranges between 78% to 84% for different data configurations. Although our approach performed at par with the baseline, we did observe some improvements for some POS tags in some experiments. Furthermore, our fine-tuned models are trained on a larger dataset covering larger vocabulary and contexts. We provide an in-depth analysis of the accuracy for each part-of-speech (POS).

SALMA: Arabic Sense-Annotated Corpus and WSD Benchmarks
Mustafa Jarrar | Sanad Malaysha | Tymaa Hammouda | Mohammed Khalilia
Proceedings of ArabicNLP 2023

SALMA, the first Arabic sense-annotated corpus, consists of ~34K tokens, which are all sense-annotated. The corpus is annotated using two different sense inventories simultaneously (Modern and Ghani). SALMA novelty lies in how tokens and senses are associated. Instead of linking a token to only one intended sense, SALMA links a token to multiple senses and provides a score to each sense. A smart web-based annotation tool was developed to support scoring multiple senses against a given word. In addition to sense annotations, we also annotated the corpus using six types of named entities. The quality of our annotations was assessed using various metrics (Kappa, Linear Weighted Kappa, Quadratic Weighted Kappa, Mean Average Error, and Root Mean Square Error), which show very high inter-annotator agreement. To establish a Word Sense Disambiguation baseline using our SALMA corpus, we developed an end-to-end Word Sense Disambiguation system using Target Sense Verification. We used this system to evaluate three Target Sense Verification models available in the literature. Our best model achieved an accuracy with 84.2% using Modern and 78.7% using Ghani. The full corpus and the annotation tool are open-source and publicly available at https://sina.birzeit.edu/salma/.

Nâbra: Syrian Arabic Dialects with Morphological Annotations
Amal Nayouf | Tymaa Hammouda | Mustafa Jarrar | Fadi Zaraket | Mohamad-Bassam Kurdy
Proceedings of ArabicNLP 2023

This paper presents Nâbra (نَبْرَة), a corpora of Syrian Arabic dialects with morphological annotations. A team of Syrian natives collected more than 6K sentences containing about 60K words from several sources including social media posts, scripts of movies and series, lyrics of songs and local proverbs to build Nâbra. Nâbra covers several local Syrian dialects including those of Aleppo, Damascus, Deir-ezzur, Hama, Homs, Huran, Latakia, Mardin, Raqqah, and Suwayda. A team of nine annotators annotated the 60K tokens with full morphological annotations across sentence contexts. We trained the annotators to follow methodological annotation guidelines to ensure unique morpheme annotations, and normalized the annotations. F1 and 𝜅 agreement scores ranged between 74% and 98% across features, showing the excellent quality of Nâbra annotations. Our corpora are open-source and publicly available as part of the Currasat portal https://sina.birzeit.edu/currasat.

WojoodNER 2023: The First Arabic Named Entity Recognition Shared Task
Mustafa Jarrar | Muhammad Abdul-Mageed | Mohammed Khalilia | Bashar Talafha | AbdelRahim Elmadany | Nagham Hamad | Alaa’ Omar
Proceedings of ArabicNLP 2023

We present WojoodNER-2023, the first Arabic Named Entity Recognition (NER) Shared Task. The primary focus of WojoodNER 2023 is on Arabic NER, offering a novel NER datasets (i.e., Wojood) and the definition of subtasks designed to facilitate meaningful comparisons between different NER approaches. WojoodNER-2023 encompassed two Subtasks: FlatNER and NestedNER. A total of 45 unique teams registered for this shared task, with 11 of them actively participating in the test phase. Specifically, 11 teams participated in FlatNER, while 8 teams tackled NestedNER. The winning team achieved F1 score of 91.96 and 93.73 in FlatNER and NestedNER respectively.

Open-Source Thesaurus Development for Under-Resourced Languages: a Welsh Case Study
Nouran Khallaf | Elin Arfon | Mo El-Haj | Jonathan Morris | Dawn Knight | Paul Rayson | Tymaa Hasanain Hammouda | Mustafa Jarrar
Proceedings of the 4th Conference on Language, Data and Knowledge

2022

Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT
Mustafa Jarrar | Mohammed Khalilia | Sana Ghanem
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER). Nested entities occur when one entity mention is embedded inside another entity mention. Wojood consists of about 550K Modern Standard Arabic (MSA) and dialect tokens that are manually annotated with 21 entity types including person, organization, location, event and date. More importantly, the corpus is annotated with nested entities instead of the more common flat annotations. The data contains about 75K entities and 22.5% of which are nested. The inter-annotator evaluation of the corpus demonstrated a strong agreement with Cohen’s Kappa of 0.979 and an F1-score of 0.976. To validate our data, we used the corpus to train a nested NER model based on multi-task learning using the pre-trained AraBERT (Arabic BERT). The model achieved an overall micro F1-score of 0.884. Our corpus, the annotation guidelines, the source code and the pre-trained model are publicly available.

Curras + Baladi: Towards a Levantine Corpus
Karim Al-Haff | Mustafa Jarrar | Tymaa Hammouda | Fadi Zaraket
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents two-fold contributions: a full revision of the Palestinian morphologically annotated corpus (Curras), and a newly annotated Lebanese corpus (Baladi). Both corpora can be used as a more general Levantine corpus. Baladi consists of around 9.6K morphologically annotated tokens. Each token was manually annotated with several morphological features and using LDC’s SAMA lemmas and tags. The inter-annotator evaluation on most features illustrates 78.5% Kappa and 90.1% F1-Score. Curras was revised by refining all annotations for accuracy, normalization and unification of POS tags, and linking with SAMA lemmas. This revision was also important to ensure that both corpora are compatible and can help to bridge the nuanced linguistic gaps that exist between the two highly mutually intelligible dialects. Both corpora are publicly available through a web portal.

2021

ArabGlossBERT: Fine-Tuning BERT on Context-Gloss Pairs for WSD
Moustafa Al-Hajj | Mustafa Jarrar
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Using pre-trained transformer models such as BERT has proven to be effective in many NLP tasks. This paper presents our work to fine-tune BERT models for Arabic Word Sense Disambiguation (WSD). We treated the WSD task as a sentence-pair binary classification task. First, we constructed a dataset of labeled Arabic context-gloss pairs (~167k pairs) we extracted from the Arabic Ontology and the large lexicographic database available at Birzeit University. Each pair was labeled as True or False and target words in each context were identified and annotated. Second, we used this dataset for fine-tuning three pre-trained Arabic BERT models. Third, we experimented the use of different supervised signals used to emphasize target words in context. Our experiments achieved promising results (accuracy of 84%) although we used a large set of senses in the experiment.

Extracting Synonyms from Bilingual Dictionaries
Mustafa Jarrar | Eman Naser | Muhammad Khalifa | Khaled Shaalan
Proceedings of the 11th Global Wordnet Conference

We present our progress in developing a novel algorithm to extract synonyms from bilingual dictionaries. Identification and usage of synonyms play a significant role in improving the performance of information access applications. The idea is to construct a translation graph from translation pairs, then to extract and consolidate cyclic paths to form bilingual sets of synonyms. The initial evaluation of this algorithm illustrates promising results in extracting Arabic-English bilingual synonyms. In the evaluation, we first converted the synsets in the Arabic WordNet into translation pairs (i.e., losing word-sense memberships). Next, we applied our algorithm to rebuild these synsets. We compared the original and extracted synsets obtaining an F-Measure of 82.3% and 82.1% for Arabic and English synsets extraction, respectively.

LU-BZU at SemEval-2021 Task 2: Word2Vec and Lemma2Vec performance in Arabic Word-in-Context disambiguation
Moustafa Al-Hajj | Mustafa Jarrar
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents a set of experiments to evaluate and compare between the performance of using CBOW Word2Vec and Lemma2Vec models for Arabic Word-in-Context (WiC) disambiguation without using sense inventories or sense embeddings. As part of the SemEval-2021 Shared Task 2 on WiC disambiguation, we used the dev.ar-ar dataset (2k sentence pairs) to decide whether two words in a given sentence pair carry the same meaning. We used two Word2Vec models: Wiki-CBOW, a pre-trained model on Arabic Wikipedia, and another model we trained on large Arabic corpora of about 3 billion tokens. Two Lemma2Vec models was also constructed based on the two Word2Vec models. Each of the four models was then used in the WiC disambiguation task, and then evaluated on the SemEval-2021 test.ar-ar dataset. At the end, we reported the performance of different models and compared between using lemma-based and word-based models.

2014

Building a Corpus for Palestinian Arabic: a Preliminary Study
Mustafa Jarrar | Nizar Habash | Diyam Akra | Nasser Zalmout
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

Towards Building Lexical Ontology via Cross-Language Matching
Mamoun Abu Helou | Matteo Palmonari | Mustafa Jarrar | Christiane Fellbaum
Proceedings of the Seventh Global Wordnet Conference

Co-authors

Shadi Abudalfa 5

Alaa Aljabari 5

Ismail Berrada 5

Ahmed Abdelali 4

Tymaa Hammouda 4

Bashar Talafha 4

Walid Al-Dhabyani 3

Rahaf Alhamouri 3

Hamzah A. Alsayadi 3

Fakhraddin Alwajih 3

Nadir Durrani 3

Abdellah El Mekki 3

Ahmed Oumar El-Shangiti 3

AbdelRahim A. Elmadany 3

Amal Haddad Haddad 3

Samar Mohamed Magdy 3

Hassan Sajjad 3

Shady Shehata 3

Mohamedou Cheikh Tourad 3

Fadi A. Zaraket 3

Mohammed Anwar AL-Ghrawi 2

Reem Abdel-Salam 2

Safaa Taher Abdelfadil 2

Moustafa Al-Hajj 2

Aisha Alraeesi 2

Houdaifa Atou 2

Ahlam Bashiti 2

Abdessamad Benlahbib 2

Houda Bouamor 2

Salmane Chafik 2

Yasir Ech-chammakhy 2

Abdelkader El Mahdaouy 2

Abdelrahim Elmadany 2

Tymaa Hasanain Hammouda 2

Salima Lamsiyah 2

Hamzah Luqman 2

Youssef Nafea 2

El-Moatez-Billah Nagoudi 2

Sara Shatnawi 2

Abdulfattah Mohammed Yahya 2

Wajdi Zaghouani 2

Ahmad Abdelali 1

Ruwa AbuHweidi 1

Muhammed AbuOdeh 1

Abdulrahman S. Al-Batati 1

Karim Al-Haff 1

Maged S. Al-shaibani 1

Aisha Alansari 1

Alcides Alcoba Inciarte 1

Hanan Aldarmaki 1

Marwa Mohamed Ali 1

Nada Almarwani 1

Sultan Almujaiwel 1

Hamad Alshehhi 1

Nora Alturayeif 1

Ehsaneddin Asgari 1

Yousra Berrachedi 1

Md. Rafiul Biswas 1

Abdelhak Bouayad 1

Ibrahim Bounhas 1

Chafei Mohamed Chafei 1

Hatim Derrouz 1

Amirbek Djanibekov 1

Samhaa R. El-Beltagy 1

Noha Taha Elgindi 1

Tamer Elsayed 1

Mustafa Erden 1

Ignatius Ezeani 1

Christiane Fellbaum 1

Mariem Habiboullah 1

Hadi Khaled Hamoud 1

Mamoun Abu Helou 1

Abdurrahman Juma 1

Karima Kadaoui 1

Muhammad Khalifa 1

Nouran Khallaf 1

Mohamad-Bassam Kurdy 1

Vatimetou Mohamed Lemin 1

Haneen Abdallatif Liqreina 1

Mohammed Machrouh 1

Camille Mansour 1

George Mikros 1

Ruslan Mitkov 1

Elgizouli Mohamed 1

Mukhtar Mohamed 1

Abdelrahman Mohamed 1

Jonathan Morris 1

Amine Ziad Ounnoughene 1

Matteo Palmonari 1

Baraah Qawasmeh 1

Anfel Rouabhia 1

Benelhadj Djelloul Mama Saadia 1

Muhammed Saeed 1

Khaled Shaalan 1

Bilal Mohammed Shalash 1

Ahmed Sorkatti 1

Peter Sullivan 1

Reem Suwaileh 1

Hawau Olamide Toyin 1

Aminetou Yacoub 1

Issam Ait Yahia 1

Nasser Zalmout 1

Venues