Tamer Elsayed - ACL Anthology

Tamer Elsayed

2026

MAPLE: A Meta-learning Framework for Cross-Prompt Essay Scoring
Salam Albatarni | May Bashendy | Sohaila Eltanbouly | Tamer Elsayed
Findings of the Association for Computational Linguistics: ACL 2026

Automated Essay Scoring (AES) faces significant challenges in cross-prompt settings, where models must generalize to unseen writing prompts. To address this limitation, we propose MAPLE, a meta-learning framework that leverages prototypical networks to learn transferable representations across different writing prompts. Across three diverse datasets (ELLIPSE and ASAP (English), and LAILA (Arabic)), MAPLE achieves state-of-the-art performance on ELLIPSE and LAILA, outperforming strong baselines by 8.5 and 3 points in QWK, respectively. On ASAP, where prompts exhibit heterogeneous score ranges, MAPLE yields improvements on several traits, highlighting the strengths of our approach in unified scoring settings. Overall, our results demonstrate the potential of meta-learning for building robust cross-prompt AES systems.

LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring
May Bashendy | Walid Massoud | Sohaila Eltanbouly | Salam Albatarni | Marwan Sayed | Abrar Abir | Houda Bouamor | Tamer Elsayed
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Automated Essay Scoring (AES) has gained increasing attention in recent years, yet research on Arabic AES remains limited due to the lack of publicly available datasets. To address this, we introduce LAILA, the largest publicly available Arabic AES dataset to date, comprising 7,859 essays annotated with holistic and trait-specific scores on seven dimensions: relevance, organization, vocabulary, style, development, mechanics, and grammar. We detail the dataset design, collection, and annotations, and provide benchmark results using state-of-the-art Arabic and English models in prompt-specific and cross-prompt settings. LAILA fills a critical need in Arabic AES research, supporting the development of robust scoring systems.

Qayyem: A Real-time Platform for Scoring Proficiency of Arabic Essays
Hoor Tamer Elbahnasawi | Marwan Sayed | Sohaila Eltanbouly | Fatima Zahra Brahamia | Tamer Elsayed
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Over the past years, Automated Essay Scoring (AES) systems have gained increasing attention as scalable and consistent solutions for assessing proficiency of student writing. Despite recent progress, support for Arabic AES remains limited due to linguistic complexity and scarcity of large publicly-available annotated datasets. In this work, we present Qayyem, a Web-based platform designed to support Arabic AES by providing an integrated workflow for assignment creation, batch essay upload, scoring configuration, and per-trait essay evaluation. Qayyem abstracts the technical complexity of interacting with scoring server APIs, allowing instructors to access advanced scoring services through a user-friendly interface. The platform deploys a number of state-of-the-art Arabic essay scoring models with different effectiveness and efficiency figures.

2025

TRATES: Trait-Specific Rubric-Assisted Cross-Prompt Essay Scoring
Sohaila Eltanbouly | Salam Albatarni | Tamer Elsayed
Findings of the Association for Computational Linguistics: ACL 2025

Research on holistic Automated Essay Scoring (AES) is long-dated; yet, there is a notable lack of attention for assessing essays according to individual traits. In this work, we propose TRATES, a novel trait-specific and rubric-based cross-prompt AES framework that is generic yet specific to the underlying trait. The framework leverages a Large Language Model (LLM) that utilizes the trait grading rubrics to generate trait-specific features (represented by assessment questions), then assesses those features given an essay. The trait-specific features are eventually combined with generic writing-quality and prompt-specific features to train a simple classical regression model that predicts trait scores of essays from an unseen prompt. Experiments show that TRATES achieves a new state-of-the-art performance across all traits on a widely-used dataset, with the generated LLM-based features being the most significant.

IslamicEval 2025: The First Shared Task of Capturing LLMs Hallucination in Islamic Content
Hamdy Mubarak | Rana Malhas | Watheq Mansour | Abubakr Mohamed | Mahmoud Fawzi | Majd Hawasly | Tamer Elsayed | Kareem Mohamed Darwish | Walid Magdy
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

Hallucination in Large Language Models (LLMs) remains a significant challenge and continues to draw substantial research attention. The problem becomes especially critical when hallucinations arise in sensitive domains, such as religious discourse. To address this gap, we introduce IslamicEval 2025—the first shared task specifically focused on evaluating and detecting hallucinations in Islamic content. The task consists of two subtasks: (1) Hallucination Detection and Correction of quoted verses (Ayahs) from the Holy Quran and quoted Hadiths; and (2) Qur’an and Hadith Question Answering, which assesses retrieval models and LLMs by requiring answers to be retrieved from grounded, authoritative sources. Thirteen teams participated in the final phase of the shared task, employing a range of pipelines and frameworks. Their diverse approaches underscore both the complexity of the task and the importance of effectively managing hallucinations in Islamic discourse.

TAQEEM 2025: Overview of The First Shared Task for Arabic Quality Evaluation of Essays in Multi-dimensions
May Bashendy | Salam Albatarni | Sohaila Eltanbouly | Walid Massoud | Houda Bouamor | Tamer Elsayed
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

Automated Essay Scoring (AES) has emerged as a significant research problem in natural language processing, offering valuable tools to support educators in assessing student writing. Motivated by the growing need for reliable Arabic AES systems, we organized the first shared Task for Arabic Quality Evaluation of Essays in Multi-dimensions (TAQEEM) held at the ArabicNLP 2025 conference. TAQEEM 2025 includes two subtasks: Task A on holistic scoring and Task B on trait-specific scoring. It introduces a new (and first of its kind) dataset of 1,265 Arabic essays, annotated with holistic and trait-specific scores, including relevance, organization, vocabulary, style, development, mechanics, and grammar. The main goal of TAQEEM is to address the scarcity of standardized benchmarks and high-quality resources in Arabic AES. TAQEEM 2025 attracted 11 registered teams for Task A and 10 for Task B, with a total of 5 teams, across both tasks, submitting system runs for evaluation. This paper presents an overview of the task, outlines the approaches employed, and discusses the results of the participating teams.

BALSAM: A Platform for Benchmarking Arabic Large Language Models
Rawan Al-Matham | Kareem Darwish | Raghad Al-Rasheed | Waad Alshammari | Muneera Alhoshan | Amal Almazrua | Asma Al Wazrah | Mais Alheraki | Firoj Alam | Preslav Nakov | Norah Alzahrani | Eman AlBilali | Nizar Habash | Abdelrahman El-Sheikh | Muhammad Elmallah | Haonan Li | Hamdy Mubarak | Mohamed Anwar | Zaid Alyafeai | Ahmed Abdelali | Nora Altwairesh | Maram Hasanain | Abdulmohsen Al Thubaity | Shady Shehata | Bashar Alhafni | Injy Hamed | Go Inoue | Khalid Elmadani | Ossama Obeid | Fatima Haouari | Tamer Elsayed | Emad Alghamdi | Khalid Almubarak | Saied Alshahrani | Ola Aljarrah | Safa Alajlan | Areej Alshaqarawi | Maryam Alshihri | Sultana Alghurabi | Atikah Alzeghayer | Afrah Altamimi | Abdullah Alfaifi | Abdulrahman AlOsaimy
Proceedings of The Third Arabic Natural Language Processing Conference

The impressive advancement of Large Language Models (LLMs) in English has not been matched across all languages. In particular, LLM performance in Arabic lags behind, due to data scarcity, linguistic diversity of Arabic and its dialects, morphological complexity, etc. Progress is further hindered by the quality of Arabic benchmarks, which typically rely on static, publicly available data, lack comprehensive task coverage, or do not provide dedicated platforms with blind test sets. This makes it challenging to measure actual progress and to mitigate data contamination. Here, we aim to bridge these gaps. In particular, we introduce BALSAM, a comprehensive, community-driven benchmark aimed at advancing Arabic LLM development and evaluation. It includes 78 NLP tasks from 14 broad categories, with 52K examples divided into 37K test and 15K development, and a centralized, transparent platform for blind evaluation. We envision BALSAM as a unifying platform that sets standards and promotes collaborative research to advance Arabic LLM capabilities.

Feature Engineering is not Dead: A Step Towards State of the Art for Arabic Automated Essay Scoring
Marwan Sayed | Sohaila Eltanbouly | May Bashendy | Tamer Elsayed
Proceedings of The Third Arabic Natural Language Processing Conference

Automated Essay Scoring (AES) has shown significant advancements in educational assessment. However, under-resourced languages like Arabic have received limited attention. To bridge this gap and enable robust Arabic AES, this paper introduces the first publicly-available comprehensive set of engineered features tailored for Arabic AES, covering surface-level, readability, lexical, syntactic, and semantic features. Experiments are conducted on a dataset of 620 Arabic essays, each annotated with both holistic and trait-specific scores. Our findings demonstrate that the proposed feature set is effective across different models and competitive with recent NLP advances including LLMs, establishing the state-of-the-art performance and providing strong baselines for future Arabic AES research. Moroever, the resulting feature set offers a reusable and foundational resource, contributing towards the development of more effective Arabic AES systems.

Can LLMs Directly Retrieve Passages for Answering Questions from Qur’an?
Sohaila Eltanbouly | Salam Albatarni | Shaimaa Hassanein | Tamer Elsayed
Proceedings of The Third Arabic Natural Language Processing Conference

The Holy Qur’an provides timeless guidance, addressing modern challenges and offering answers to many important questions. The Qur’an QA 2023 shared task introduced the Qur’anic Passage Retrieval (QPR) task, which involves retrieving relevant passages in response to MSA questions. In this work, we evaluate the ability of seven pre-trained large language models (LLMs) to retrieve relevant passages from the Qur’an in response to given questions, considering zero-shot and several few-shot scenarios. Our experiments show that the best model, Claude, significantly outperforms the state-of-the-art QPR model by 28 points on MAP and 38 points on MRR, exhibiting an impressive improvement of about 113% and 82%, respectively.

2024

Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024
Hend Al-Khalifa | Kareem Darwish | Hamdy Mubarak | Mona Ali | Tamer Elsayed
Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024

Can Large Language Models Automatically Score Proficiency of Written Essays?
Watheq Ahmad Mansour | Salam Albatarni | Sohaila Eltanbouly | Tamer Elsayed
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Although several methods were proposed to address the problem of automated essay scoring (AES) in the last 50 years, there is still much to desire in terms of effectiveness. Large Language Models (LLMs) are transformer-based models that demonstrate extraordinary capabilities on various tasks. In this paper, we test the ability of LLMs, given their powerful linguistic knowledge, to analyze and effectively score written essays. We experimented with two popular LLMs, namely ChatGPT and Llama. We aim to check if these models can do this task and, if so, how their performance is positioned among the state-of-the-art (SOTA) models across two levels, holistically and per individual writing trait. We utilized prompt-engineering tactics in designing four different prompts to bring their maximum potential on this task. Our experiments conducted on the ASAP dataset revealed several interesting observations. First, choosing the right prompt depends highly on the model and nature of the task. Second, the two LLMs exhibited comparable average performance in AES, with a slight advantage for ChatGPT. Finally, despite the performance gap between the two LLMs and SOTA models in terms of predictions, they provide feedback to enhance the quality of the essays, which can potentially help both teachers and students.

ArabicNLU 2024: The First Arabic Natural Language Understanding Shared Task
Mohammed Khalilia | Sanad Malaysha | Reem Suwaileh | Mustafa Jarrar | Alaa Aljabari | Tamer Elsayed | Imed Zitouni
Proceedings of the Second Arabic Natural Language Processing Conference

This paper presents an overview of the Arabic Natural Language Understanding (ArabicNLU 2024) shared task, focusing on two subtasks: Word Sense Disambiguation (WSD) and Location Mention Disambiguation (LMD). The task aimed to evaluate the ability of automated systems to resolve word ambiguity and identify locations mentioned in Arabic text. We provided participants with novel datasets, including a sense-annotated corpus for WSD, called SALMA with approximately 34k annotated tokens, and the dataset with 3,893 annotations and 763 unique location mentions. These are challenging tasks. Out of the 38 registered teams, only three teams participated in the final evaluation phase, with the highest accuracy being 77.8% for WSD and 95.0% for LMD. The shared task not only facilitated the evaluation and comparison of different techniques, but also provided valuable insights and resources for the continued advancement of Arabic NLU technologies.

AuRED: Enabling Arabic Rumor Verification using Evidence from Authorities over Twitter
Fatima Haouari | Tamer Elsayed | Reem Suwaileh
Proceedings of the Second Arabic Natural Language Processing Conference

Diverging from the trend of the previous rumor verification studies, we introduce the new task of rumor verification using evidence that are exclusively captured from authorities, i.e., entities holding the right and knowledge to verify corresponding information. To enable research on this task for Arabic low-resourced language, we construct and release the first Authority-Rumor-Evidence Dataset (AuRED). The dataset comprises 160 rumors expressed in tweets and 692 Twitter timelines of authorities containing about 34k tweets. Additionally, we explore how existing evidence retrieval and claim verification models for fact-checking perform on our task under both the cross-lingual zero-shot and in-domain fine-tuning setups. Our experiments show that although evidence retrieval models perform relatively well on the task establishing strong baselines, there is still a big room for improvement. However, existing claim verification models perform poorly on the task no matter how good the retrieval performance is. The results also show that stance detection can be useful for evidence retrieval. Moreover, existing fact-checking datasets showed a potential in transfer learning to our task, however, further investigation using different datasets and setups is required.

QAES: First Publicly-Available Trait-Specific Annotations for Automated Scoring of Arabic Essays
May Bashendy | Salam Albatarni | Sohaila Eltanbouly | Eman Zahran | Hamdo Elhuseyin | Tamer Elsayed | Walid Massoud | Houda Bouamor
Proceedings of the Second Arabic Natural Language Processing Conference

Automated Essay Scoring (AES) has emerged as a significant research problem within natural language processing, providing valuable support for educators in assessing student writing skills. In this paper, we introduce QAES, the first publicly available trait-specific annotations for Arabic AES, built on the Qatari Corpus of Argumentative Writing (QCAW). QAES includes a diverse collection of essays in Arabic, each of them annotated with holistic and trait-specific scores, including relevance, organization, vocabulary, style, development, mechanics, and grammar. In total, it comprises 195 Arabic essays (with lengths ranging from 239 to 806 words) across two distinct argumentative writing tasks. We benchmark our dataset against the state-of-the-art English baselines and a feature-based approach. In addition, we discuss the adopted guidelines and the challenges encountered during the annotation process. Finally, we provide insights into potential areas for improvement and future directions in Arabic AES research.

2023

Qur’an QA 2023 Shared Task: Overview of Passage Retrieval and Reading Comprehension Tasks over the Holy Qur’an
Rana Malhas | Watheq Mansour | Tamer Elsayed
Proceedings of ArabicNLP 2023

Motivated by the need for intelligent question answering (QA) systems on the Holy Qur’an and the success of the first Qur’an Question Answering shared task (Qur’an QA 2022 at OSACT 2022), we have organized the second version at ArabicNLP 2023. The Qur’an QA 2023 is composed of two sub-tasks: the passage retrieval (PR) task and the machine reading comprehension (MRC) task. The main aim of the shared task is to encourage state-of-the-art research on Arabic PR and MRC on the Holy Qur’an. Our shared task has attracted 9 teams to submit 22 runs for the PR task, and 6 teams to submit 17 runs for the MRC task. In this paper, we present an overview of the task and provide an outline of the approaches employed by the participating teams in both sub-tasks.

IDRISI-D: Arabic and English Datasets and Benchmarks for Location Mention Disambiguation over Disaster Microblogs
Reem Suwaileh | Tamer Elsayed | Muhammad Imran
Proceedings of ArabicNLP 2023

Extracting and disambiguating geolocation information from social media data enables effective disaster management, as it helps response authorities; for example, locating incidents for planning rescue activities and affected people for evacuation. Nevertheless, the dearth of resources and tools hinders the development and evaluation of Location Mention Disambiguation (LMD) models in the disaster management domain. Consequently, the LMD task is greatly understudied, especially for the low resource languages such as Arabic. To fill this gap, we introduce IDRISI-D, the largest to date English and the first Arabic public LMD datasets. Additionally, we introduce a modified hierarchical evaluation framework that offers a lenient and nuanced evaluation of LMD systems. We further benchmark IDRISI-D datasets using representative baselines and show the competitiveness of BERT-based models.

IDRISI-RA: The First Arabic Location Mention Recognition Dataset of Disaster Tweets
Reem Suwaileh | Muhammad Imran | Tamer Elsayed
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Extracting geolocation information from social media data enables effective disaster management, as it helps response authorities; for example, in locating incidents for planning rescue activities, and affected people for evacuation. Nevertheless, geolocation extraction is greatly understudied for the low resource languages such as Arabic. To fill this gap, we introduce IDRISI-RA, the first publicly-available Arabic Location Mention Recognition (LMR) dataset that provides human- and automatically-labeled versions in order of thousands and millions of tweets, respectively. It contains both location mentions and their types (e.g., district, city). Our extensive analysis shows the decent geographical, domain, location granularity, temporal, and dialectical coverage of IDRISI-RA. Furthermore, we establish baselines using the standard Arabic NER models and build two simple, yet effective, LMR models. Our rigorous experiments confirm the need for developing specific models for Arabic LMR in the disaster domain. Moreover, experiments show the promising domain and geographical generalizability of IDRISI-RA under zero-shot learning.

2022

Qur’an QA 2022: Overview of The First Shared Task on Question Answering over the Holy Qur’an
Rana Malhas | Watheq Mansour | Tamer Elsayed
Proceedinsg of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

Motivated by the resurgence of the machine reading comprehension (MRC) research, we have organized the first Qur’an Question Answering shared task, “Qur’an QA 2022”. The task in its first year aims to promote state-of-the-art research on Arabic QA in general and MRC in particular on the Holy Qur’an, which constitutes a rich and fertile source of knowledge for Muslim and non-Muslim inquisitors and knowledge-seekers. In this paper, we provide an overview of the shared task that succeeded in attracting 13 teams to participate in the final phase, with a total of 30 submitted runs. Moreover, we outline the main approaches adopted by the participating teams in the context of highlighting some of our perceptions and general trends that characterize the participating systems and their submitted runs.

Detecting Users Prone to Spread Fake News on Arabic Twitter
Zien Sheikh Ali | Abdulaziz Al-Ali | Tamer Elsayed
Proceedinsg of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

The spread of misinformation has become a major concern to our society, and social media is one of its main culprits. Evidently, health misinformation related to vaccinations has slowed down global efforts to fight the COVID-19 pandemic. Studies have shown that fake news spreads substantially faster than real news on social media networks. One way to limit this fast dissemination is by assessing information sources in a semi-automatic way. To this end, we aim to identify users who are prone to spread fake news in Arabic Twitter. Such users play an important role in spreading misinformation and identifying them has the potential to control the spread. We construct an Arabic dataset on Twitter users, which consists of 1,546 users, of which 541 are prone to spread fake news (based on our definition). We use features extracted from users’ recent tweets, e.g., linguistic, statistical, and profile features, to predict whether they are prone to spread fake news or not. To tackle the classification task, multiple learning models are employed and evaluated. Empirical results reveal promising detection performance, where an F1 score of 0.73 was achieved by the logistic regression model. Moreover, when tested on a benchmark English dataset, our approach has outperformed the current state-of-the-art for this task.

Proceedinsg of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection
Hend Al-Khalifa | Tamer Elsayed | Hamdy Mubarak | Abdulmohsen Al-Thubaity | Walid Magdy | Kareem Darwish
Proceedinsg of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

2021

ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks
Fatima Haouari | Maram Hasanain | Reem Suwaileh | Tamer Elsayed
Proceedings of the Sixth Arabic Natural Language Processing Workshop

In this paper, we present ArCOV-19, an Arabic COVID-19 Twitter dataset that spans one year, covering the period from 27th of January 2020 till 31st of January 2021. ArCOV-19 is the first publicly-available Arabic Twitter dataset covering COVID-19 pandemic that includes about 2.7M tweets alongside the propagation networks of the most-popular subset of them (i.e., most-retweeted and -liked). The propagation networks include both retweetsand conversational threads (i.e., threads of replies). ArCOV-19 is designed to enable research under several domains including natural language processing, information retrieval, and social computing. Preliminary analysis shows that ArCOV-19 captures rising discussions associated with the first reported cases of the disease as they appeared in the Arab world. In addition to the source tweets and the propagation networks, we also release the search queries and the language-independent crawler used to collect the tweets to encourage the curation of similar datasets.

ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection
Fatima Haouari | Maram Hasanain | Reem Suwaileh | Tamer Elsayed
Proceedings of the Sixth Arabic Natural Language Processing Workshop

In this paper we introduce ArCOV19-Rumors, an Arabic COVID-19 Twitter dataset for misinformation detection composed of tweets containing claims from 27th January till the end of April 2020. We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K relevant tweets to those claims. Tweets were manually-annotated by veracity to support research on misinformation detection, which is one of the major problems faced during a pandemic. ArCOV19-Rumors supports two levels of misinformation detection over Twitter: verifying free-text claims (called claim-level verification) and verifying claims expressed in tweets (called tweet-level verification). Our dataset covers, in addition to health, claims related to other topical categories that were influenced by COVID-19, namely, social, politics, sports, entertainment, and religious. Moreover, we present benchmarking results for tweet-level verification on the dataset. We experimented with SOTA models of versatile approaches that either exploit content, user profiles features, temporal features and propagation structure of the conversational threads for tweet verification.

AraFacts: The First Large Arabic Dataset of Naturally Occurring Claims
Zien Sheikh Ali | Watheq Mansour | Tamer Elsayed | Abdulaziz Al-Ali
Proceedings of the Sixth Arabic Natural Language Processing Workshop

We introduce AraFacts, the first large Arabic dataset of naturally occurring claims collected from 5 Arabic fact-checking websites, e.g., Fatabyyano and Misbar, and covering claims since 2016. Our dataset consists of 6,121 claims along with their factual labels and additional metadata, such as fact-checking article content, topical category, and links to posts or Web pages spreading the claim. Since the data is obtained from various fact-checking websites, we standardize the original claim labels to provide a unified label rating for all claims. Moreover, we provide revealing dataset statistics and motivate its use by suggesting possible research applications. The dataset is made publicly available for the research community.

2020

Overview of OSACT4 Arabic Offensive Language Detection Shared Task
Hamdy Mubarak | Kareem Darwish | Walid Magdy | Tamer Elsayed | Hend Al-Khalifa
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

This paper provides an overview of the offensive language detection shared task at the 4th workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4). There were two subtasks, namely: Subtask A, involving the detection of offensive language, which contains unacceptable or vulgar content in addition to any kind of explicit or implicit insults or attacks against individuals or groups; and Subtask B, involving the detection of hate speech, which contains insults or threats targeting a group based on their nationality, ethnicity, race, gender, political or sport affiliation, religious belief, or other common characteristics. In total, 40 teams signed up to participate in Subtask A, and 14 of them submitted test runs. For Subtask B, 33 teams signed up to participate and 13 of them submitted runs. We present and analyze all submissions in this paper.

Quick and Simple Approach for Detecting Hate Speech in Arabic Tweets
Abeer Abuzayed | Tamer Elsayed
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

As the use of social media platforms increases extensively to freely communicate and share opinions, hate speech becomes an outstanding problem that requires urgent attention. This paper focuses on the problem of detecting hate speech in Arabic tweets. To tackle the problem efficiently, we adopt a “quick and simple” approach by which we investigate the effectiveness of 15 classical (e.g., SVM) and neural (e.g., CNN) learning models, while exploring two different term representations. Our experiments on 8k labelled dataset show that the best neural learning models outperform the classical ones, while distributed term representation is more effective than statistical bag-of-words representation. Overall, our best classifier (that combines both CNN and RNN in a joint architecture) achieved 0.73 macro-F1 score on the dev set, which significantly outperforms the majority-class baseline that achieves 0.49, proving the effectiveness of our “quick and simple” approach.

Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection
Hend Al-Khalifa | Walid Magdy | Kareem Darwish | Tamer Elsayed | Hamdy Mubarak
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

Are We Ready for this Disaster? Towards Location Mention Recognition from Crisis Tweets
Reem Suwaileh | Muhammad Imran | Tamer Elsayed | Hassan Sajjad
Proceedings of the 28th International Conference on Computational Linguistics

The widespread usage of Twitter during emergencies has provided a new opportunity and timely resource to crisis responders for various disaster management tasks. Geolocation information of pertinent tweets is crucial for gaining situational awareness and delivering aid. However, the majority of tweets do not come with geoinformation. In this work, we focus on the task of location mention recognition from crisis-related tweets. Specifically, we investigate the influence of different types of labeled training data on the performance of a BERT-based classification model. We explore several training settings such as combing in- and out-domain data from news articles and general-purpose and crisis-related tweets. Furthermore, we investigate the effect of geospatial proximity while training on near or far-away events from the target event. Using five different datasets, our extensive experiments provide answers to several critical research questions that are useful for the research community to foster research in this important direction. For example, results show that, for training a location mention recognition model, Twitter-based data is preferred over general-purpose data; and crisis-related data is preferred over general-purpose Twitter data. Furthermore, training on data from geographically-nearby disaster events to the target event boosts the performance compared to training on distant events.

2019

Simple But Not Naïve: Fine-Grained Arabic Dialect Identification Using Only N-Grams
Sohaila Eltanbouly | May Bashendy | Tamer Elsayed
Proceedings of the Fourth Arabic Natural Language Processing Workshop

This paper presents the participation of Qatar University team in MADAR shared task, which addresses the problem of sentence-level fine-grained Arabic Dialect Identification over 25 different Arabic dialects in addition to the Modern Standard Arabic. Arabic Dialect Identification is not a trivial task since different dialects share some features, e.g., utilizing the same character set and some vocabularies. We opted to adopt a very simple approach in terms of extracted features and classification models; we only utilize word and character n-grams as features, and Na ̈ıve Bayes models as classifiers. Surprisingly, the simple approach achieved non-na ̈ıve performance. The official results, reported on a held-out testing set, show that the dialect of a given sentence can be identified at an accuracy of 64.58% by our best submitted run.

2018

DART: A Large Dataset of Dialectal Arabic Tweets
Israa Alsarsour | Esraa Mohamed | Reem Suwaileh | Tamer Elsayed
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

QU-BIGIR at SemEval 2017 Task 3: Using Similarity Features for Arabic Community Question Answering Forums
Marwan Torki | Maram Hasanain | Tamer Elsayed
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper we describe our QU-BIGIR system for the Arabic subtask D of the SemEval 2017 Task 3. Our approach builds on our participation in the past version of the same subtask. This year, our system uses different similarity measures that encodes lexical and semantic pairwise similarity of text pairs. In addition to well known similarity measures such as cosine similarity, we use other measures based on the summary statistics of word embedding representation for a given text. To rank a list of candidate question answer pairs for a given question, we learn a linear SVM classifier over our similarity features. Our best resulting run came second in subtask D with a very competitive performance to the first-ranking system.

2016

QU-IR at SemEval 2016 Task 3: Learning to Rank on Arabic Community Question Answering Forums with Word Embedding
Rana Malhas | Marwan Torki | Tamer Elsayed
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2009

Arabic Cross-Document Coreference Resolution
Asad Sayeed | Tamer Elsayed | Nikesh Garera | David Alexander | Tan Xu | Doug Oard | David Yarowsky | Christine Piatko
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

Pairwise Document Similarity in Large Collections with MapReduce
Tamer Elsayed | Jimmy Lin | Douglas Oard
Proceedings of ACL-08: HLT, Short Papers

Resolving Personal Names in Email Using Context Expansion
Tamer Elsayed | Douglas W. Oard | Galileo Namata
Proceedings of ACL-08: HLT

Co-authors

Kareem Darwish 5

Hend Al-Khalifa 4

Fatima Haouari 4

Maram Hasanain 4

Watheq Mansour 4

Houda Bouamor 3

Walid Massoud 3

Douglas W. Oard 3

Abdulaziz Al-Ali 2

Abdulmohsen Al-Thubaity 2

Muhammad Imran 2

Zien Sheikh Ali 2

Ahmed Abdelali 1

Abeer Abuzayed 1

Asma Al Wazrah 1

Rawan Al-Matham 1

Raghad Al-Rasheed 1

Abdulrahman AlOsaimy 1

Eman Albilali 1

David Alexander 1

Abdullah Alfaifi 1

Emad Alghamdi 1

Sultana Alghurabi 1

Bashar Alhafni 1

Mais Alheraki 1

Muneera Alhoshan 1

Alaa Aljabari 1

Amal Almazrua 1

Khalid Almubarak 1

Israa Alsarsour 1

Saied Alshahrani 1

Waad Thuwaini Alshammari 1

Areej Alshaqarawi 1

Maryam Alshihri 1

Afrah Altamimi 1

Nora Altwairesh 1

Zaid Alyafeai 1

Norah A. Alzahrani 1

Atikah Alzeghayer 1

Mohamed Anwar 1

Fatima Zahra Brahamia 1

Kareem Mohamed Darwish 1

Abdelrahman El-Sheikh 1

Hoor Tamer Elbahnasawi 1

Hamdo Elhuseyin 1

Khalid Elmadani 1

Muhammad Elmallah 1

Mahmoud Fawzi 1

Nikesh Garera 1

Shaimaa Hassanein 1

Muhammad Imran 1

Mustafa Jarrar 1

Mohammed Khalilia 1

Sanad Malaysha 1

Watheq Ahmad Mansour 1

Abubakr Mohamed 1

Esraa Mohamed 1

Preslav Nakov 1

Galileo Namata 1

Christine Piatko 1

Hassan Sajjad 1

Shady Shehata 1

David Yarowsky 1

Venues