Salam Albatarni

2025

pdf bib abs
Can LLMs Directly Retrieve Passages for Answering Questions from Qur’an?
Sohaila Eltanbouly | Salam Albatarni | Shaimaa Hassanein | Tamer Elsayed
Proceedings of The Third Arabic Natural Language Processing Conference

The Holy Qur’an provides timeless guidance, addressing modern challenges and offering answers to many important questions. The Qur’an QA 2023 shared task introduced the Qur’anic Passage Retrieval (QPR) task, which involves retrieving relevant passages in response to MSA questions. In this work, we evaluate the ability of seven pre-trained large language models (LLMs) to retrieve relevant passages from the Qur’an in response to given questions, considering zero-shot and several few-shot scenarios. Our experiments show that the best model, Claude, significantly outperforms the state-of-the-art QPR model by 28 points on MAP and 38 points on MRR, exhibiting an impressive improvement of about 113% and 82%, respectively.

pdf bib abs
TAQEEM 2025: Overview of The First Shared Task for Arabic Quality Evaluation of Essays in Multi-dimensions
May Bashendy | Salam Albatarni | Sohaila Eltanbouly | Walid Massoud | Houda Bouamor | Tamer Elsayed
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

Automated Essay Scoring (AES) has emerged as a significant research problem in natural language processing, offering valuable tools to support educators in assessing student writing. Motivated by the growing need for reliable Arabic AES systems, we organized the first shared Task for Arabic Quality Evaluation of Essays in Multi-dimensions (TAQEEM) held at the ArabicNLP 2025 conference. TAQEEM 2025 includes two subtasks: Task A on holistic scoring and Task B on trait-specific scoring. It introduces a new (and first of its kind) dataset of 1,265 Arabic essays, annotated with holistic and trait-specific scores, including relevance, organization, vocabulary, style, development, mechanics, and grammar. The main goal of TAQEEM is to address the scarcity of standardized benchmarks and high-quality resources in Arabic AES. TAQEEM 2025 attracted 11 registered teams for Task A and 10 for Task B, with a total of 5 teams, across both tasks, submitting system runs for evaluation. This paper presents an overview of the task, outlines the approaches employed, and discusses the results of the participating teams.

pdf bib abs
TRATES: Trait-Specific Rubric-Assisted Cross-Prompt Essay Scoring
Sohaila Eltanbouly | Salam Albatarni | Tamer Elsayed
Findings of the Association for Computational Linguistics: ACL 2025

Research on holistic Automated Essay Scoring (AES) is long-dated; yet, there is a notable lack of attention for assessing essays according to individual traits. In this work, we propose TRATES, a novel trait-specific and rubric-based cross-prompt AES framework that is generic yet specific to the underlying trait. The framework leverages a Large Language Model (LLM) that utilizes the trait grading rubrics to generate trait-specific features (represented by assessment questions), then assesses those features given an essay. The trait-specific features are eventually combined with generic writing-quality and prompt-specific features to train a simple classical regression model that predicts trait scores of essays from an unseen prompt. Experiments show that TRATES achieves a new state-of-the-art performance across all traits on a widely-used dataset, with the generated LLM-based features being the most significant.

2024

Automated Essay Scoring (AES) has emerged as a significant research problem within natural language processing, providing valuable support for educators in assessing student writing skills. In this paper, we introduce QAES, the first publicly available trait-specific annotations for Arabic AES, built on the Qatari Corpus of Argumentative Writing (QCAW). QAES includes a diverse collection of essays in Arabic, each of them annotated with holistic and trait-specific scores, including relevance, organization, vocabulary, style, development, mechanics, and grammar. In total, it comprises 195 Arabic essays (with lengths ranging from 239 to 806 words) across two distinct argumentative writing tasks. We benchmark our dataset against the state-of-the-art English baselines and a feature-based approach. In addition, we discuss the adopted guidelines and the challenges encountered during the annotation process. Finally, we provide insights into potential areas for improvement and future directions in Arabic AES research.

pdf bib abs
Can Large Language Models Automatically Score Proficiency of Written Essays?
Watheq Ahmad Mansour | Salam Albatarni | Sohaila Eltanbouly | Tamer Elsayed
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Although several methods were proposed to address the problem of automated essay scoring (AES) in the last 50 years, there is still much to desire in terms of effectiveness. Large Language Models (LLMs) are transformer-based models that demonstrate extraordinary capabilities on various tasks. In this paper, we test the ability of LLMs, given their powerful linguistic knowledge, to analyze and effectively score written essays. We experimented with two popular LLMs, namely ChatGPT and Llama. We aim to check if these models can do this task and, if so, how their performance is positioned among the state-of-the-art (SOTA) models across two levels, holistically and per individual writing trait. We utilized prompt-engineering tactics in designing four different prompts to bring their maximum potential on this task. Our experiments conducted on the ASAP dataset revealed several interesting observations. First, choosing the right prompt depends highly on the model and nature of the task. Second, the two LLMs exhibited comparable average performance in AES, with a slight advantage for ChatGPT. Finally, despite the performance gap between the two LLMs and SOTA models in terms of predictions, they provide feedback to enhance the quality of the essays, which can potentially help both teachers and students.

Co-authors

Hamdo Elhuseyin 1

Shaimaa Hassanein 1

Watheq Ahmad Mansour 1

Eman Zahran 1

Venues

Fix author