Evaluating Prompt Relevance in Arabic Automatic Essay Scoring: Insights from Synthetic and Real-World Data

Chatrine Qwaider; Kirill Chirkunov; Bashar Alhafni; Nizar Habash; Ted Briscoe

doi:10.18653/v1/2025.arabicnlp-main.13

Abstract

Prompt relevance is a critical yet underexplored dimension in Arabic Automated Essay Scoring (AES). We present the first systematic study of binary prompt-essay relevance classification, supporting both AES scoring and dataset annotation. To address data scarcity, we built a synthetic dataset of on-topic and off-topic pairs and evaluated multiple models, including threshold-based classifiers, SVMs, causal LLMs, and a fine-tuned masked SBERT model. For real-data evaluation, we combined QAES with ZAEBUC, creating off-topic pairs via mismatched prompts. We also tested prompt expansion strategies using AraVec, CAMeL, and GPT-4o. Our fine-tuned SBERT achieved 98% F1 on synthetic data and strong results on QAES+ZAEBUC, outperforming SVMs and threshold-based baselines and offering a resource-efficient alternative to LLMs. This work establishes the first benchmark for Arabic prompt relevance and provides practical strategies for low-resource AES.
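The two ideas named in the abstract, creating off-topic pairs via mismatched prompts and classifying relevance with a threshold, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the toy English data, the whitespace tokenization, the use of Jaccard token overlap as the similarity score, and the threshold value of 0.1. The paper's own similarity measures and models may differ substantially.

```python
import random


def build_relevance_pairs(essays, seed=0):
    """Build (prompt, essay, label) triples from (prompt, essay) records.

    Each essay is kept with its own prompt (label 1, on-topic) and is also
    paired with a randomly chosen different prompt (label 0, off-topic).
    Hypothetical helper; not the authors' implementation.
    """
    rng = random.Random(seed)
    prompts = sorted({prompt for prompt, _ in essays})
    pairs = []
    for prompt, essay in essays:
        pairs.append((prompt, essay, 1))
        others = [p for p in prompts if p != prompt]
        if others:  # a mismatch needs at least one other prompt
            pairs.append((rng.choice(others), essay, 0))
    return pairs


def jaccard(a, b):
    """Token-overlap similarity on whitespace tokens (an assumed stand-in score)."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_on_topic(prompt, essay, threshold=0.1):
    """Threshold-based decision: on-topic if similarity reaches the threshold."""
    return jaccard(prompt, essay) >= threshold


# Toy data, invented for this sketch.
sample_essays = [
    ("describe your favorite city", "my favorite city is large and busy"),
    ("benefits of reading books", "reading books every day improves vocabulary"),
]
sample_pairs = build_relevance_pairs(sample_essays)
predictions = [is_on_topic(prompt, essay) for prompt, essay, _ in sample_pairs]
```

In practice the overlap score would likely be replaced by an embedding-based similarity, and the threshold would be tuned on held-out data rather than fixed by hand.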

Anthology ID: 2025.arabicnlp-main.13
Volume: Proceedings of The Third Arabic Natural Language Processing Conference
Month: November
Year: 2025
Address: Suzhou, China
Editors: Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari
Venue: ArabicNLP
SIG: SIGARAB
Publisher: Association for Computational Linguistics
Pages: 162–178
URL: https://aclanthology.org/2025.arabicnlp-main.13/
DOI: 10.18653/v1/2025.arabicnlp-main.13
Cite (ACL): Chatrine Qwaider, Kirill Chirkunov, Bashar Alhafni, Nizar Habash, and Ted Briscoe. 2025. Evaluating Prompt Relevance in Arabic Automatic Essay Scoring: Insights from Synthetic and Real-World Data. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 162–178, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Evaluating Prompt Relevance in Arabic Automatic Essay Scoring: Insights from Synthetic and Real-World Data (Qwaider et al., ArabicNLP 2025)
PDF: https://aclanthology.org/2025.arabicnlp-main.13.pdf
