Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection

Chatrine Qwaider; Bashar Alhafni; Kirill Chirkunov; Nizar Habash; Ted Briscoe

doi:10.18653/v1/2025.bea-1.40

Abstract

Automated Essay Scoring (AES) plays a crucial role in assessing language learners’ writing quality, reducing grading workload, and providing real-time feedback. The lack of annotated essay datasets inhibits the development of Arabic AES systems. This paper leverages Large Language Models (LLMs) and Transformer models to generate synthetic Arabic essays for AES. We prompt an LLM to generate essays across the Common European Framework of Reference (CEFR) proficiency levels and introduce and compare two approaches to error injection. We create a dataset of 3,040 annotated essays with errors injected using our two methods. Additionally, we develop a BERT-based Arabic AES system calibrated to CEFR levels. Our experimental results demonstrate the effectiveness of our synthetic dataset in improving Arabic AES performance. We make our code and data publicly available.

Anthology ID: 2025.bea-1.40
Volume: Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venues: BEA | WS
SIG: SIGEDU
Publisher: Association for Computational Linguistics
Pages: 549–563
URL: https://aclanthology.org/2025.bea-1.40/
DOI: 10.18653/v1/2025.bea-1.40
Cite (ACL): Chatrine Qwaider, Bashar Alhafni, Kirill Chirkunov, Nizar Habash, and Ted Briscoe. 2025. Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 549–563, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection (Qwaider et al., BEA 2025)
PDF: https://aclanthology.org/2025.bea-1.40.pdf
