DARES: Dataset for Arabic Readability Estimation of School Materials

Mo El-Haj, Sultan Almujaiwel, Damith Premasiri, Tharindu Ranasinghe, Ruslan Mitkov


Abstract
This research introduces DARES, a dataset for assessing the readability of Arabic text in Saudi school materials. DARES comprises 13,335 instances from textbooks used in 2021 and contains two subtasks: (a) coarse-grained readability assessment, where the text is classified into educational levels such as primary and secondary; and (b) fine-grained readability assessment, where the text is classified into individual grades. We fine-tuned five transformer models that support Arabic and found that CAMeLBERTmix performed the best in all input settings. Evaluation results showed high performance for the coarse-grained readability assessment task, achieving a weighted F1 score of 0.91 and a macro F1 score of 0.79. The fine-grained task achieved a weighted F1 score of 0.68 and a macro F1 score of 0.55. These findings demonstrate the potential of our approach for advancing Arabic text readability assessment in education, with implications for future innovations in the field.
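The abstract reports both weighted and macro F1, and the gap between them (0.91 vs. 0.79, and 0.68 vs. 0.55) is easiest to read with the two averages side by side. The following is a minimal sketch of how the two averages are conventionally computed, not the authors' evaluation code. The function names and the toy labels are hypothetical and were chosen only for illustration. Macro F1 gives every class equal weight, while weighted F1 weights each class by its number of true instances, so a weaker score on a small class pulls macro F1 down more.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by class support in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical, imbalanced toy data (not taken from DARES).
y_true = ["primary"] * 8 + ["secondary"] * 2
y_pred = ["primary"] * 8 + ["primary", "secondary"]

print(f"macro F1:    {macro_f1(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

On this toy data the minority class is the harder one, so the weighted average comes out higher than the macro average, which is the same direction as the gap reported in the abstract.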
Anthology ID:
2024.determit-1.10
Volume:
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Giorgio Maria Di Nunzio, Federica Vezzani, Liana Ermakova, Hosein Azarbonyad, Jaap Kamps
Venues:
DeTermIt | WS
Publisher:
ELRA and ICCL
Pages:
103–113
URL:
https://aclanthology.org/2024.determit-1.10
Cite (ACL):
Mo El-Haj, Sultan Almujaiwel, Damith Premasiri, Tharindu Ranasinghe, and Ruslan Mitkov. 2024. DARES: Dataset for Arabic Readability Estimation of School Materials. In Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024, pages 103–113, Torino, Italia. ELRA and ICCL.
Cite (Informal):
DARES: Dataset for Arabic Readability Estimation of School Materials (El-Haj et al., DeTermIt-WS 2024)
PDF:
https://aclanthology.org/2024.determit-1.10.pdf