QAMAR: A New Fully Verified and Accurate Quranic Arabic Morphological Analysis Resource.

Sara Faqihi, Karim Bouzoubaa, Rachida Tajmout, Driss Namly


Abstract
Several Quranic morphological corpora have been developed to support Arabic linguistic analysis and NLP applications, yet they often lack full coverage, consistency, or manual verification. We present QAMAR, a morphologically oriented, multi-task corpus derived from the Qur’an. This comprehensive, manually verified resource provides a detailed linguistic layer for every Quranic word, including the Modern Standard Arabic (MSA) equivalent, the stem, the lemma, the root, and the part of speech (POS). QAMAR supports multiple NLP tasks, such as normalization, lemmatization, root extraction, and POS tagging, and serves as a gold-standard reference for Quranic and Arabic NLP research, including corpus-to-corpus evaluation and morphological analyzer benchmarking. The paper details QAMAR’s annotation framework, verification process, and resource structure, and reports comparative analyses with existing Quranic morphological resources and outputs produced by current large language models (LLMs).
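The abstract lists the per-word annotation layers (MSA equivalent, stem, lemma, root, POS) and mentions morphological analyzer benchmarking, but it does not specify QAMAR's file format or scoring protocol. The following is a minimal Python sketch, assuming a hypothetical record with one field per layer and predictions already aligned word by word with the gold data, of how per-layer exact-match accuracy could be computed against a gold-standard resource of this shape. The class name `WordAnalysis`, the field names, the `NOUN`/`VERB` tags, and the toy entries are illustrative assumptions, not QAMAR's actual schema, tag set, or annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordAnalysis:
    """Hypothetical per-word record mirroring the layers named in the abstract.

    Field names are illustrative; they are not QAMAR's actual file format.
    """
    word: str   # surface form as it appears in the text
    msa: str    # Modern Standard Arabic equivalent (normalization target)
    stem: str
    lemma: str
    root: str
    pos: str    # POS tag; the tag set used here is a placeholder


# Annotation layers compared when scoring an analyzer against the gold data.
LAYERS = ("msa", "stem", "lemma", "root", "pos")


def per_layer_accuracy(gold, predicted):
    """Return exact-match accuracy for each annotation layer.

    Assumes `gold` and `predicted` are aligned word by word
    (same length, same order, same surface forms).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("nothing to evaluate")
    if any(g.word != p.word for g, p in zip(gold, predicted)):
        raise ValueError("gold and predicted are not aligned on surface forms")
    scores = {}
    for layer in LAYERS:
        matches = sum(getattr(g, layer) == getattr(p, layer)
                      for g, p in zip(gold, predicted))
        scores[layer] = matches / len(gold)
    return scores


# Toy example (not taken from QAMAR): the analyzer fails to reduce
# the first word to its triliteral root.
gold = [
    WordAnalysis("الكتاب", "الكتاب", "كتاب", "كتاب", "كتب", "NOUN"),
    WordAnalysis("كتب", "كتب", "كتب", "كتب", "كتب", "VERB"),
]
predicted = [
    WordAnalysis("الكتاب", "الكتاب", "كتاب", "كتاب", "كتاب", "NOUN"),
    WordAnalysis("كتب", "كتب", "كتب", "كتب", "كتب", "VERB"),
]

print(per_layer_accuracy(gold, predicted))
# {'msa': 1.0, 'stem': 1.0, 'lemma': 1.0, 'root': 0.5, 'pos': 1.0}
```

Scoring each layer separately, rather than requiring a full-record match, reflects the abstract's framing of the resource as supporting distinct tasks (normalization, lemmatization, root extraction, POS tagging); a real benchmark would also need to handle tokenization mismatches, which this sketch rejects outright.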
Anthology ID: 2026.abjadnlp-1.38
Volume: Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month: March
Year: 2026
Address: Rabat, Morocco
Venues: AbjadNLP | WS
Publisher: Association for Computational Linguistics
Pages: 301–312
URL: https://aclanthology.org/2026.abjadnlp-1.38/
Cite (ACL): Sara Faqihi, Karim Bouzoubaa, Rachida Tajmout, and Driss Namly. 2026. QAMAR: A New Fully Verified and Accurate Quranic Arabic Morphological Analysis Resource. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 301–312, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): QAMAR: A New Fully Verified and Accurate Quranic Arabic Morphological Analysis Resource. (Faqihi et al., AbjadNLP 2026)
PDF: https://aclanthology.org/2026.abjadnlp-1.38.pdf
Optional supplementary material: 2026.abjadnlp-1.38.OptionalSupplementaryMaterial.zip