Sara Faqihi
2026
QAMAR: A New Fully Verified and Accurate Quranic Arabic Morphological Analysis Resource
Sara Faqihi | Karim Bouzoubaa | Rachida Tajmout | Driss Namly
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Several Quranic morphological corpora have been developed to support Arabic linguistic analysis and NLP applications, yet they often lack full coverage, consistency, or manual verification. We present QAMAR, a morphologically oriented, multi-task corpus derived from the Qur’an. This comprehensive, manually verified resource provides a detailed linguistic layer for every Quranic word, including its Modern Standard Arabic (MSA) equivalent, stem, lemma, root, and part of speech (POS). QAMAR supports multiple NLP tasks, such as normalization, lemmatization, root extraction, and POS tagging, and serves as a gold-standard reference for Quranic and Arabic NLP research, for example in corpus-to-corpus evaluation and morphological analyzer benchmarking. The paper details QAMAR’s annotation framework, verification process, and resource structure, and reports comparative analyses against existing Quranic morphological resources and against outputs produced by current large language models (LLMs).
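Benchmarking an analyzer against a per-word gold standard of this kind can be sketched as a per-layer exact-match comparison. The sketch below assumes a hypothetical record type whose field names (`word`, `msa`, `stem`, `lemma`, `root`, `pos`) mirror the layers listed above. These names are not QAMAR's published schema or file format. It also assumes that gold and predicted analyses are already aligned word by word. The two example words are illustrative values, not entries taken from QAMAR.

```python
from dataclasses import dataclass

# Hypothetical per-word record. Field names mirror the layers named in the
# abstract but are NOT QAMAR's actual schema.
@dataclass(frozen=True)
class WordAnalysis:
    word: str   # Quranic surface form (used as the key, not scored)
    msa: str    # Modern Standard Arabic equivalent (normalization target)
    stem: str
    lemma: str
    root: str
    pos: str

# Layers scored during benchmarking (the surface form itself is excluded).
LAYERS = ("msa", "stem", "lemma", "root", "pos")

def per_layer_accuracy(gold, predicted):
    """Return, for each layer, the fraction of words whose predicted value
    exactly matches the gold value. Inputs must be aligned word by word."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned word by word")
    if not gold:
        raise ValueError("empty evaluation set")
    scores = {}
    for layer in LAYERS:
        hits = sum(getattr(g, layer) == getattr(p, layer)
                   for g, p in zip(gold, predicted))
        scores[layer] = hits / len(gold)
    return scores

# Illustrative data only: the second prediction gets the root wrong.
gold = [
    WordAnalysis("ٱلْكِتَٰبُ", "الكتاب", "كتاب", "كتاب", "كتب", "NOUN"),
    WordAnalysis("قَالَ", "قال", "قال", "قال", "قول", "VERB"),
]
predicted = [
    WordAnalysis("ٱلْكِتَٰبُ", "الكتاب", "كتاب", "كتاب", "كتب", "NOUN"),
    WordAnalysis("قَالَ", "قال", "قال", "قال", "قال", "VERB"),
]

print(per_layer_accuracy(gold, predicted))
# {'msa': 1.0, 'stem': 1.0, 'lemma': 1.0, 'root': 0.5, 'pos': 1.0}
```

Exact string match is the simplest choice. A real comparison would also have to settle how diacritics and orthographic variants are treated before matching, and this sketch leaves that question open.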