MASAQ Parser: A Fine-grained MorphoSyntactic Analyzer for the Quran

Majdi Sawalha, Faisal Alshargi, Sane Yagi, Abdallah T. AlShdaifat, Bassam Hammo


Abstract
This paper introduces MASAQ, a morphological and syntactic analysis of the Quranic text. We constructed the MASAQ dataset as a comprehensive resource that addresses the scarcity of annotated Quranic Arabic corpora and facilitates the development of advanced Natural Language Processing (NLP) models. The Quran, a cornerstone of classical Arabic, presents unique challenges for NLP because of its sacred nature and complex linguistic features. MASAQ provides detailed syntactic and morphological annotation of the entire Quranic text, comprising more than 131K morphological entries and 123K instances of syntactic functions that cover a wide range of grammatical roles and relationships. Its distinguishing features include a comprehensive tagset of 72 syntactic roles, detailed morphological analysis, and context-specific annotations. The dataset is particularly valuable for tasks such as dependency parsing, grammar checking, machine translation, and text summarization. Potential applications range from pedagogical uses in teaching Arabic grammar to the development of sophisticated NLP tools. By providing a high-quality, syntactically annotated dataset, MASAQ aims to advance Arabic NLP and enable more accurate and efficient language processing tools. The dataset is available under the Creative Commons Attribution 3.0 License, ensuring compliance with ethical guidelines and respecting the integrity of the Quranic text.
Anthology ID:
2025.clrel-1.7
Volume:
Proceedings of the New Horizons in Computational Linguistics for Religious Texts
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Sane Yagi, Majdi Sawalha, Bayan Abu Shawar, Abdallah T. AlShdaifat, Norhan Abbas, Organizers
Venues:
CLRel | WS
Publisher:
Association for Computational Linguistics
Pages:
67–75
URL:
https://aclanthology.org/2025.clrel-1.7/
Cite (ACL):
Majdi Sawalha, Faisal Alshargi, Sane Yagi, Abdallah T. AlShdaifat, and Bassam Hammo. 2025. MASAQ Parser: A Fine-grained MorphoSyntactic Analyzer for the Quran. In Proceedings of the New Horizons in Computational Linguistics for Religious Texts, pages 67–75, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
MASAQ Parser: A Fine-grained MorphoSyntactic Analyzer for the Quran (Sawalha et al., CLRel 2025)
PDF:
https://aclanthology.org/2025.clrel-1.7.pdf