Classifying Verses of the Quran using Doc2vec

Menwa Alshammeri, Eric Atwell, Mohammad Alsalka


Abstract
The Quran, as a significant religious text, carries important spiritual and linguistic value. Understanding the text and inferring its underlying meanings entails semantic similarity analysis. We classified the verses of the Quran into 15 pre-defined categories or concepts, based on the Qurany corpus, using Doc2Vec and Logistic Regression. Our classifier achieved 70% accuracy and a 60% F1-score using the distributed bag-of-words architecture. We then measured how semantically similar the documents within the same category are to each other and used this information to evaluate our model. We calculated the mean difference and average similarity values for each category to indicate how well our model describes that category.
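The per-category average similarity mentioned in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity over document vectors, which the abstract does not specify, and it uses made-up two-dimensional toy vectors rather than real Doc2Vec output. The function names `cosine` and `average_similarity_by_category` are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_similarity_by_category(vectors, labels):
    """Mean pairwise cosine similarity of the vectors within each category.

    Assumption: "average similarity" means the mean over all unordered
    pairs of documents that share a label. Categories with fewer than
    two documents are skipped because they have no pairs.
    """
    by_category = {}
    for vec, label in zip(vectors, labels):
        by_category.setdefault(label, []).append(vec)

    result = {}
    for label, vecs in by_category.items():
        pairs = list(combinations(vecs, 2))
        if pairs:
            result[label] = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    return result


# Toy stand-ins for verse embeddings (illustrative values only).
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_labels = ["A", "A", "B", "B"]
scores = average_similarity_by_category(toy_vectors, toy_labels)
```

Under this reading, a category whose score is close to 1 groups verses that the embedding model places near each other, while a low score suggests the model separates verses that the corpus assigns to the same concept.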
Anthology ID:
2021.icon-main.34
Volume:
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2021
Address:
National Institute of Technology Silchar, Silchar, India
Editors:
Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
284–288
URL:
https://aclanthology.org/2021.icon-main.34
Cite (ACL):
Menwa Alshammeri, Eric Atwell, and Mohammad Alsalka. 2021. Classifying Verses of the Quran using Doc2vec. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 284–288, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
Cite (Informal):
Classifying Verses of the Quran using Doc2vec (Alshammeri et al., ICON 2021)
PDF:
https://aclanthology.org/2021.icon-main.34.pdf