Maryam Khalid


2025

Automated Authentication of Quranic Verses Using BERT (Bidirectional Encoder Representations from Transformers) based Language Models
Khubaib Amjad Alam | Maryam Khalid | Syed Ahmed Ali | Haroon Mahmood | Qaisar Shafi | Muhammad Haroon | Zulqarnain Haider
Proceedings of the New Horizons in Computational Linguistics for Religious Texts

The proliferation of Quranic content on digital platforms, including websites and social media, has created significant challenges in verifying the authenticity of Quranic verses. The inherent complexity of the Arabic language, with its rich morphology, syntax, and semantics, makes traditional text-processing techniques inadequate for robust authentication. This paper addresses the problem by leveraging state-of-the-art transformer-based language models tailored for Arabic text processing. Our approach involves fine-tuning three transformer models, BERT-Base-Arabic, AraBERT, and MarBERT, on a curated dataset containing both authentic and non-authentic verses. Non-authentic examples were created using Sentence-BERT, which applies cosine similarity to introduce subtle modifications. Comprehensive experiments were conducted to evaluate the performance of the models. Among the three candidate models, MarBERT, which is specifically designed for handling Arabic dialects, demonstrated superior performance, achieving an F1-score of 93.80%. BERT-Base-Arabic also showed a competitive F1-score of 92.90%, reflecting its robust understanding of Arabic text. The findings underscore the potential of transformer-based models in addressing the linguistic complexities inherent in Quranic text and pave the way for developing automated, reliable tools for Quranic verse authentication in the digital era.
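
The abstract names two quantities, cosine similarity and the F1-score, without defining them. The following is a minimal sketch of both in plain Python, not the authors' implementation. The function names, the toy vectors standing in for Sentence-BERT embeddings, and the example labels are all hypothetical. The abstract does not say how similarity scores were used to build the non-authentic examples, so no selection rule is shown.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def f1_score(y_true, y_pred, positive=1):
    """Return the F1-score (harmonic mean of precision and recall)
    for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined),
        # so F1 is reported as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors standing in for sentence embeddings.
    original = [0.2, 0.7, 0.1]
    modified = [0.25, 0.65, 0.1]
    print("similarity:", cosine_similarity(original, modified))

    # Hypothetical labels: 1 = authentic verse, 0 = non-authentic verse.
    gold = [1, 1, 0, 0]
    predicted = [1, 0, 0, 1]
    print("F1:", f1_score(gold, predicted))
```

A cosine similarity close to 1 means the two embeddings point in nearly the same direction, which is what one would expect when a modified verse differs only subtly from the original.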