Mohammad Ammar Alsalka


2023

HAQA and QUQA: Constructing Two Arabic Question-Answering Corpora for the Quran and Hadith
Sarah Alnefaie | Eric Atwell | Mohammad Ammar Alsalka
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

It is neither possible nor fair to compare the performance of question-answering systems for the Holy Quran and Hadith Sharif in Arabic due to both the absence of a golden test dataset on the Hadith Sharif and the small size and easy questions of the newly created golden test dataset on the Holy Quran. This article presents two question–answer datasets: Hadith Question–Answer pairs (HAQA) and Quran Question–Answer pairs (QUQA). HAQA is the first Arabic Hadith question–answer dataset available to the research community, while the QUQA dataset is regarded as the more challenging and the most extensive collection of Arabic question–answer pairs on the Quran. HAQA was designed and its data collected from several expert sources, while QUQA went through several steps in the construction phase; that is, it was designed and then integrated with existing datasets in different formats, after which the datasets were enlarged with the addition of new data from books by experts. The HAQA corpus consists of 1598 question–answer pairs, and that of QUQA contains 3382. They may be useful as gold-standard datasets for the evaluation process, as training datasets for language models with question-answering tasks and for other uses in artificial intelligence.

Is GPT-4 a Good Islamic Expert for Answering Quran Questions?
Sarah Alnefaie | Eric Atwell | Mohammad Ammar Alsalka
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)