Shatha Altammami
2022
Challenging the Transformer-based models with a Classical Arabic dataset: Quran and Hadith
Shatha Altammami | Eric Atwell
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Transformer-based models have shown near-perfect results on several downstream tasks. However, their performance on classical Arabic texts is largely unexplored. To fill this gap, we evaluate monolingual, bilingual, and multilingual state-of-the-art models on detecting relatedness between the Quran (the Muslim holy book) and the Hadith (the Prophet Muhammad's teachings), which are complex classical Arabic texts whose underlying meanings require deep human understanding. To do this, we carefully built a dataset of Quran-verse and Hadith-teaching pairs by consulting sources by reputable religious experts. This study presents the methodology for creating the dataset, which we make available in our repository, and discusses the models’ performance, which points to a pressing need to improve how well these models capture semantics in such complex, low-resource texts.
2020
Constructing a Bilingual Hadith Corpus Using a Segmentation Tool
Shatha Altammami | Eric Atwell | Ammar Alsalka
Proceedings of the Twelfth Language Resources and Evaluation Conference
This article describes the process of gathering and constructing a bilingual parallel corpus of Islamic Hadith, the set of narratives reporting different aspects of the Prophet Muhammad’s life. The corpus data is gathered from the six canonical Hadith collections using a custom segmentation tool that automatically segments and annotates the two Hadith components with 92% accuracy. This Hadith segmenter minimises the cost of language resource creation and produces consistent results, independent of the prior knowledge and experience that usually influence human annotators. The corpus includes more than 10M tokens and will be freely available via the LREC repository.
2019
Text Segmentation Using N-grams to Annotate Hadith Corpus
Shatha Altammami | Eric Atwell | Ammar Alsalka
Proceedings of the 3rd Workshop on Arabic Corpus Linguistics