Automatic Hadith Segmentation using PPM Compression

Taghreed Tarmom, Eric Atwell, Mohammad Alsalka


Abstract
In this paper we explore the use of prediction by partial matching (PPM) compression to segment Hadith into its two main components, Isnad and Matan. The experiments used the PPMD variant of PPM and show that PPMD is effective for Hadith segmentation. The approach was also tested on Hadith corpora of different structures. In the first experiment we used the non-authentic Hadith (NAH) corpus for both training the models and testing. In the second experiment we used the NAH corpus for training the models and the Leeds University and King Saud University (LK) Hadith corpus for testing the PPMD segmenter. PPMD of order 7 achieved accuracies of 92.76% and 90.10% in the first and second experiments, respectively.
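The idea of compression-based segmentation can be illustrated with a minimal sketch. It assumes one character model trained on Isnad text and one on Matan text, and picks the word boundary at which the combined code length of the two parts is smallest. The sketch is not the authors' implementation: it swaps PPMD for a fixed-order character n-gram model with add-one smoothing, and the names `CharNgramModel`, `code_length`, and `segment`, as well as the toy training strings, are hypothetical.

```python
import math


class CharNgramModel:
    """Fixed-order character model with add-one smoothing.

    Assumption: this is a simple stand-in for PPMD, not PPMD itself
    (no escape mechanism, no backing off to shorter contexts).
    """

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        self.counts = {}  # context -> {next_char: count}
        self.totals = {}  # context -> total count

    def train(self, texts):
        for text in texts:
            padded = " " * self.order + text
            for i in range(self.order, len(padded)):
                ctx, ch = padded[i - self.order:i], padded[i]
                self.counts.setdefault(ctx, {})
                self.counts[ctx][ch] = self.counts[ctx].get(ch, 0) + 1
                self.totals[ctx] = self.totals.get(ctx, 0) + 1

    def code_length(self, text):
        """Approximate number of bits needed to encode text under this model."""
        padded = " " * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            ctx, ch = padded[i - self.order:i], padded[i]
            seen = self.counts.get(ctx, {}).get(ch, 0)
            total = self.totals.get(ctx, 0)
            bits -= math.log2((seen + 1) / (total + self.alphabet_size))
        return bits


def segment(words, isnad_model, matan_model):
    """Return k such that words[:k] is the Isnad part and words[k:] the Matan part."""
    best_k, best_bits = 0, float("inf")
    for k in range(len(words) + 1):
        bits = (isnad_model.code_length(" ".join(words[:k]))
                + matan_model.code_length(" ".join(words[k:])))
        if bits < best_bits:
            best_k, best_bits = k, bits
    return best_k


# Toy placeholder data, not real Hadith text.
isnad_model = CharNgramModel(order=2)
isnad_model.train(["aaaa aaaa aaaa"])
matan_model = CharNgramModel(order=2)
matan_model.train(["bbbb bbbb bbbb"])
split = segment(["aaaa", "aaaa", "bbbb", "bbbb"], isnad_model, matan_model)
```

In this toy setting the boundary falls where the text switches from the pattern seen by the first model to the pattern seen by the second. A real PPMD model of order 7, as used in the paper, would replace the add-one estimate with escape-based blending of context orders.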
Anthology ID:
2020.icon-main.4
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2020
Address:
Indian Institute of Technology Patna, Patna, India
Editors:
Pushpak Bhattacharyya, Dipti Misra Sharma, Rajeev Sangal
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
22–29
URL:
https://aclanthology.org/2020.icon-main.4
Cite (ACL):
Taghreed Tarmom, Eric Atwell, and Mohammad Alsalka. 2020. Automatic Hadith Segmentation using PPM Compression. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), pages 22–29, Indian Institute of Technology Patna, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
Automatic Hadith Segmentation using PPM Compression (Tarmom et al., ICON 2020)
PDF:
https://aclanthology.org/2020.icon-main.4.pdf