RoBERT2VecTM: A Novel Approach for Topic Extraction in Islamic Studies

Sania Aftar; Luca Gagliardelli; Amina El Ganadi; Federico Ruozzi; Sonia Bergamaschi

doi:10.18653/v1/2024.findings-emnlp.534

Abstract

Investigating “Hadith” texts, crucial for theological studies and Islamic jurisprudence, presents challenges due to the linguistic complexity of Arabic, such as its intricate morphology. In this paper, we propose an innovative approach to address the challenges of topic modeling in Hadith studies by utilizing the Contextualized Topic Model (CTM). Our study introduces RoBERT2VecTM, a novel neural-based approach that combines the RoBERTa transformer model with Doc2Vec, specifically targeting the semantic analysis of “Matn” (the actual content). The methodology outperforms many traditional state-of-the-art NLP models by generating more coherent and diverse Arabic topics. The diversity of the generated topics allows for further categorization, deepening the understanding of the concepts discussed. Notably, our research highlights the critical impact of lemmatization and stopwords on topic modeling quality. This breakthrough marks a significant stride in applying NLP to non-Latin languages and opens new avenues for the nuanced analysis of complex religious texts.

Anthology ID: 2024.findings-emnlp.534
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9148–9158
URL: https://aclanthology.org/2024.findings-emnlp.534/
DOI: 10.18653/v1/2024.findings-emnlp.534
Cite (ACL): Sania Aftar, Luca Gagliardelli, Amina El Ganadi, Federico Ruozzi, and Sonia Bergamaschi. 2024. RoBERT2VecTM: A Novel Approach for Topic Extraction in Islamic Studies. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 9148–9158, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): RoBERT2VecTM: A Novel Approach for Topic Extraction in Islamic Studies (Aftar et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.534.pdf
