GCD-TM: Graph-Driven Community Detection for Topic Modelling in Psychiatry Texts

Anusuya Krishnan; Isaias Mehari Ghebrehiwet

Abstract

Psychiatry texts provide critical insights into patients' mental states and therapeutic interactions, and are essential for understanding psychiatric conditions, treatment dynamics, and patient responses. However, the complex and diverse nature of psychiatric communication poses significant challenges for traditional topic modeling methods: intricate language, subtle psychological nuances, and widely varying segment lengths make it difficult to extract coherent and meaningful topics, and conventional approaches often fail to capture the depth and overlap of themes in these texts. In this study, we present a novel approach that addresses these limitations by reformulating topic modeling as a community detection task on a graph constructed from the text corpus. Our pipeline applies lemmatization to standardize the data, TF-IDF vectorization to build a term-document matrix, and cosine similarity to produce a similarity matrix. This matrix is then binarized to form a graph, on which communities are detected using the Louvain method. Finally, Latent Dirichlet Allocation (LDA) is applied to the detected communities to extract topics. Our approach outperforms traditional topic modeling methods, yielding more accurate and interpretable topics with higher coherence and lower perplexity.
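The graph-construction and community-detection steps summarized above can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes graph nodes are documents (text segments), that input tokens are already lemmatized, and a smoothed IDF variant; the binarization threshold of 0.3 is a hypothetical value, since the abstract specifies neither the TF-IDF variant nor the threshold. Community detection follows the two Louvain phases (local node moving, then aggregation into weighted supernodes). The LDA step is omitted; it could be run with a library such as scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel`. All function names here (`tfidf_vectors`, `binarized_graph`, `louvain`, and the helpers) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors for token lists (assumed already lemmatized).
    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def binarized_graph(vectors, threshold):
    """Assumption: nodes are documents; add an unweighted edge (weight 1)
    wherever pairwise cosine similarity >= threshold."""
    adj = {i: {} for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                adj[i][j] = adj[j][i] = 1.0
    return adj

def _local_moving(adj):
    """Louvain phase 1: move each node to the neighbouring community with the
    largest modularity gain, repeating until no move improves modularity."""
    deg = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    two_m = sum(deg.values())
    comm = {v: v for v in adj}
    tot = dict(deg)  # total degree of each community
    improved, moved = False, two_m > 0
    while moved:
        moved = False
        for v in adj:
            old = comm[v]
            links = defaultdict(float)  # edge weight from v into each community
            for u, w in adj[v].items():
                if u != v:
                    links[comm[u]] += w
            tot[old] -= deg[v]  # take v out of its current community
            best = old
            best_gain = links.get(old, 0.0) - tot[old] * deg[v] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * deg[v] / two_m  # gain scaled by m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            tot[best] += deg[v]
            if best != old:
                comm[v] = best
                moved = improved = True
    return comm, improved

def _aggregate(adj, comm, members):
    """Louvain phase 2: collapse each community into one weighted supernode
    (internal edges become a self-loop, so total degree is preserved)."""
    label = {c: i for i, c in enumerate(sorted(set(comm.values())))}
    new_adj = {i: defaultdict(float) for i in label.values()}
    new_members = {i: set() for i in label.values()}
    for v, nbrs in adj.items():
        new_members[label[comm[v]]] |= members[v]
        for u, w in nbrs.items():
            new_adj[label[comm[v]]][label[comm[u]]] += w
    return {c: dict(row) for c, row in new_adj.items()}, new_members

def louvain(adj):
    """Alternate both phases until modularity stops improving."""
    members = {v: {v} for v in adj}
    while True:
        comm, improved = _local_moving(adj)
        if not improved:
            return list(members.values())
        adj, members = _aggregate(adj, comm, members)

# Toy corpus of pre-lemmatized segments (illustrative only, not study data).
docs = [s.split() for s in [
    "anxiety panic worry sleep", "panic anxiety worry heart",
    "worry anxiety panic fear", "medication dose side effect",
    "dose medication effect prescription", "side effect medication dose",
]]
graph = binarized_graph(tfidf_vectors(docs), threshold=0.3)  # hypothetical threshold
communities = louvain(graph)
print(sorted(sorted(c) for c in communities))  # [[0, 1, 2], [3, 4, 5]]
```

Binarizing discards the similarity weights, so the Louvain step operates on an unweighted graph; a lower threshold gives a denser graph and tends to merge groups, while a higher one fragments the corpus. One plausible reading of the final step is fitting LDA to the documents of each detected community separately.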

Anthology ID: 2024.nlp4science-1.6
Volume: Proceedings of the 1st Workshop on NLP for Science (NLP4Science)
Month: November
Year: 2024
Address: Miami, FL, USA
Editors: Lotem Peled-Cohen, Nitay Calderon, Shir Lissak, Roi Reichart
Venue: NLP4Science
Publisher: Association for Computational Linguistics
Pages: 47–57
URL: https://aclanthology.org/2024.nlp4science-1.6
Cite (ACL): Anusuya Krishnan and Isaias Mehari Ghebrehiwet. 2024. GCD-TM: Graph-Driven Community Detection for Topic Modelling in Psychiatry Texts. In Proceedings of the 1st Workshop on NLP for Science (NLP4Science), pages 47–57, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal): GCD-TM: Graph-Driven Community Detection for Topic Modelling in Psychiatry Texts (Krishnan & Ghebrehiwet, NLP4Science 2024)
PDF: https://aclanthology.org/2024.nlp4science-1.6.pdf