Mahfuzur Rahman Chowdhury


2023

Topic Modeling Using Community Detection on a Word Association Graph
Mahfuzur Rahman Chowdhury | Intesur Ahmed | Farig Sadeque | Muhammad Yanhaona
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Topic modeling of a text corpus is one of the most well-studied areas of information retrieval and knowledge discovery. Despite several decades of research that has produced an array of modeling tools, some common problems still prevent automated topic modeling from matching users’ expectations. In particular, existing topic modeling solutions suffer when the distribution of words among the underlying topics is uneven or when the topics overlap. Furthermore, many solutions ask the user to provide a topic count estimate as input, which limits their usefulness in modeling a corpus for which such information is unavailable. We propose a new topic modeling approach that overcomes these shortcomings by formulating the topic modeling problem as a community detection problem in a word association graph that we generate from the text corpus. Experimental evaluation using multiple data sets drawn from three different types of text corpora shows that our approach is superior to prominent topic modeling alternatives in most cases. This paper describes our approach and discusses the experimental findings.
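To make the general idea concrete, here is a minimal sketch of topic discovery as community detection on a word graph. It is not the authors' method: the abstract does not say how the word association graph is built or which community detection algorithm is used. The sketch assumes document-level co-occurrence counts as edge weights and substitutes a simple deterministic label propagation for the detection step. The function names `build_word_graph` and `label_propagation` and the toy corpus are hypothetical. Note that no topic count is passed in; the number of communities falls out of the graph structure.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(docs):
    """Build a weighted word graph (assumption: edge weight is the
    number of documents in which both words appear)."""
    graph = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_iters=100):
    """Deterministic label propagation (a stand-in algorithm).

    Each word starts in its own community, then repeatedly adopts the
    label with the largest total edge weight among its neighbours.
    Ties are broken by the alphabetically smallest label.
    """
    labels = {word: word for word in graph}
    for _ in range(max_iters):
        changed = False
        for word in sorted(graph):
            scores = defaultdict(int)
            for neighbour, weight in graph[word].items():
                scores[labels[neighbour]] += weight
            if not scores:
                continue
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[word]:
                labels[word] = best
                changed = True
        if not changed:
            break
    # Group words by their final label; each group is one candidate topic.
    communities = defaultdict(set)
    for word, label in labels.items():
        communities[label].add(word)
    return sorted(sorted(members) for members in communities.values())


# Toy corpus (hypothetical data, two obvious themes).
docs = [
    "goal team match",
    "team match player",
    "vote election party",
    "election party senate",
]
topics = label_propagation(build_word_graph(docs))
for topic in topics:
    print(topic)
```

On this toy corpus the two word groups share no edges, so the sketch recovers them as separate communities. Real corpora have many cross-topic edges, and uneven or overlapping topics of the kind the abstract mentions would need a more careful graph construction and detection algorithm than this stand-in.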