GINopic: Topic Modeling with Graph Isomorphism Network

Suman Adhya, Debarshi Kumar Sanyal


Abstract
Topic modeling is a widely used approach for analyzing and exploring large document collections. Recent research efforts have incorporated pre-trained contextualized language models, such as BERT embeddings, into topic modeling. However, they often neglect the intrinsic informational value conveyed by mutual dependencies between words. In this study, we introduce GINopic, a topic modeling framework based on graph isomorphism networks to capture the correlation between words. By conducting intrinsic (quantitative as well as qualitative) and extrinsic evaluations on diverse benchmark datasets, we demonstrate the effectiveness of GINopic compared to existing topic models and highlight its potential for advancing topic modeling.
Anthology ID:
2024.naacl-long.342
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
6171–6183
URL:
https://aclanthology.org/2024.naacl-long.342
DOI:
10.18653/v1/2024.naacl-long.342
Cite (ACL):
Suman Adhya and Debarshi Kumar Sanyal. 2024. GINopic: Topic Modeling with Graph Isomorphism Network. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6171–6183, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
GINopic: Topic Modeling with Graph Isomorphism Network (Adhya & Sanyal, NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.342.pdf