Unsupervised Hierarchical Topic Modeling via Anchor Word Clustering and Path Guidance

Jiyuan Liu, Hegang Chen, Chunjiang Zhu, Yanghui Rao


Abstract
Current hierarchical topic models tend to capture the relationship between words and topics, often ignoring the role of anchor words that guide text generation. For the first time, we detect anchor words and add them to the text generation process in an unsupervised way. First, we adopt a clustering algorithm to adaptively detect anchor words that are highly consistent with each topic, which forms the path topic → anchor word. Second, we add the causal path anchor word → word to the popular Variational Auto-Encoder (VAE) framework by implicitly using word co-occurrence graphs. We also develop the causal path topic + anchor word → higher-layer topic, which uses anchor words to aid the expression of topic concepts and captures a more semantically tight hierarchical topic structure. Finally, we enhance the model’s representation of the anchor words through a novel contrastive learning objective. After jointly training these constraint objectives, the model produces more coherent and diverse topics with a better hierarchical structure. Extensive experiments on three datasets show that our model outperforms state-of-the-art methods.
Anthology ID:
2024.findings-emnlp.440
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7505–7517
URL:
https://aclanthology.org/2024.findings-emnlp.440/
DOI:
10.18653/v1/2024.findings-emnlp.440
Cite (ACL):
Jiyuan Liu, Hegang Chen, Chunjiang Zhu, and Yanghui Rao. 2024. Unsupervised Hierarchical Topic Modeling via Anchor Word Clustering and Path Guidance. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 7505–7517, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Hierarchical Topic Modeling via Anchor Word Clustering and Path Guidance (Liu et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.440.pdf