Zhicheng Lin


2024

Hierarchical Topic Modeling via Contrastive Learning and Hyperbolic Embedding
Zhicheng Lin | HeGang Chen | Yuyin Lu | Yanghui Rao | Hao Xu | Hanjiang Lai
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Hierarchical topic modeling, which mines implicit semantics in a corpus and automatically constructs hierarchical relationships among topics, has received considerable attention recently. However, current hierarchical topic models are mainly based on Euclidean space, which does not retain the implicit hierarchical semantic information in the corpus well and therefore yields poorly structured topic hierarchies. Meanwhile, existing Generative Adversarial Network (GAN) based neural topic models perform satisfactorily, but they remain prone to mode collapse due to the discontinuity of the latent space. To address these problems, we propose a novel GAN-based hierarchical topic model that assumes a hyperbolic space and introduces contrastive learning to capture information from documents and mine high-quality topics. Furthermore, topic embeddings are projected into the hyperbolic space, whose distinct tree-like property preserves the implicit hierarchical semantics of documents. Finally, we use a multi-head self-attention mechanism to learn the implicit hierarchical semantics of topics and mine topic structure information. Experiments on real-world corpora demonstrate the strong performance of our model on topic coherence and topic diversity, as well as the rationality of the topic hierarchy.
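
As a rough illustration of why hyperbolic space suits tree-like structure, the sketch below projects Euclidean vectors into a hyperbolic space and measures distances there. The abstract does not say which hyperbolic model the paper uses, so the unit Poincaré ball with curvature -1 is an assumption here, and the function names `exp_map_origin` and `poincare_distance` are hypothetical, not taken from the paper.

```python
import math

# Assumption: unit Poincare ball (curvature -1). The abstract does not
# specify which hyperbolic model the authors use.

def exp_map_origin(v, max_radius=1.0 - 1e-5):
    """Map a Euclidean (tangent) vector at the origin into the unit ball."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    # tanh squashes any norm into [0, 1); clip to stay strictly inside the ball.
    radius = min(math.tanh(norm), max_radius)
    return [radius * x / norm for x in v]

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

# Hypothetical 2-D "topic" vectors: one near the origin, two pushed outward.
root = exp_map_origin([0.0, 0.0])
child_a = exp_map_origin([3.0, 0.0])
child_b = exp_map_origin([0.0, 3.0])
```

In this toy setup, points near the boundary are far from each other relative to their distance from the origin, which is the behavior that lets general topics sit near the center and specific topics spread out toward the edge.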