GraphBTM: Graph Enhanced Autoencoded Variational Inference for Biterm Topic Model

Qile Zhu, Zheng Feng, Xiaolin Li


Abstract
Discovering the latent topics within texts has been a fundamental task for many applications. However, conventional topic models suffer from different problems in different settings. Latent Dirichlet Allocation (LDA) may not work well for short texts due to data sparsity (i.e., the sparse word co-occurrence patterns in short documents). The Biterm Topic Model (BTM) learns topics by modeling word pairs, called biterms, over the whole corpus. This assumption is too strong when documents are long and rich in topic information, and it does not exploit the transitivity of biterms. In this paper, we propose a novel approach called GraphBTM that represents biterms as graphs and designs Graph Convolutional Networks (GCNs) with residual connections to extract transitive features from biterms. To overcome the data sparsity of LDA and the strong assumption of BTM, we sample a fixed number of documents to form a mini-corpus as a training sample. We also propose a dataset called All News, extracted from 15 news publishers, whose documents are much longer than those in 20 Newsgroups. We present an amortized variational inference method for GraphBTM. Our method generates more coherent topics than previous approaches, and experiments show that the sampling strategy improves performance by a large margin.
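The abstract's core idea of representing biterms as a graph and applying a GCN with residual connections can be illustrated with a minimal sketch. This is not the authors' implementation: the biterm extraction (all unordered word pairs within a document), the symmetric GCN normalization, the one-hot word features, and the toy weights are all assumptions made for illustration.

```python
# Illustrative sketch (not the paper's code): build a biterm co-occurrence
# graph from a sampled mini-corpus, then apply one GCN layer with a
# residual connection. Documents are lists of vocabulary indices.
import numpy as np

def biterm_adjacency(docs, vocab_size):
    """Count biterms (unordered word pairs within a document) as edge weights."""
    A = np.zeros((vocab_size, vocab_size))
    for doc in docs:
        for i in range(len(doc)):
            for j in range(i + 1, len(doc)):
                A[doc[i], doc[j]] += 1
                A[doc[j], doc[i]] += 1
    return A

def gcn_layer(A, X, W):
    """One GCN layer: add self-loops, symmetrically normalize, ReLU, residual."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    H = d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W
    return np.maximum(H, 0.0) + X  # residual connection preserves input features

# A mini-corpus of three short "documents" over a 5-word vocabulary.
docs = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
V = 5
A = biterm_adjacency(docs, V)
X = np.eye(V)                 # one-hot word features (toy choice)
W = np.full((V, V), 0.1)      # toy weight matrix
H = gcn_layer(A, X, W)
print(H.shape)                # (5, 5)
```

Note how the biterm (1, 2) appears in two documents, so its edge weight is 2; stacking such layers lets information propagate along paths of biterms, which is the transitivity the paper aims to capture.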
Anthology ID:
D18-1495
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4663–4672
URL:
https://aclanthology.org/D18-1495
DOI:
10.18653/v1/D18-1495
Cite (ACL):
Qile Zhu, Zheng Feng, and Xiaolin Li. 2018. GraphBTM: Graph Enhanced Autoencoded Variational Inference for Biterm Topic Model. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4663–4672, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
GraphBTM: Graph Enhanced Autoencoded Variational Inference for Biterm Topic Model (Zhu et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1495.pdf
Code:
valdersoul/GraphBTM