Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs

Ximing Li, Jinjin Chi, Changchun Li, Jihong Ouyang, Bo Fu


Abstract
Gaussian LDA integrates topic modeling with word embeddings by replacing the discrete topic distributions over word types with multivariate Gaussian distributions on the embedding space. This takes the semantic information of words into account. However, the Euclidean similarity used by Gaussian topics is not an optimal semantic measure for word embeddings; it is widely acknowledged that cosine similarity better describes the semantic relatedness between word embeddings. To employ the cosine measure and capture complex topic structure, we use von Mises-Fisher (vMF) mixture models to represent topics, and then develop a novel mix-vMF topic model (MvTM). Using public pre-trained word embeddings, we evaluate MvTM on three real-world data sets. Experimental results show that our model can discover more coherent topics than the state-of-the-art baseline models, and achieve competitive classification performance.
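The key object in the abstract is a topic represented as a mixture of von Mises-Fisher distributions over unit-normalized word embeddings, whose density depends on the word vector only through its cosine similarity with the mean direction. The following is a minimal numerical sketch of that density, not the paper's implementation: the function names are illustrative, and the normalizing constant uses the standard vMF form with the modified Bessel function from SciPy.

```python
import numpy as np
from scipy.special import iv, logsumexp  # iv: modified Bessel function I_v

def vmf_log_density(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere.

    x, mu : unit-norm vectors of dimension d (mu is the mean direction).
    kappa : concentration parameter (> 0).
    The density is C_d(kappa) * exp(kappa * mu^T x), so it grows with the
    cosine similarity between x and mu.
    """
    d = mu.shape[0]
    log_norm = ((d / 2 - 1) * np.log(kappa)
                - (d / 2) * np.log(2 * np.pi)
                - np.log(iv(d / 2 - 1, kappa)))
    return kappa * np.dot(mu, x) + log_norm

def mix_vmf_log_density(x, weights, mus, kappas):
    """Log-density of one topic modeled as a mixture of vMF components
    (hypothetical helper mirroring the mix-vMF idea in the abstract)."""
    comps = [np.log(w) + vmf_log_density(x, mu, k)
             for w, mu, k in zip(weights, mus, kappas)]
    return logsumexp(comps)
```

Because the exponent is `kappa * mu^T x`, a word embedding aligned with a component's mean direction always scores higher than one pointing away from it, which is the sense in which vMF topics rank words by cosine rather than Euclidean similarity.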
Anthology ID:
C16-1015
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
151–160
URL:
https://aclanthology.org/C16-1015
Cite (ACL):
Ximing Li, Jinjin Chi, Changchun Li, Jihong Ouyang, and Bo Fu. 2016. Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 151–160, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs (Li et al., COLING 2016)
PDF:
https://aclanthology.org/C16-1015.pdf