Latent Topic Embedding

Di Jiang, Lei Shi, Rongzhong Lian, Hua Wu


Abstract
Topic modeling and word embedding are two important techniques for deriving latent semantics from data. General-purpose topic models typically work at coarse granularity by capturing word co-occurrence at the document/sentence level. In contrast, word embedding models usually work at much finer granularity by modeling word co-occurrence within small sliding windows. With the aim of deriving latent semantics by considering word co-occurrence at different levels of granularity, we propose a novel model named Latent Topic Embedding (LTE), which seamlessly integrates topic generation and embedding learning in one unified framework. We further propose an efficient Monte Carlo EM algorithm to estimate the parameters of interest. By retaining the individual advantages of topic modeling and word embedding, LTE yields better latent topics and word embeddings. Extensive experiments verify the superiority of LTE over state-of-the-art methods.
Anthology ID:
C16-1253
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
2689–2698
URL:
https://aclanthology.org/C16-1253
Cite (ACL):
Di Jiang, Lei Shi, Rongzhong Lian, and Hua Wu. 2016. Latent Topic Embedding. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2689–2698, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Latent Topic Embedding (Jiang et al., COLING 2016)
PDF:
https://aclanthology.org/C16-1253.pdf