Exploring Topic Discriminating Power of Words in Latent Dirichlet Allocation

Kai Yang, Yi Cai, Zhenhong Chen, Ho-fung Leung, Raymond Lau


Abstract
Latent Dirichlet Allocation (LDA) and its variants have been widely used to discover latent topics in textual documents. However, some of the topics generated by LDA may be noisy, with irrelevant words scattered across them. We refer to such words as topic-indiscriminate words; they tend to make topics more ambiguous and less interpretable by humans. In this work, we propose a new topic model named TWLDA, which assigns low weights to words with low topic discriminating power. Our experimental results show that the proposed approach effectively reduces the number of topic-indiscriminate words in discovered topics and thereby improves the effectiveness of LDA.
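The abstract does not spell out TWLDA's weighting scheme, but the core idea of scoring a word's topic discriminating power can be illustrated with a simple entropy-based proxy: a word whose probability mass is spread evenly across topics is topic-indiscriminate and receives a low weight, while a word concentrated in few topics receives a high weight. The sketch below is a hypothetical illustration of this general idea, not the paper's actual TWLDA formulation.

```python
import math

def topic_entropy(topic_probs):
    """Entropy of a word's distribution over topics.

    High entropy means the word's probability mass is spread
    evenly across topics, i.e. low topic discriminating power.
    """
    return -sum(p * math.log(p) for p in topic_probs if p > 0)

def discrimination_weight(topic_probs):
    """Map entropy to a weight in [0, 1]: words concentrated in
    few topics (low entropy) get weights near 1, words scattered
    across topics get weights near 0.

    Hypothetical scoring function for illustration only; the
    paper's TWLDA weighting may differ.
    """
    k = len(topic_probs)
    max_h = math.log(k)  # entropy of the uniform distribution over k topics
    h = topic_entropy(topic_probs)
    return 1.0 - h / max_h if max_h > 0 else 1.0

# A word concentrated in one topic vs. one scattered across all topics
focused = [0.94, 0.02, 0.02, 0.02]
scattered = [0.25, 0.25, 0.25, 0.25]
print(discrimination_weight(focused) > discrimination_weight(scattered))  # True
```

Under such a scheme, the low-weight (scattered) words would contribute less to topic inference, which is consistent with the abstract's goal of suppressing topic-indiscriminate words.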
Anthology ID:
C16-1211
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
2238–2247
URL:
https://aclanthology.org/C16-1211
Cite (ACL):
Kai Yang, Yi Cai, Zhenhong Chen, Ho-fung Leung, and Raymond Lau. 2016. Exploring Topic Discriminating Power of Words in Latent Dirichlet Allocation. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2238–2247, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Exploring Topic Discriminating Power of Words in Latent Dirichlet Allocation (Yang et al., COLING 2016)
PDF:
https://aclanthology.org/C16-1211.pdf