Measuring Topic Coherence through Optimal Word Buckets

Nitin Ramrakhiyani, Sachin Pawar, Swapnil Hingmire, Girish Palshikar


Abstract
Measuring topic quality is essential for scoring learned topics and for their subsequent use in information retrieval and text classification. To measure the quality of topics learned from text with Latent Dirichlet Allocation (LDA), we propose a novel approach based on grouping topic words into buckets (TBuckets). A single large bucket signifies a single coherent theme, which in turn indicates high topic coherence. TBuckets uses word embeddings of the topic words and employs singular value decomposition (SVD) and Integer Linear Programming based optimization to create coherent word buckets. TBuckets outperforms state-of-the-art techniques when evaluated on three publicly available datasets and on one additional dataset proposed in this paper.
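The bucketing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only an SVD-style grouping and omits the Integer Linear Programming step entirely. The function names `bucket_sizes` and `largest_bucket`, the rule of assigning each word to the singular direction it aligns with most closely, and the use of the largest bucket size as the score are all assumptions made for illustration.

```python
import numpy as np

def bucket_sizes(embeddings):
    """Group word vectors by their most similar right singular vector.

    Illustrative assumption: each topic word joins the bucket of the
    singular direction with which it has the highest absolute cosine
    similarity. Returns the number of words in each bucket.
    """
    X = np.asarray(embeddings, dtype=float)
    # Normalise each word vector to unit length so dot products are cosines.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Rows of vt are orthonormal directions in embedding space.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    sims = np.abs(X @ vt.T)          # |cosine| between words and directions
    assignment = sims.argmax(axis=1)  # index of the best direction per word
    return np.bincount(assignment, minlength=vt.shape[0])

def largest_bucket(embeddings):
    """Size of the largest bucket; a larger value suggests a more coherent topic."""
    return int(bucket_sizes(embeddings).max())
```

With real data, `embeddings` would hold one pretrained word vector per top topic word; a topic whose words nearly all fall into one bucket would score close to the number of words in the topic.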
Anthology ID:
E17-2070
Volume:
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Mirella Lapata, Phil Blunsom, Alexander Koller
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
437–442
URL:
https://aclanthology.org/E17-2070
Cite (ACL):
Nitin Ramrakhiyani, Sachin Pawar, Swapnil Hingmire, and Girish Palshikar. 2017. Measuring Topic Coherence through Optimal Word Buckets. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 437–442, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Measuring Topic Coherence through Optimal Word Buckets (Ramrakhiyani et al., EACL 2017)
PDF:
https://aclanthology.org/E17-2070.pdf