A Semantic Cover Approach for Topic Modeling

Rajagopal Venkatesaramani, Doug Downey, Bradley Malin, Yevgeniy Vorobeychik


Abstract
We introduce a novel topic modeling approach based on constructing a semantic set cover for clusters of similar documents. Specifically, our approach first clusters documents using their TF-IDF representation, and then covers each cluster with a set of topic words based on semantic similarity, defined in terms of a word embedding. Computing a topic cover amounts to solving a minimum set cover problem. Our evaluation compares our topic modeling approach to Latent Dirichlet Allocation (LDA) on three metrics: 1) qualitative topic match, measured using evaluations by Amazon Mechanical Turk (MTurk) workers, 2) performance on classification tasks using each topic model as a sparse feature representation, and 3) topic coherence. We find that qualitative judgments significantly favor our approach, and that the method outperforms LDA on topic coherence and is comparable to LDA on document classification tasks.
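The covering step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the minimum set cover problem is solved, so the code below swaps in the standard greedy approximation. It also assumes that a candidate topic word "covers" a document when its cosine similarity to at least one of the document's words meets a threshold. The function names, the threshold value, the two-dimensional toy embedding, and the example documents are all hypothetical, and the cluster is taken as given rather than computed from TF-IDF vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def covered_docs(topic, docs, emb, threshold):
    """Indices of documents with at least one word similar to `topic`.

    Assumption: this similarity-threshold rule is an illustrative
    definition of coverage, not one taken from the paper.
    """
    return {
        i
        for i, doc in enumerate(docs)
        if any(cosine(emb[topic], emb[w]) >= threshold for w in doc)
    }


def greedy_cover(docs, candidates, emb, threshold=0.8):
    """Greedy approximation to minimum set cover over one cluster.

    Repeatedly picks the candidate topic word that covers the most
    still-uncovered documents. Returns the chosen topic words and any
    documents that no candidate could cover.
    """
    coverage = {t: covered_docs(t, docs, emb, threshold) for t in candidates}
    uncovered = set(range(len(docs)))
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda t: len(coverage[t] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:  # remaining documents cannot be covered
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical toy embedding and a single cluster of four tiny documents.
emb = {
    "sports": (1.0, 0.0),
    "soccer": (0.9, 0.1),
    "tennis": (0.95, 0.05),
    "finance": (0.0, 1.0),
    "stocks": (0.1, 0.9),
    "bank": (0.05, 0.95),
}
docs = [["soccer"], ["tennis", "soccer"], ["stocks"], ["bank", "stocks"]]
candidates = ["sports", "finance", "soccer"]

topics, uncovered = greedy_cover(docs, candidates, emb)
print(topics, uncovered)
```

On this toy input the greedy pass selects two topic words and leaves no document uncovered. In general, greedy selection gives an approximate cover rather than a guaranteed minimum one, which is why it is labeled here as a stand-in for whatever solver the paper uses.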
Anthology ID:
S19-1011
Volume:
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Venues:
SemEval | *SEM
SIGs:
SIGLEX | SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
92–102
URL:
https://aclanthology.org/S19-1011
DOI:
10.18653/v1/S19-1011
Cite (ACL):
Rajagopal Venkatesaramani, Doug Downey, Bradley Malin, and Yevgeniy Vorobeychik. 2019. A Semantic Cover Approach for Topic Modeling. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), pages 92–102, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
A Semantic Cover Approach for Topic Modeling (Venkatesaramani et al., SemEval-*SEM 2019)
PDF:
https://aclanthology.org/S19-1011.pdf