Evaluating Topic Quality with Posterior Variability

Linzi Xing, Michael J. Paul, Giuseppe Carenini


Abstract
Probabilistic topic models such as latent Dirichlet allocation (LDA) are commonly used with Bayesian inference methods such as Gibbs sampling to learn posterior distributions over topic model parameters. We derive a novel measure of LDA topic quality using the variability of the posterior distributions. Compared to several existing baselines for automatic topic evaluation, the proposed metric achieves state-of-the-art correlations with human judgments of topic quality in experiments on three corpora. We additionally demonstrate that topic quality estimation can be further improved using a supervised estimator that combines multiple metrics.
Anthology ID:
D19-1349
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
3471–3477
URL:
https://aclanthology.org/D19-1349
DOI:
10.18653/v1/D19-1349
Cite (ACL):
Linzi Xing, Michael J. Paul, and Giuseppe Carenini. 2019. Evaluating Topic Quality with Posterior Variability. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3471–3477, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Evaluating Topic Quality with Posterior Variability (Xing et al., EMNLP-IJCNLP 2019)
PDF:
https://aclanthology.org/D19-1349.pdf
Attachment:
 D19-1349.Attachment.zip
Code:
 lxing532/topic_variability