A Bayesian Topic Model for Human-Evaluated Interpretability

Justin Wood, Corey Arnold, Wei Wang


Abstract
One desideratum of topic modeling is to produce interpretable topics. Given a cluster of document tokens comprising a topic, we can order the topic by counting each word. It is natural to assume that each topic could easily be labeled by inspecting the words with the highest counts; however, this is not always the case. A human evaluator can often have difficulty identifying a single label that accurately describes a topic, since many of the top words seem unrelated. This paper aims to improve interpretability in topic modeling by providing a novel, outperforming interpretable topic model. Our approach combines two previously established subdomains of topic modeling: nonparametric and weakly supervised topic models. Given a nonparametric topic model, we can incorporate weakly supervised input through novel modifications to the nonparametric generative model. These modifications lay the groundwork for a compelling setting, one in which most corpora, without any prior supervised or weakly supervised input, can discover interpretable topics. This setting also presents several challenging sub-problems, for which we provide resolutions. Combining nonparametric topic models with weakly supervised topic models leads to an exciting discovery: a complete, self-contained, and outperforming topic model for interpretability.
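The abstract's starting point, ordering a topic's tokens by count and reading off the top words as a candidate label, can be sketched in a few lines. This is a minimal illustration of that common practice, not the paper's model; the function name and the toy token list are hypothetical.

```python
from collections import Counter

def top_words(topic_tokens, k=10):
    """Return the k most frequent words among the tokens assigned to one topic.

    `topic_tokens` is a hypothetical flat list of tokens for a single topic;
    inspecting its most common words is the usual first attempt at a label.
    """
    counts = Counter(topic_tokens)
    return [word for word, _ in counts.most_common(k)]

# A toy topic: the top words may still be hard to summarize with one label,
# which is the interpretability problem the paper addresses.
tokens = ["model", "topic", "model", "word", "bayes", "topic", "model"]
print(top_words(tokens, k=3))  # most frequent first: model, topic, word
```

As the abstract notes, even a correct frequency ordering does not guarantee that a human evaluator can find a single coherent label for the top words.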
Anthology ID:
2022.lrec-1.674
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6271–6279
URL:
https://aclanthology.org/2022.lrec-1.674
Cite (ACL):
Justin Wood, Corey Arnold, and Wei Wang. 2022. A Bayesian Topic Model for Human-Evaluated Interpretability. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6271–6279, Marseille, France. European Language Resources Association.
Cite (Informal):
A Bayesian Topic Model for Human-Evaluated Interpretability (Wood et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.674.pdf