Uncertainty-aware generative models for inferring document class prevalence

Katherine Keith; Brendan O’Connor

doi:10.18653/v1/D18-1487


Abstract

Prevalence estimation is the task of inferring the relative frequency of classes of unlabeled examples in a group, for example, the proportion of a document collection with positive sentiment. Previous work has focused on aggregating and adjusting discriminative individual classifiers to obtain prevalence point estimates. But imperfect classifier accuracy ought to be reflected in uncertainty over the predicted prevalence for scientifically valid inference. In this work, we present (1) a generative probabilistic modeling approach to prevalence estimation, and (2) the construction and evaluation of prevalence confidence intervals. In particular, we demonstrate that an off-the-shelf discriminative classifier can be given a generative re-interpretation by backing out an implicit individual-level likelihood function, which can be used to conduct fast and simple group-level Bayesian inference. Empirically, we demonstrate that our approach provides better confidence interval coverage than an alternative, and is dramatically more robust to shifts in the class prior between training and testing.
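The generative re-interpretation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation; it assumes binary classes, a classifier whose posterior q(y=1 | x) reflects a known training-set prior p_train(y=1), a uniform prior on the group prevalence theta, and a grid approximation of the posterior. Under those assumptions, Bayes' rule gives p(x | y) proportional to q(y | x) / p_train(y), and each document contributes a mixture term theta * p(x | y=1) + (1 - theta) * p(x | y=0) to the group likelihood. The function names (implicit_likelihoods, prevalence_posterior, credible_interval) are hypothetical.

```python
import math


def implicit_likelihoods(probs_pos, train_prior_pos, eps=1e-6):
    """Back out per-document class likelihoods from classifier posteriors.

    Hypothetical helper. Assumes q(y=1|x) came from a classifier trained with
    class prior train_prior_pos, so p(x|y) is proportional to q(y|x) / p_train(y).
    Both values share a per-document constant that does not affect inference
    over the prevalence.
    """
    lik_pos, lik_neg = [], []
    for p in probs_pos:
        p = min(max(p, eps), 1.0 - eps)  # avoid exact 0/1 posteriors
        lik_pos.append(p / train_prior_pos)
        lik_neg.append((1.0 - p) / (1.0 - train_prior_pos))
    return lik_pos, lik_neg


def prevalence_posterior(probs_pos, train_prior_pos, grid_size=1001):
    """Grid posterior over prevalence theta = P(y=1), uniform prior (assumed)."""
    lik_pos, lik_neg = implicit_likelihoods(probs_pos, train_prior_pos)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_post = []
    for theta in grid:
        # Each document is a two-component mixture: theta*p(x|1) + (1-theta)*p(x|0).
        log_post.append(sum(math.log(theta * a + (1.0 - theta) * b)
                            for a, b in zip(lik_pos, lik_neg)))
    top = max(log_post)  # subtract the max before exponentiating for stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def credible_interval(grid, post, level=0.95):
    """Equal-tailed interval read off the cumulative grid posterior."""
    alpha = (1.0 - level) / 2.0
    cum, lo, hi = 0.0, None, None
    for theta, w in zip(grid, post):
        cum += w
        if lo is None and cum >= alpha:
            lo = theta
        if hi is None and cum >= 1.0 - alpha:
            hi = theta
    return lo, hi


if __name__ == "__main__":
    # Toy group: 30 documents scored 0.9 and 70 scored 0.1 by a classifier
    # assumed to be trained on balanced data (train_prior_pos = 0.5).
    probs = [0.9] * 30 + [0.1] * 70
    grid, post = prevalence_posterior(probs, train_prior_pos=0.5)
    mode = grid[max(range(len(post)), key=post.__getitem__)]
    mean = sum(t * w for t, w in zip(grid, post))
    print("mode", mode, "mean", round(mean, 3),
          "95% interval", credible_interval(grid, post))
```

In this toy group, classify-and-count (thresholding at 0.5) would report 0.3, while the posterior mode of the sketch is 0.25: with scores of 0.9 and 0.1 under a balanced training prior, the implicit likelihoods account for some high-scoring documents being negatives and some low-scoring ones being positives. The interval widens as the group shrinks or as scores move toward 0.5.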

Anthology ID: D18-1487
Volume: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month: October-November
Year: 2018
Address: Brussels, Belgium
Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 4575–4585
URL: https://aclanthology.org/D18-1487/
DOI: 10.18653/v1/D18-1487
Cite (ACL): Katherine Keith and Brendan O’Connor. 2018. Uncertainty-aware generative models for inferring document class prevalence. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4575–4585, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Uncertainty-aware generative models for inferring document class prevalence (Keith & O’Connor, EMNLP 2018)
PDF: https://aclanthology.org/D18-1487.pdf
