2019
Cross-referencing Using Fine-grained Topic Modeling
Jeffrey Lund | Piper Armstrong | Wilson Fearn | Stephen Cowley | Emily Hales | Kevin Seppi
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Cross-referencing, which links passages of text to other related passages, can be a valuable study aid for facilitating comprehension of a text. However, cross-referencing requires, first, a comprehensive thematic knowledge of the entire corpus and, second, a focused search through the corpus specifically to find such useful connections. Because of this, cross-reference resources are prohibitively expensive and exist only for the most well-studied texts (e.g. religious texts). We develop a topic-based system for automatically producing candidate cross-references which can be easily verified by human annotators. Our system utilizes fine-grained topic modeling with thousands of highly nuanced and specific topics to identify verse pairs which are topically related. We demonstrate that our system can be cost-effective compared to having annotators acquire the expertise necessary to produce cross-reference resources unaided.
Automatic Evaluation of Local Topic Quality
Jeffrey Lund | Piper Armstrong | Wilson Fearn | Stephen Cowley | Courtni Byun | Jordan Boyd-Graber | Kevin Seppi
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Topic models are typically evaluated with respect to the global topic distributions that they generate, using metrics such as coherence, but without regard to local (token-level) topic assignments. Token-level assignments are important for downstream tasks such as classification. Even recent models, which aim to improve the quality of these token-level topic assignments, have been evaluated only with respect to global metrics. We propose a task designed to elicit human judgments of token-level topic assignments. We use a variety of topic model types and parameters and discover that global metrics agree poorly with human assignments. Since human evaluation is expensive, we propose a variety of automated metrics to evaluate topic models at a local level. Finally, we correlate our proposed metrics with human judgments from the task on several datasets. We show that an evaluation based on the percent of topic switches correlates most strongly with human judgment of local topic quality. We suggest that this new metric, which we call consistency, be adopted alongside global metrics such as topic coherence when evaluating new topic models.
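As a rough illustration of a metric based on topic switches, the sketch below computes the fraction of adjacent token pairs whose topic assignments differ. The function name `switch_fraction` and the exact normalization are assumptions made for this example; the paper's precise definition of consistency may differ.

```python
def switch_fraction(assignments):
    """Fraction of adjacent token pairs with different topic assignments.

    `assignments` is a sequence of topic ids, one per token, in document
    order. This is an illustrative definition, not the paper's formula.
    """
    if len(assignments) < 2:
        return 0.0  # no adjacent pairs, so no switches by convention
    pairs = zip(assignments, assignments[1:])
    switches = sum(1 for left, right in pairs if left != right)
    return switches / (len(assignments) - 1)


# Hypothetical token-level assignments for two short documents.
smooth = [1, 1, 1, 2, 2]   # one switch across four adjacent pairs
noisy = [3, 4, 3, 4]       # every adjacent pair switches
print(switch_fraction(smooth), switch_fraction(noisy))
```

Under this definition, a lower switch fraction means that neighboring tokens tend to share a topic.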
2018
Labeled Anchors and a Scalable, Transparent, and Interactive Classifier
Jeffrey Lund | Stephen Cowley | Wilson Fearn | Emily Hales | Kevin Seppi
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
We propose Labeled Anchors, an interactive and supervised topic model based on the anchor words algorithm (Arora et al., 2013). Labeled Anchors is similar to Supervised Anchors (Nguyen et al., 2014) in that it extends the vector-space representation of words to include document labels. However, our formulation also admits a classifier which requires no training beyond inferring topics, which means our approach is also fast enough to be interactive. We run a small user study that demonstrates that untrained users can interactively update topics in order to improve classification accuracy.
2017
Why ADAGRAD Fails for Online Topic Modeling
You Lu | Jeffrey Lund | Jordan Boyd-Graber
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Online topic modeling, i.e., topic modeling with stochastic variational inference, is a powerful and efficient technique for analyzing large datasets, and ADAGRAD is a widely-used technique for tuning learning rates during online gradient optimization. However, these two techniques do not work well together. We show that this is because ADAGRAD uses the accumulation of previous gradients as the denominator of the learning rates. For online topic modeling, the magnitude of gradients is very large. This causes the learning rates to shrink very quickly, so the parameters cannot fully converge before training ends.
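The shrinking effect can be seen in a minimal sketch of the standard ADAGRAD rule for a single parameter, where the effective learning rate is the base rate divided by the square root of the accumulated squared gradients. The gradient values, `base_rate`, and `eps` below are made-up illustrative numbers, not figures from the paper.

```python
import math


def adagrad_step_sizes(gradients, base_rate=1.0, eps=1e-8):
    """Return the effective learning rate after each gradient is seen.

    Standard ADAGRAD for one parameter: the denominator is the square
    root of the running sum of squared gradients.
    """
    accumulated = 0.0
    rates = []
    for g in gradients:
        accumulated += g * g
        rates.append(base_rate / (math.sqrt(accumulated) + eps))
    return rates


# Hypothetical gradient streams: same length, very different magnitudes.
small = adagrad_step_sizes([0.1] * 100)
large = adagrad_step_sizes([100.0] * 100)

# With large gradients the effective rate is orders of magnitude smaller.
print(small[-1], large[-1])
```

Because the accumulated sum never decreases, the effective rate can only shrink, and it shrinks faster when the gradients are larger.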
Tandem Anchoring: a Multiword Anchor Approach for Interactive Topic Modeling
Jeffrey Lund | Connor Cook | Kevin Seppi | Jordan Boyd-Graber
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Interactive topic models are powerful tools for those seeking to understand large collections of text. However, existing sampling-based interactive topic modeling approaches scale poorly to large data sets. Anchor methods, which use a single word to uniquely identify a topic, offer the speed needed for interactive work but lack both a mechanism to inject prior knowledge and the intuitive semantics needed for user-facing applications. We propose combinations of words as anchors, going beyond existing single word anchor algorithms, an approach we call “Tandem Anchors”. We begin with a synthetic investigation of this approach, then apply it to interactive topic modeling in a user study and compare it to interactive and non-interactive approaches. Tandem anchors are faster and more intuitive than existing interactive approaches.
2016
Fast Inference for Interactive Models of Text
Jeffrey Lund | Paul Felt | Kevin Seppi | Eric Ringger
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Probabilistic models are a useful means for analyzing large text corpora. Integrating such models with human interaction enables many new use cases. However, adding human interaction to probabilistic models requires inference algorithms which are both fast and accurate. We explore the use of Iterated Conditional Modes as a fast alternative to Gibbs sampling or variational EM. We demonstrate superior performance both in run time and model quality on three different models of text including a DP Mixture of Multinomials for web search result clustering, the Interactive Topic Model, and MomResp, a multinomial crowdsourcing model.
2015
Is Your Anchor Going Up or Down? Fast and Accurate Supervised Topic Models
Thang Nguyen | Jordan Boyd-Graber | Jeffrey Lund | Kevin Seppi | Eric Ringger
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies