A framework for streamlined statistical prediction using topic models

Vanessa Glenny; Jonathan Tuke; Nigel Bean; Lewis Mitchell

doi:10.18653/v1/W19-2508

Abstract

In the Humanities and Social Sciences, there is increasing interest in approaches to information extraction, prediction, intelligent linkage, and dimension reduction applicable to large text corpora. Because approaches in these fields are grounded in traditional statistical techniques, there is a need for frameworks in which advanced NLP techniques such as topic modelling can be incorporated within classical methodologies. This paper provides a classical, supervised, statistical learning framework for prediction from text, using topic models as a data reduction method and the topics themselves as predictors, alongside typical statistical tools for predictive modelling. We apply this framework in a Social Sciences context (applied animal behaviour) and in a Humanities context (narrative analysis). The results show that topic regression models perform comparably to their much less efficient equivalents that use individual words as predictors.

Anthology ID: W19-2508
Volume: Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month: June
Year: 2019
Address: Minneapolis, USA
Editors: Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Venue: LaTeCH
SIG: SIGHUM
Publisher: Association for Computational Linguistics
Pages: 61–70
URL: https://aclanthology.org/W19-2508/
DOI: 10.18653/v1/W19-2508
Cite (ACL): Vanessa Glenny, Jonathan Tuke, Nigel Bean, and Lewis Mitchell. 2019. A framework for streamlined statistical prediction using topic models. In Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 61–70, Minneapolis, USA. Association for Computational Linguistics.
Cite (Informal): A framework for streamlined statistical prediction using topic models (Glenny et al., LaTeCH 2019)
PDF: https://aclanthology.org/W19-2508.pdf
