Principled Analysis of Energy Discourse across Domains with Thesaurus-based Automatic Topic Labeling

Thomas Scelsi, Alfonso Martinez Arranz, Lea Frermann


Abstract
With the increasing impact of Natural Language Processing tools like topic models in social science research, the experimental rigor and comparability of models and datasets have come under scrutiny. Objective analyses and reliable datasets are especially necessary when contributing to research on topics with worldwide impact, such as energy policy. We contribute toward this goal in two ways: first, we release two diachronic corpora covering 23 years of energy discussions in the U.S. Energy Information Administration. Second, we propose a simple and theoretically sound method for automatic topic labelling drawing on political thesauri. We empirically evaluate the quality of our labels, apply our labelling to topics induced by diachronic topic models on our energy corpora, and present a detailed analysis.
Anthology ID:
2021.alta-1.11
Volume:
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association
Month:
December
Year:
2021
Address:
Online
Editors:
Afshin Rahimi, William Lane, Guido Zuccon
Venue:
ALTA
Publisher:
Australasian Language Technology Association
Pages:
107–118
URL:
https://aclanthology.org/2021.alta-1.11
Cite (ACL):
Thomas Scelsi, Alfonso Martinez Arranz, and Lea Frermann. 2021. Principled Analysis of Energy Discourse across Domains with Thesaurus-based Automatic Topic Labeling. In Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association, pages 107–118, Online. Australasian Language Technology Association.
Cite (Informal):
Principled Analysis of Energy Discourse across Domains with Thesaurus-based Automatic Topic Labeling (Scelsi et al., ALTA 2021)
PDF:
https://aclanthology.org/2021.alta-1.11.pdf
Code:
tscelsi/dtm-toolkit