Carsten Jentsch


2023

Debunking Disinformation with GADMO: A Topic Modeling Analysis of a Comprehensive Corpus of German-language Fact-Checks
Jonas Rieger | Nico Hornig | Jonathan Flossdorf | Henrik Müller | Stephan Mündges | Carsten Jentsch | Jörg Rahnenführer | Christina Elmer
Proceedings of the 4th Conference on Language, Data and Knowledge

SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments
Kai-Robin Lange | Carsten Jentsch
Proceedings of the 3rd Workshop on Computational Linguistics for the Political and Social Sciences

2021

RollingLDA: An Update Algorithm of Latent Dirichlet Allocation to Construct Consistent Time Series from Textual Data
Jonas Rieger | Carsten Jentsch | Jörg Rahnenführer
Findings of the Association for Computational Linguistics: EMNLP 2021

We propose a rolling version of Latent Dirichlet Allocation (LDA), called RollingLDA. Its sequential approach enables the construction of LDA-based time series of topics that remain consistent with earlier states of the model. After an initial model fit, updates can be computed efficiently, allowing real-time monitoring and the detection of events or structural breaks. For this purpose, we propose suitable similarity measures for topics and provide simulation evidence of superiority over other commonly used approaches. The adequacy of the resulting method is illustrated by an application to an example corpus; in particular, we compute the similarity of sequentially obtained topic and word distributions over consecutive time periods. For a representative example corpus of The New York Times articles from 1980 to 2020, we analyze the effect of several tuning-parameter choices, and we run RollingLDA on the full dataset of approximately 4 million articles to demonstrate its feasibility.
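The abstract mentions computing the similarity of topic and word distributions over consecutive time periods. The following is a minimal, hypothetical sketch of that kind of comparison, assuming each topic is summarized per period as a vector of word counts and using cosine similarity as the measure. It is not RollingLDA itself and not necessarily the similarity measure proposed in the paper. The helper names `cosine_similarity` and `consecutive_similarities` and the toy word counts are invented for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors stored as Counters."""
    vocab = set(a) | set(b)
    dot = sum(a[w] * b[w] for w in vocab)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def consecutive_similarities(topic_counts_by_period):
    """Similarity of one topic's word counts between each pair of consecutive periods."""
    return [
        cosine_similarity(prev, curr)
        for prev, curr in zip(topic_counts_by_period, topic_counts_by_period[1:])
    ]


# Hypothetical word counts assigned to a single topic in three consecutive periods.
periods = [
    Counter({"election": 40, "vote": 30, "senate": 10}),
    Counter({"election": 35, "vote": 28, "senate": 12}),
    Counter({"virus": 50, "vaccine": 20, "vote": 5}),
]
sims = consecutive_similarities(periods)
print([round(s, 3) for s in sims])
```

In this toy example the first pair of periods is nearly identical, while the second pair shares almost no vocabulary. A sharp drop of this kind is the sort of signal one might inspect as a candidate event or structural break. The paper's own measures and thresholds should be consulted for actual use.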