Causal inference using observational text data is becoming increasingly popular in many research areas. This paper presents the Bayesian Topic Regression (BTR) model, which uses both text and numerical information to model an outcome variable. It allows estimation of both discrete and continuous treatment effects and accommodates additional numerical confounders alongside the text data. To this end, we combine a supervised Bayesian topic model with a Bayesian regression framework and perform supervised representation learning for the text features jointly with the training of the regression parameters, respecting the Frisch-Waugh-Lovell theorem. Our paper makes two main contributions. First, we provide a regression framework that allows causal inference in settings where both text and numerical confounders are relevant. We show with synthetic and semi-synthetic datasets that, when text and numerical features are correlated, our joint approach recovers the ground truth with lower bias than any benchmark model. Second, experiments on two real-world datasets demonstrate that a joint and supervised learning strategy also yields better predictions than strategies that estimate regression weights for text and non-text features separately, and is even competitive with more complex deep neural networks.
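The role of the Frisch-Waugh-Lovell theorem in the argument above can be illustrated with a small simulation. The sketch below is not the BTR model: it uses plain ordinary least squares, and the variables `z` (a stand-in for a text-derived feature such as a topic share), `t` (a numerical treatment) and all coefficients are hypothetical values chosen for illustration. It compares a joint regression, the equivalent residual-on-residual regression, and a naive two-stage fit that removes the text feature from the outcome only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: z stands in for a text-derived feature,
# t is a numerical treatment that is correlated with z.
z = rng.normal(size=n)
t = 0.8 * z + rng.normal(size=n)
y = 1.0 * t + 2.0 * z + rng.normal(size=n)  # assumed true effect of t is 1.0


def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


ones = np.ones(n)
Z = np.column_stack([ones, z])

# (a) Joint regression of y on [1, t, z]; keep the coefficient on t.
beta_joint = ols(np.column_stack([ones, t, z]), y)[1]

# (b) Frisch-Waugh-Lovell: residualize both y and t on [1, z],
#     then regress the y-residuals on the t-residuals.
y_res = y - Z @ ols(Z, y)
t_res = t - Z @ ols(Z, t)
beta_fwl = ols(t_res[:, None], y_res)[0]

# (c) Naive two-stage: residualize only y on [1, z], then regress
#     those residuals on the raw (non-residualized) treatment t.
beta_two_stage = ols(np.column_stack([ones, t]), y_res)[1]

print(f"joint:     {beta_joint:.3f}")
print(f"FWL:       {beta_fwl:.3f}")
print(f"two-stage: {beta_two_stage:.3f}")
```

In this setup the joint and residual-on-residual estimates coincide up to floating-point error, while the naive two-stage estimate is attenuated because `t` and `z` are correlated. This is the kind of bias that separate estimation of text and non-text weights can introduce.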
Estimating the effects of monetary policy is one of the fundamental research questions in monetary economics. Many economies have faced ultra-low interest rate environments ever since the global financial crisis of 2007-9, and the Covid pandemic recently reinforced this situation. In the US and Europe, interest rates are close to (or even below) zero, which limits the scope of traditional monetary policy measures for central banks. Dedicated central bank communication has hence become an increasingly important tool to steer and control market expectations. However, incorporating central bank language directly as features into economic models is still a nascent research area. In particular, the content and effect of central bank speeches have been mostly neglected in monetary policy modelling so far. With our paper, we aim to provide the research community with a novel monetary policy shock series based on central bank speeches. We use a supervised topic modeling approach that can deal with text as well as numeric covariates to estimate a monetary policy signal dispersion index along three key economic dimensions: GDP, CPI and unemployment. This “dispersion shock” series is not only more frequent than series that classically focus on policy announcement dates, it also opens up the possibility of answering new questions that have until now been difficult to analyse. For example, do markets form different expectations when facing a “cacophony of policy voices”? Our initial findings for the US suggest that more dispersed or incongruent communication of the monetary policy stance in the build-up to Federal Open Market Committee (FOMC) meetings might be associated with stronger subsequent market surprises at the time of FOMC policy announcements.