Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis

Agam Shah, Suvan Paturi, Sudheer Chava


Abstract
Monetary policy pronouncements by the Federal Open Market Committee (FOMC) are a major driver of financial market returns. We construct the largest tokenized and annotated dataset of FOMC speeches, meeting minutes, and press conference transcripts in order to understand how monetary policy influences financial markets. In this study, we develop a novel task of hawkish-dovish classification and benchmark various pre-trained language models on the proposed dataset. Using the best-performing model (RoBERTa-large), we construct a measure of monetary policy stance for the FOMC document release days. To evaluate the constructed measure, we study its impact on the treasury market, stock market, and macroeconomic indicators. Our dataset, models, and code are publicly available on Hugging Face and GitHub under the CC BY-NC 4.0 license.
Anthology ID:
2023.acl-long.368
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6664–6679
URL:
https://aclanthology.org/2023.acl-long.368
DOI:
10.18653/v1/2023.acl-long.368
Cite (ACL):
Agam Shah, Suvan Paturi, and Sudheer Chava. 2023. Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6664–6679, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis (Shah et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.368.pdf
Video:
https://aclanthology.org/2023.acl-long.368.mp4