Ross Koval
2024
Learning to Compare Financial Reports for Financial Forecasting
Ross Koval | Nicholas Andrews | Xifeng Yan
Findings of the Association for Computational Linguistics: EACL 2024
Public companies in the US are required to publish annual reports that detail their recent financial performance, present the current state of ongoing business operations, and discuss future prospects. However, these reports typically contain over 25,000 words across all sections, large amounts of industry and legal jargon, and a high percentage of boilerplate content that does not change much from year to year. These characteristics present challenges for many generic pretrained language models because it is likely that only a small percentage of the long report reflects salient information and contains meaningful signal about the future prospects of the company. In this work, we curate a large-scale dataset of paired financial reports and introduce two novel, challenging tasks of predicting long-horizon company risk and correlation that evaluate the ability of the model to recognize cross-document relationships with complex, nuanced signals. We explore and present a comprehensive set of methods and experiments, and establish strong baselines designed to learn to identify subtle similarities and differences between long documents. Furthermore, we demonstrate that it is possible to predict company risk and correlation solely from the text of companies' financial reports, and that modeling the cross-document interactions at a fine-grained level provides significant benefit. Finally, we probe the best-performing model through quantitative and qualitative interpretability methods to reveal some insight into the underlying task signal.
2023
Forecasting Earnings Surprises from Conference Call Transcripts
Ross Koval | Nicholas Andrews | Xifeng Yan
Findings of the Association for Computational Linguistics: ACL 2023
There is a multitude of textual data relevant to the financial markets, spanning genres such as financial news, earnings conference calls, and social media posts. Earnings conference calls are among the most important of these for information flow, as they reflect direct communication between company executives, financial analysts, and large shareholders. Since these calls contain content that is forward-looking in nature, they can be used to forecast the future performance of the company relative to market expectations. However, they typically contain over 5,000 words of text and large amounts of industry jargon. This length and domain-specific language present problems for many generic pretrained language models. In this work, we introduce a novel task of predicting earnings surprises from earnings call transcripts and contribute a new long document dataset that tests financial understanding with complex signals. We explore a variety of approaches for this long document classification task and establish some strong baselines. Furthermore, we demonstrate that it is possible to predict companies’ future earnings surprises solely from the text of their conference calls with reasonable accuracy. Finally, we probe the models through different interpretability methods and reveal some intuitive explanations of the linguistic features captured that go beyond traditional sentiment analysis.