Trade the Event: Corporate Events Detection for News-Based Event-Driven Trading

In this paper, we introduce an event-driven trading strategy that predicts stock movements by detecting corporate events from news articles. Unlike existing models that utilize textual features (e.g., bag-of-words) and sentiments to directly make stock predictions, we consider corporate events as the driving force behind stock movements and aim to profit from the temporary stock mispricing that may occur when corporate events take place. The core of the proposed strategy is a bi-level event detection model. The low-level event detector identifies the existence of events from each token, while the high-level event detector incorporates the entire article's representation and the low-level detection results to discover events at the article level. We also develop an elaborately annotated dataset, EDT, for corporate event detection and news-based stock prediction benchmarking. EDT includes 9721 news articles with token-level event labels as well as 303893 news articles with minute-level timestamps and comprehensive stock price labels. Experiments on EDT indicate that the proposed strategy outperforms all the baselines in winning rate, excess returns over the market, and the average return on each transaction.


Introduction
By shaping investors' perceptions and assessments of companies, financial news has a significant impact on the stock market (Engle and Ng, 1993; Tetlock, 2007). News-based stock prediction models have thus been developed to automatically discover signals of stock market movements from the countless news articles that are generated every moment (Kalyani et al., 2016; Shah et al., 2018; Mohan et al., 2019). Previous studies mainly rely on textual features and sentiment analysis to forecast stock movements (Fung et al., 2003; Liu, 2018; Huynh et al., 2017). Both approaches, however, often suffer from poor explainability and a low signal-to-noise ratio. Textual feature-based methods often formulate stock prediction as a text classification problem, directly predicting the rise and fall of stocks from the extracted features. These models fail to make reasonable trading decisions since they omit the reasons behind stock price changes. Sentiment-based methods avoid this problem by treating a news article's sentiment as the indicator of stock movement. However, news sentiment is subjective and can be greatly affected by the author's standpoint and writing style.
Unlike textual features and sentiments, corporate events are objective facts that impact how investors perceive and assess the related companies. Thus, we resort to corporate events to make more convincing and explainable stock predictions. Jacobs et al. (2018) detect corporate events by splitting a news article into sentences and detecting events in each of them with multi-label sentence classification. This method, however, discards the global contextual information of the entire article and fails to indicate the evidence for each event's existence. We believe that detecting events at a finer granularity (e.g., at the token level) is beneficial for both model training and application. During training, explicitly assigning a label to each token gives the model specific guidance on what to identify. On the application side, each detected event is supported by one or more subsequences of the original article, allowing users to easily examine the predicted results.
However, detecting events solely at the token level may result in a lack of macro understanding of the entire article. To tackle this, we introduce a bi-level event detection model, in which a low-level detector identifies the subsequences that describe specific events by classifying each token, and a high-level detector takes the low-level predictions and integrates them with the input article's global contextual information to predict the probability of each event's existence.
Another problem with existing models is that they ignore the timeliness of news articles. Most of them utilize news articles to predict the rise or fall of the related securities on the following trading day(s). However, stock prices are very likely to change immediately in response to noteworthy news. Thus, the stock movement on the following trading day(s) may not accurately reflect a news article's influence. To tackle this, we make stock predictions as soon as a news article is published and trade at that moment according to the proposed trading policies.
Based on the event detection model and trading policies, we construct an event-driven trading strategy that simultaneously detects corporate events from news articles, indicates the subsequences that describe the detected events, and performs trading on the related stocks. By running the strategies against massive historical data in EDT, we demonstrate the superiority of the proposed strategy in terms of excess returns over the market and the average return on each transaction. The experiment results also reveal the timeliness of news and the effectiveness of corporate events in indicating stock movement.
The main contributions of our paper are summarized as follows: (i) We introduce a novel event-driven trading strategy that detects trading signals from arbitrary unlabeled news articles; (ii) We present EDT, an elaborately annotated dataset with 300000+ news articles for corporate event detection and news-based trading benchmarking; (iii) We propose a bi-level event detection model that integrates macro and fine-grained understandings to effectively identify corporate events.

Problem Definition
We aim to construct an event-driven trading strategy that automatically detects corporate events from news articles and performs trading accordingly. The proposed strategy consists of two components: bi-level event detector and trading policy.
The low-level event detector identifies events from each token. We formulate low-level event detection as a sequence labeling problem. The label set \(L = \{e_1, e_2, \dots, e_k, O\}\) consists of \(k\) pre-defined events and a special label \(O\) that stands for Noevent. We define an article \(x = (x_1, x_2, \dots, x_n)\) as a sequence of tokens and define its label sequence as \(y = (y_1, y_2, \dots, y_n)\). The same event may be mentioned multiple times in an article, and a single article may contain multiple events. If a subsequence \((x_t, x_{t+1}, \dots, x_{t+s})\) of \(x\) describes event \(i\), the labels \(\{y_j\}_{j=t}^{t+s}\) are set to \(e_i\). All the other tokens are labeled as \(O\). Low-level event detection is defined as follows: given an article \(x^* = (x_1, x_2, \dots, x_n)\), find its best label sequence \(y^* = (y_1, y_2, \dots, y_n)\). We say event \(i\) is detected by the low-level detector if \(e_i \in y^*\).
Based on the low-level detection results, the high-level event detector calculates the probability of each event's existence. We say event \(i\) is detected by the high-level detector if its existence probability is larger than a given threshold. We combine the predictions on both levels as the final prediction. When events are detected, the trading policy decides when to buy and sell the related securities.
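To make the labeling scheme concrete, the following minimal sketch recovers event spans and the set of low-level detected events from a predicted label sequence. The function names and the short label strings (e.g., "SR") are illustrative assumptions, not part of the original implementation.

```python
from typing import Dict, List, Set, Tuple

NO_EVENT = "O"  # special label for tokens that describe no event


def extract_event_spans(labels: List[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Group maximal runs of identical event labels into (start, end) spans.

    `end` is inclusive, mirroring the (x_t, ..., x_{t+s}) notation in the text.
    """
    spans: Dict[str, List[Tuple[int, int]]] = {}
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run when the label changes or the sequence ends.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != NO_EVENT:
                spans.setdefault(labels[start], []).append((start, i - 1))
            start = i
    return spans


def low_level_detected(labels: List[str]) -> Set[str]:
    """Event i counts as detected at the low level if e_i appears in y*."""
    return {label for label in labels if label != NO_EVENT}


# Hypothetical example: "Apple announces a stock repurchase program today"
y_star = ["O", "O", "O", "SR", "SR", "SR", "O"]
print(extract_event_spans(y_star))  # {'SR': [(3, 5)]}
print(low_level_detected(y_star))   # {'SR'}
```

The spans returned here are the subsequences that a user could inspect as evidence for each detected event.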

Strategy
This section discusses the proposed strategy by respectively introducing the event detector and the trading policies.

Event Detector
Before training the model to detect events, we first equip it with prerequisite knowledge of the financial domain by performing a domain adaptation.

Figure 2: Model overview. The low-level detector identifies corporate events from each token, while the high-level detector summarizes the low-level predictions and the input representation to detect events at the article level.

Domain Adaptation
Since the same event can be expressed with significantly different terms and descriptions, and the same terms can carry different meanings, understanding the event itself and its related terms is of great importance. We perform domain adaptation by training the model with a Masked-Language-Model (MLM) loss on a financial encyclopedia as well as financial news articles. Section 4.3 discusses this corpus in detail. During training, 15% of the tokens of an input sequence are masked, and the model is asked to predict the masked tokens. The prediction is essentially a multi-class classification over the entire vocabulary. We optimize the model with the Categorical CrossEntropy loss calculated on all the masked tokens.
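The masking step can be illustrated with a small sketch. It assumes that every selected token is replaced by a [MASK] symbol and that positions are sampled uniformly at random; the text above only specifies the 15% rate, so these details, together with the function name, are assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "[MASK]"


def mask_tokens(
    tokens: List[str], mask_rate: float = 0.15, seed: Optional[int] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask roughly `mask_rate` of the tokens.

    Returns the masked sequence and the prediction targets: the original
    token at each masked position and None elsewhere, so that the loss is
    computed on the masked positions only.
    """
    rng = random.Random(seed)
    # Assumption: mask at least one token in any non-empty sequence.
    n_mask = max(1, round(len(tokens) * mask_rate)) if tokens else 0
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK_TOKEN if i in positions else t for i, t in enumerate(tokens)]
    targets = [t if i in positions else None for i, t in enumerate(tokens)]
    return masked, targets


tokens = "the board approved a new share buyback plan on monday".split()
masked, targets = mask_tokens(tokens, seed=0)
print(masked)
print(targets)
```

A model trained on such pairs would be scored with cross-entropy over the vocabulary at the positions where `targets` is not None.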

Bi-Level Event Detection
As shown in Figure 2, the event detection model takes an article as input and detects events at two levels. Each article is concatenated with a special token [CLS]. The Transformer-based text encoder calculates a series of hidden states for each token. We consider [CLS]'s last hidden state \(h_{\text{cls}}\) as the representation of the entire article, and \(h_i\) as the representation of token \(i\). The low-level detector identifies the subsequences that describe corporate events. For a given label set \(L\) (section 2) with \(K + 1\) labels (i.e., \(K\) pre-defined events and a "Noevent" label), the detector assigns \(K + 1\) scores to each token by performing a multi-class classification based on the token's representation. Each score stands for the confidence of finding a specific event at this position. These scores are concatenated and passed to the high-level detector. During training, we calculate the Categorical CrossEntropy loss for each token of an article and average the token losses to obtain the article's low-level loss.
The high-level detector concatenates the low-level predictions with the entire article's representation to calculate the probability of each event's existence. We formulate this as a multi-label classification problem. Specifically, we assign \(K\) binary labels to each article and represent them with a \(K\)-length label vector. For articles without any events, the label vector is all-zero. If event \(i\) occurs in an article, the \(i\)-th component of its label vector is set to 1. We utilize the Binary CrossEntropy with Sigmoid as the loss function. This function treats the \(K\)-label classification as \(K\) independent binary classification problems. Specifically, it uses the Sigmoid function to map each component of the high-level detector's output vector to \((0, 1)\). We consider the mapped score of each event as its probability of existing in the input article. We then calculate the Binary CrossEntropy loss between the mapped score and the binary label of each event. The losses of all the events are summed as the high-level detector's loss. We sum the losses of the low-level and high-level detectors to simultaneously optimize the detectors and the text encoder.
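As a rough illustration of the training objective only (not of the network itself), the sketch below computes the two losses on plain Python lists. The helper names and the numerical clamp `eps` are assumptions; an actual implementation would operate on the encoder's tensors.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def low_level_loss(token_scores: List[List[float]], token_labels: List[int]) -> float:
    """Categorical cross-entropy per token, averaged over the article."""
    losses = [-math.log(softmax(s)[y]) for s, y in zip(token_scores, token_labels)]
    return sum(losses) / len(losses)


def high_level_loss(event_logits: List[float], event_labels: List[int], eps: float = 1e-12) -> float:
    """Binary cross-entropy with sigmoid, summed over the K events."""
    loss = 0.0
    for z, y in zip(event_logits, event_labels):
        p = 1.0 / (1.0 + math.exp(-z))   # map the logit to (0, 1)
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss


def bi_level_loss(token_scores, token_labels, event_logits, event_labels) -> float:
    """Sum of the low-level and high-level losses, optimized jointly."""
    return low_level_loss(token_scores, token_labels) + high_level_loss(event_logits, event_labels)


# Two tokens with K + 1 = 3 scores each, and K = 2 article-level events.
print(bi_level_loss([[2.0, 0.1, -1.0], [0.0, 3.0, 0.5]], [0, 1], [1.5, -2.0], [1, 0]))
```

Because the two terms are simply added, gradients from both detectors reach the shared text encoder in a single backward pass.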

Ticker Recognizer
To make the proposed strategy applicable to arbitrary news articles, besides detecting events, we also recognize the related securities (i.e., companies) to trade on.
Each security listed on an exchange has a ticker, which is a unique arrangement of characters (e.g., Amazon's ticker on the NASDAQ exchange is AMZN). To recognize the tickers, we download company-ticker pairs (e.g., Amazon vs. AMZN) for all the securities listed on NYSE and NASDAQ from Yahoo. For a given article, we perform string matching between the article and all the company-ticker pairs. If multiple company-ticker pairs are matched, we choose the one that occurs most often. Some tricks are employed to improve accuracy and efficiency. For example, the company-ticker pairs that match the title's first few words are assigned higher confidence. Although a single article may mention multiple securities, we only recognize the one that occurs most often to simplify the setting.
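A minimal sketch of such a string-matching recognizer is shown below. It matches company names only, and the title bonus (`title_bonus`, applied to the first `first_words` words of the title) is a hypothetical stand-in for the confidence heuristics mentioned above, whose exact form is not specified.

```python
from collections import Counter
from typing import Dict, Optional


def recognize_ticker(
    title: str,
    body: str,
    company_to_ticker: Dict[str, str],
    first_words: int = 5,
    title_bonus: int = 2,
) -> Optional[str]:
    """Return the ticker of the most frequently mentioned company, or None."""
    counts: Counter = Counter()
    text = f"{title} {body}"
    title_head = " ".join(title.split()[:first_words])
    for company, ticker in company_to_ticker.items():
        n = text.count(company)  # plain substring matching
        if n == 0:
            continue
        counts[ticker] += n
        # Assumption: a match near the start of the title earns extra weight.
        if company in title_head:
            counts[ticker] += title_bonus
    if not counts:
        return None
    return counts.most_common(1)[0][0]


pairs = {"Amazon": "AMZN", "Microsoft": "MSFT"}
print(recognize_ticker("Amazon announces acquisition",
                       "Amazon said Microsoft was not involved.", pairs))  # AMZN
```

Plain substring counting is easy to run over hundreds of thousands of articles, at the cost of occasional mismatches on ambiguous company names.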

Trading Policy
To minimize the influence of other factors, in this paper, we trade only on stocks instead of any derivatives (e.g., options). We relate each detected event to a long or short trading signal based solely on its event type. Events that may result in a stock price rise are considered long signals, while events that may lead to a fall in stock prices are considered short signals.
We implement two trading policies named Trade-At-End and Trade-At-Best. Both of them long (i.e., buy) the related stocks for long signals and perform short-selling for short signals. We define a transaction as a buy (sell) and a subsequent sell (buy) of the same stock. The policies always start a transaction (i.e., perform a buy or a short-selling) at the first available time after an event is detected.
Trade-At-End (k): This policy holds a started transaction for k trading days and closes the transaction on the last day when the market closes.
Trade-At-Best (k): This policy closes a transaction at the best price (i.e., highest for a sell and lowest for a buy) within k trading days from the start date. It estimates the maximum profit that a trader can gain within k trading days from a detected event.
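The two closing rules can be sketched as follows. The `DayBar` container and the convention that `bars[0]` is the first trading day of the holding window are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DayBar:
    """Price summary of one trading day in the holding window (hypothetical)."""
    high: float
    low: float
    close: float


def trade_at_end_exit(bars: List[DayBar], k: int) -> float:
    """Trade-At-End (k): close at the market close of the k-th trading day."""
    return bars[k - 1].close


def trade_at_best_exit(bars: List[DayBar], k: int, is_long: bool) -> float:
    """Trade-At-Best (k): close at the best price within k trading days.

    A long position is closed by selling, so the best price is the highest;
    a short position is closed by buying back, so the best price is the lowest.
    """
    window = bars[:k]
    if is_long:
        return max(bar.high for bar in window)
    return min(bar.low for bar in window)


bars = [DayBar(105.0, 95.0, 101.0), DayBar(110.0, 99.0, 108.0), DayBar(112.0, 90.0, 92.0)]
print(trade_at_end_exit(bars, 2))          # 108.0
print(trade_at_best_exit(bars, 3, True))   # 112.0
print(trade_at_best_exit(bars, 3, False))  # 90.0
```

Trade-At-Best relies on hindsight, so it serves as an upper bound on achievable profit rather than an executable policy.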

Data
In this section, we discuss the EDT dataset. EDT contains data for three purposes: 1. corporate event detection (section 4.1); 2. news-based trading strategy benchmark (section 4.2); 3. financial domain adaptation (section 4.3).

Data for Event Detection
We choose 11 types of corporate events that have relatively predictable and straightforward impacts on the stock price based on financial knowledge.

Event Type
In this work, we only focus on non-periodic corporate events. Trading on periodic corporate events such as earnings calls is much trickier since investors can access the relevant information from multiple sources in advance. We leave them for future work.
Guidance Increase (GI) Guidance is a company's public estimate of its earnings for the upcoming quarter or fiscal year. This event includes announcements of a guidance increase or upgrade.
Acquisition (A) An acquisition event happens when a company announces the purchase of all or a portion of another company's shares or assets.
New Contract (NC) The new contract event refers to a company's announcement of being awarded a new contract.
Stock Split (SS) A stock split event occurs when a company divides the existing shares of its stock into multiple new shares.
Reverse Stock Split (RSS) This is the reverse process of the stock split, which consolidates the existing stock shares into fewer shares.
Positive Clinical Trial & FDA Approval (CT) This event includes (i) positive trial results from clinical studies; (ii) receiving FDA approval or clearance, or being permitted by the FDA to market legally in the United States.
Stock Repurchase (SR) A company's stock repurchase events include declaring, reinstating, or increasing a stock buyback plan.
Dividend (RD) The dividend is a distribution of some of a company's profits paid to its shareholders.
Dividend Cut (DC) A dividend cut means to reduce, stop, or suspend a pre-announced dividend.
Dividend Increase (DI) A dividend increase refers to an increase in the regular dividend.
Special Dividend (SD) A special dividend is an event in which a company declares a non-recurring dividend paid to its shareholders.
We collect 9721 English news articles, of which 2266 articles contain at least one of the above events. Table 1 shows the number of articles that correspond to each event. The remaining 7455 articles are news articles that do not contain any of the above events. We expect them to help the model better distinguish the event-related articles from the non-event ones. Among them, we deliberately include hundreds of non-event articles that are highly similar to event-related ones. An example is "Apple announces a stock repurchase program" vs. "Apple announces the completion of the recently announced stock repurchase program", where we do not expect the latter to have a significant influence on the stock price.
These news articles are downloaded from PRNewswire, Businesswire, and GlobeNewswire using keyword search. The keywords for each event are manually determined based on samples of that event. Each article's title, subtitle, and main text are concatenated after data cleaning (e.g., removing special symbols). We annotate each article with token-level labels. Two human annotators are asked to independently mark the subsequences that best describe the pre-defined corporate events. An article's annotations are accepted if the two annotators give the same result. Otherwise, the annotators discuss to determine the best annotations.
We randomly sample 80% of the articles of each event and combine them with 80% of the non-event articles to form the training data. The rest are used as the validation data.
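The per-event split can be sketched as below. The sketch assumes each article is assigned a single group key (one event type, or "O" for non-event articles); how articles containing several events are grouped is not stated in the text, so this is a simplification, and the function name is hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


def stratified_split(
    articles: List[T],
    group_of: Callable[[T], str],
    train_frac: float = 0.8,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Sample `train_frac` of every group for training; the rest is validation."""
    rng = random.Random(seed)
    groups: Dict[str, List[T]] = defaultdict(list)
    for article in articles:
        groups[group_of(article)].append(article)
    train: List[T] = []
    valid: List[T] = []
    for key in sorted(groups):  # fixed group order keeps the split reproducible
        items = list(groups[key])
        rng.shuffle(items)
        cut = int(len(items) * train_frac)
        train.extend(items[:cut])
        valid.extend(items[cut:])
    return train, valid


# Hypothetical toy corpus of (article_id, group) pairs.
corpus = [(i, "SR") for i in range(10)] + [(i, "O") for i in range(10, 15)]
train, valid = stratified_split(corpus, group_of=lambda a: a[1])
print(len(train), len(valid))  # 12 3
```

Splitting within each group keeps rare events such as stock splits represented in both the training and validation data.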

Data for Strategy Evaluation
We develop this data to benchmark news-based stock prediction models and trading strategies. To accurately account for stock movements, the news articles should come from their original sources. To mimic the real-world situation, the news articles should be diverse enough (e.g., span different categories). Thus, we choose PRNewswire and Businesswire as the article collection sources and download all the English news from PRNewswire (Mar 1st, 2020 - Apr 30th, 2021) and Businesswire (Aug 16th, 2020 - May 6th, 2021). We remove the articles that exist in the training data (section 4.1.2). Since some pre-defined events are infrequent (e.g., stock split), to ensure that there are at least a few samples of every event, we add all the articles of the validation data (section 4.1.2) to this data. After data cleaning, there are 303893 news articles.
Each article comes with a minute-level timestamp, which allows researchers to locate the exact time at which an event happens. Generally, news-based trading strategy evaluation involves four steps: (i) Identify trading signals (e.g., corporate events or sentiments) from news articles; (ii) For each article where trading signals are detected, recognize the related company (ticker); (iii) Get the recognized company's stock price data around the publication time of the news; (iv) Perform transactions based on trading policies.
To enable researchers without ticker recognizers and historical stock price data to benchmark their models/strategies, we assign each article with an automatically recognized ticker as well as detailed price labels of that ticker. With the detailed price labels of each news article, strategy evaluation can be as easy as "counting the price changes on the articles that are recognized as trading signals".
Specifically, an article's price labels include: open / close prices at the first minute we can trade on, highest / lowest prices in the following 1/2/3 trading days, close prices in the following 1/2/3 trading days, and the minute-level timestamp corresponding to each price. If available, we take the pre-market and after-hours prices into consideration since many corporate events are announced in these periods, and stock prices may change greatly during these times. The price labels are empty for articles where no ticker is recognized or the historical price data is unavailable. Among all the evaluation data, 106619 articles come with non-empty price labels.

Data for Domain Adaptation
The corpus for domain adaptation contains financial news articles and a financial terms encyclopedia, which is considered as unstructured domain knowledge. For the encyclopedia, we download 6260 explanatory documents from Investopedia. Each document explains a specific financial term and describes the role it may play in the financial market. For news articles, we directly utilize all the news articles of the training data (section 4.1.2).

Experiment
In this section, we first exhaustively compare Bi-level Detection with baselines under different settings. Then, we discuss how each event contributes to the overall trading results. Lastly, we analyze the profitability and practicality of the proposed strategy in real-world stock trading.
Among the 11 corporate events in the EDT dataset, we do not trade on Dividend since we do not consider it to have a significant influence on stock prices. Among the rest, we consider Reverse Stock Split and Dividend Cut to have negative influences on the stock price and the others to have positive influences. We evaluate the performance of the proposed strategy with backtesting. Backtesting is widely used to evaluate a trading strategy's effectiveness by running the strategy against historical data. We trade on all the detected trading signals for each model.
Metrics For a "buy" transaction, we define its return as \(\frac{P_{\text{sell}} - P_{\text{buy}}}{P_{\text{buy}}} \times 100\%\), while for a "short-selling" transaction, we define its return as \(\frac{P_{\text{sell}} - P_{\text{buy}}}{P_{\text{sell}}} \times 100\%\). Here, \(P\) stands for the price. If a transaction's return is greater than or equal to 0, we call it a "win". If a transaction's return is greater than or equal to 1%, we call it a "big win". For each model, we calculate its winning rate, big win rate (rate of big wins among all the transactions), and average return on each transaction. We also evaluate the models' excess returns over the market, where we consider the S&P 500 index as the benchmark of the market performance. The market return is estimated as the return of buying an S&P 500 index ETF for $10000 on Mar 1st, 2020 and selling all of it on May 6th, 2021 (footnote 9). For each model, we start with $10000 cash and invest $2000 in each trading signal. When available cash is less than $2000, we invest 20% of the available cash in the detected signal. We report the excess returns of each model, which equal a model's total returns minus the market return (footnote 10). We assume there is a 0.3% commission fee on each transaction.
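The metrics above translate directly into a few lines of code. The sketch below is illustrative only: function names are hypothetical, and the 0.3% commission is left out because the text does not specify exactly how it is charged.

```python
from typing import Dict, List


def transaction_return(p_buy: float, p_sell: float, is_long: bool) -> float:
    """Return of one transaction in percent.

    Long ("buy"):  (P_sell - P_buy) / P_buy
    Short-selling: (P_sell - P_buy) / P_sell
    """
    denominator = p_buy if is_long else p_sell
    return (p_sell - p_buy) / denominator * 100.0


def summarize(returns: List[float]) -> Dict[str, float]:
    """Winning rate, big win rate, and average return over all transactions."""
    n = len(returns)
    return {
        "win_rate": sum(r >= 0.0 for r in returns) / n,      # return >= 0
        "big_win_rate": sum(r >= 1.0 for r in returns) / n,  # return >= 1%
        "avg_return": sum(returns) / n,
    }


def position_size(cash: float, stake: float = 2000.0, fallback_frac: float = 0.2) -> float:
    """Invest $2000 per signal, or 20% of available cash when less is left."""
    return stake if cash >= stake else cash * fallback_frac


def excess_return(model_total_return: float, market_return: float) -> float:
    """Excess return over the market benchmark."""
    return model_total_return - market_return


print(transaction_return(p_buy=100.0, p_sell=102.0, is_long=True))  # long: buy at 100, sell at 102
print(transaction_return(p_buy=95.0, p_sell=100.0, is_long=False))  # short: sell at 100, buy back at 95
print(summarize([2.0, 0.5, -1.0, 0.0]))
print(position_size(10000.0), position_size(1500.0))
```

Under these definitions a transaction with exactly zero return counts as a win, which slightly inflates winning rates relative to a strict-inequality definition.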

Model Hyperparameters
We employ the pretrained BERT (Devlin et al., 2018) model as the text encoder. Specifically, we use the bert-base-cased checkpoint. Both the low-level and high-level detectors consist of a hidden layer and an output layer. There are 2048 hidden units in the hidden layer. We use the AdamW optimizer with batch size 32 and learning rate 5e-5 to train the encoder and detectors together for 5 epochs. We set the maximum input length to 256 since we find that almost all the events mentioned in a news article appear in its first 256 tokens. Training the model takes 15 minutes on 4 Nvidia RTX 2080Ti GPUs. We conduct each experiment with 3 different random seeds and report the average results.
Footnote 9: In accordance with the time span of the evaluation data.
Footnote 10: Since Trade-At-Best always finishes the transaction at the best price, its winning rate is always 100% and its total return is almost linearly related to the number of transactions. Thus, we only report the average return of this policy.
Trading Details For Trade-At-End, we execute a stop loss of 20% (i.e., sell a stock immediately when it falls 20%). In all the experiments, we only trade on the news articles where the historical price data of the detected ticker is available at the minute when the article is published. In other words, we ignore all the news articles that are not published during market hours and articles where the historical price data is incomplete.

Baseline
Vader (Gilbert, 2014) is a rule-based sentiment analysis model that assigns positive, negative, and neutral scores to an article. We consider news articles with a positive score greater than 0.2 as long trading signals.
BERT-SST is a BERT-based (Devlin et al., 2018) sentiment analysis model trained on the Stanford sentiment treebank (SST) dataset. We respectively consider news articles with a positive score greater than 0.995 and 0.9 as long trading signals to reduce the threshold's influences on the final results.
Sentence (Jacobs et al., 2018) splits an article into sentences and performs sentence-level event detection based on multi-label text classification. It was originally implemented with SVM and LSTM. We re-implement it with BERT to compare it fairly with our models. We split each article into sentences with the NLTK toolkit (Loper and Bird, 2002) to train and evaluate the model.
BERT-CRF (Lafferty et al., 2001) builds on Conditional Random Fields, originally proposed as a sequence labeling model. It combines emission scores given by BERT with learned transition scores to find the globally optimal label sequence for each input. We re-implement it to perform event detection solely at the token level. Following the literature, we use different learning rates for the CRF (1e-3) and the BERT (3e-5) components.

Main Results
Tables 2 and 3 show the models' 1-day and 2-day trading results, respectively. The results of 3-day trading are consistent; due to space limitations, we present them in appendix A. As shown in the tables, our model outperforms all the baselines on average return and excess return under all the settings.

Results of Ticker Recognizer
To evaluate the performance of the ticker recognizer, we manually label tickers for 1674 news articles. The proposed ticker recognizer succeeds on 1643 of them (accuracy: 98.15%). Although this performance is satisfactory, the remaining recognition errors may slightly impair the evaluation of trading strategies, since they may direct the strategies to trade on incorrect securities.

Results of Trade-At-End
Even with the simple trading policy, our model achieves an average return of 1.74% and an excess return of $84443 (844%) in 1-day trading. Experiments on Vader and BERT-SST show that the sentiment of a news article can indicate the stock movement to some extent. For example, BERT-SST achieves high winning rates, and it outperforms the market index by a considerable margin. However, these signals tend to result in small stock movements. Thus, sentiment-based models achieve poor average returns. On the other hand, event-based models obtain much higher average returns, demonstrating the superiority of corporate events over news sentiments in indicating stock movements.
Although the Sentence model substantially outperforms the sentiment-based methods, compared to our models, it is less robust to ambiguous articles, and it is more likely to miss events that are described across several consecutive sentences. By utilizing global context information and detecting events at the token level, our model identifies more trading signals and avoids more potential traps.
Bi-level Detection also achieves better performance than BERT-CRF under all the settings. The improvements mainly come from the high-level detector. By combining the global contextual information with the token-level detection results, Bi-level Detection is more robust and more effective. When the low-level detector generates false alarms on some ambiguous tokens or fails to detect events that are not explicitly described, the high-level detector may point this out after analyzing the meaning of the entire article.

Event Analysis
Compared to other event-based models, Bi-level Detection achieves much better performance in identifying corporate events. Different events also affect the stock prices over different periods. We measure this influence by comparing the average returns of TAB (1) and TAB (3), i.e., Trade-At-Best with k = 1 and k = 3, on the same event. They achieve close average returns on Acquisition, Clinical Trial, and Stock Repurchase, indicating that the stock prices do not continue to change sharply after the first day. In contrast, Stock Split impacts the stock price over a more extended period.
Thus, an ideal trading strategy should take the above factors into account. For example, it may assign different weights to each event based on its profitability and use a different policy to trade each event based on its potential price change patterns.

Profitability in Real-world
In this section, we discuss the possible profitability of the proposed strategy in real-world trading. Backtesting against historical data shows that the proposed strategy dramatically outperforms the market index. However, this result is based on two main assumptions.
First, we assume that the time needed to acquire the news articles and make trading decisions is almost 0. Table 5 indicates the significance of timeliness, in which Bi-level Detection - Open starts transactions with the Open price at the minute that the news article is published (e.g., price at 11:23:00), while Bi-level Detection - Close starts with the Close price at that minute (e.g., price at 11:23:59). As shown in the table, when the model trades tens of seconds after the publication time of the news article, it greatly underperforms the market index and achieves a negative average return on each transaction. These results demonstrate that the profitability of an event-based model highly depends on how quickly one can trade after a piece of news is published.
Second, we assume we can always buy/sell the desired amount of stock shares and ignore the liquidity of the stocks. When the investment scale is relatively small, this assumption does not have a big impact. However, as the investment scales up, liquidity may greatly constrain the model's profitability.

Related Works
Text-based Stock Prediction Existing methods usually rely on textual features and sentiment analysis to forecast stock movements (Hagenau et al., 2013; Mohan et al., 2019; Xie and Jiang, 2019; Huynh et al., 2017; Liu, 2018; Mittal and Goel). Hagenau et al. (2013) utilize N-grams, noun phrases, and 2-word combinations of corporate announcements to predict the stock movement. The influence of financial news on the stock market is also widely explored (Engle and Ng, 1993; Tetlock, 2007). In recent years, researchers have resorted to support vector machines and deep neural networks to analyze financial news articles (Liu, 2018; Xie and Jiang, 2019; Ding et al., 2015; Huynh et al., 2017). Mohan et al. (2019) combine news text (e.g., sentiment, tf-idf, and word2vec representations) with historical stock data to predict future stock prices. Event-based stock predictions have also been introduced. Ding et al. (2015) extract events from news articles, calculate event embeddings, and use them to predict the direction of stock moves. Ben Ami and Feldman (2017) build sentiment-type trading signals with word polarities and event-type trading signals with an existing information extraction platform, and demonstrate the superiority of events over sentiment in making trading decisions.
Event Detection General domain event detection, which aims to recognize structured schemata/frames from text, has been widely explored with data-driven supervised learning methods (Ahn, 2006; Mitamura et al.). In the economic domain, however, existing approaches (Arendarenko and Kakkonen, 2012; Hogenboom et al., 2013; Xie et al., 2013) usually exploit knowledge-based and rule-based methods, which require extensive hand-designed rules and ontologies. Recent works conceptualize corporate events as sequences of text that report company-related occurrences and introduce data-driven methods to solve financial event detection with text classification (Ein-Dor et al., 2019; Jacobs et al., 2018). Ein-Dor et al. (2019) explore a Wikipedia-based supervised method to detect the sentences that may include corporate events. Jacobs et al. (2018) propose a multi-label sequence classification model to detect specific corporate events from news articles.

Conclusion
This paper introduces an event-driven trading strategy based on corporate event detection from news articles. We introduce a bi-level event detection model that utilizes global and local information to identify corporate events. Experiments on the presented dataset EDT demonstrate the proposed model's superiority over all the baselines. The results also highlight the timeliness of corporate events and their effectiveness in indicating stock movement.
In future work, we plan to further explore both the event detection model and the trading policy. We expect to involve external knowledge and few-shot learning methods to make the event detection model more robust in data-imbalanced and data-scarce scenarios. On the trading policy side, we aim to explore more types of events and customize different policies for each event based on the potential price change patterns it may lead to.