ExcavatorCovid: Extracting Events and Relations from Text Corpora for Temporal and Causal Analysis for COVID-19

Timely responses from policy makers to mitigate the impact of the COVID-19 pandemic rely on a comprehensive grasp of events, their causes, and their impacts. These events are reported at such a speed and scale as to be overwhelming. In this paper, we present ExcavatorCovid, a machine reading system that ingests open-source text documents (e.g., news and scientific publications), extracts COVID-19 related events and relations between them, and builds a Temporal and Causal Analysis Graph (TCAG). Excavator will help government agencies alleviate the information overload, understand likely downstream effects of political and economic decisions and events related to the pandemic, and respond in a timely manner to mitigate the impact of COVID-19. We expect the utility of Excavator to outlive the COVID-19 pandemic: analysts and decision makers will be empowered by Excavator to better understand and solve complex problems in the future. A demonstration video is available at https://vimeo.com/528619007.


Introduction
Timely responses from policy makers to mitigate the impact of the COVID-19 pandemic rely on a comprehensive grasp of events, their causes, and their impacts. Since the beginning of the COVID-19 pandemic, an enormous number of articles have been published every day, reporting many events and studies related to COVID. It is very difficult, if not impossible, to keep track of these developing events or to get a comprehensive overview of the temporal and causal dynamics underlying them.
To aid the policy makers in overcoming the information overload, we developed ExcavatorCovid (or Excavator for short), a system that will ingest open-source text sources (e.g., news articles and scientific publications), extract COVID-19 related events and relations between them, and build a Temporal and Causal Analysis Graph (TCAG). Excavator combines the following NLP techniques:
• Extracting events (§3) for types in our comprehensive COVID-19 event taxonomy (§2). Each event will have time and location if available in text, allowing analyses targeted at specific times or geographic regions of interest.
• Extracting three types of temporal and causal relations (§4) between pairs of events.
• Constructing a TCAG (§5) by assembling all events and relations, to provide a comprehensive overview of the events related to COVID-19 as well as their causes and impacts.
• Supporting trend and correlation analysis of events, via visualizing event popularity time series (§6) in the TCAG visualization.
Excavator produces a TCAG that is in a machine-readable JSON format and is also human-understandable (visualized via a web-based interactive User Interface), to support varied analytical and decision making needs. We hope that Excavator will aid government agencies in efforts to understand likely downstream effects of political and economic decisions and events related to the pandemic, and respond in a timely manner to mitigate the impact of COVID-19. The benefit of Excavator is realized through a comprehensive visualization of events and how they affect each other. We expect the utility of Excavator to outlive the COVID-19 pandemic: analysts and decision makers will be empowered by Excavator to better understand and solve complex problems in the future.
We first present our COVID-19 event taxonomy, and then we present details about event extraction, causal and temporal relation extraction, measuring event popularity using news text as "quantitative data", and the approach for constructing a TCAG. We then describe the system demonstration, present a quantitative analysis of the extractions, and conclude with recommended use cases.

Building a COVID-19 Event Taxonomy
COVID-19 affects many aspects of our political, economic, and personal lives. A comprehensive analysis requires an event taxonomy that categorizes the events related to COVID-19 in many sectors and domains. We developed a COVID-19 event taxonomy using a hybrid approach of manual curation with automated support. First, we run Stanza (Qi et al., 2020) on a large sample (10%) of the Aylien coronavirus news dataset (§7) to tag verb and noun phrases that are likely to trigger events. Second, we represent each phrase as the average of the BERT (Devlin et al., 2019) contextualized embedding vectors of the subwords within each phrase, and then run committee-based clustering (Pantel and Lin, 2002) over the vector representations of the phrases to discover salient clusters. Finally, we review the frequently appearing clusters and define event types related to COVID-19.
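The phrase representation step above can be illustrated with a minimal sketch. It assumes the subword vectors have already been produced by an encoder; the toy two-dimensional vectors below stand in for BERT outputs, and the function names phrase_embedding and cosine_similarity are hypothetical, not part of Excavator. The clustering step itself is not shown.

```python
import math
from typing import List, Sequence


def phrase_embedding(subword_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the subword vectors of a phrase element-wise."""
    if not subword_vectors:
        raise ValueError("a phrase needs at least one subword vector")
    dim = len(subword_vectors[0])
    if any(len(v) != dim for v in subword_vectors):
        raise ValueError("all subword vectors must have the same dimension")
    n = len(subword_vectors)
    return [sum(v[i] for v in subword_vectors) / n for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, a common choice for comparing phrase vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the contextualized subword embeddings
# of a two-subword phrase (illustrative values only).
lockdown_vec = phrase_embedding([[1.0, 2.0], [3.0, 4.0]])
print(lockdown_vec)
```

A clustering algorithm would then operate on such fixed-dimension phrase vectors, typically using a similarity measure like the cosine shown here.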
The event taxonomy includes 76 event types. Each type comes with a name and a short description. Figure 1 illustrates several branches of the event taxonomy 2 . The events come from a wide range of domains. We also manually added hyponymy relations via is_a links (e.g., COVID-19 is_a Virus) between pairs of event types.

Extracting Events
We developed a neural network model for extracting events defined in the COVID-19 event taxonomy (the event classification stage) and extracting the location and time arguments (the event argument extraction stage), if they are mentioned in text, for each event mention. The structured representation (events with location and/or time) enables analyses of events targeting a specific time or location. Both stages use a BERT-based sequence tagging model. Figure 2(a) shows the model architecture. Given a sequence of tokens as input, the model outputs a sequence of tags, one per token. We use the commonly used Begin-Inside-Outside (BIO) tags for event types in the event classification task and for event argument role types in the argument attachment task.
Figure 2: Panel (a) shows the architecture of the model, which takes a sequence of words $x_1, x_2, \ldots, x_n$ as input and outputs a sequence of tags $y_1, y_2, \ldots, y_n$. Panels (b) and (c) show an example for each of the two stages. "PolicyInt" is short for "PolicyIntervention".
2 The complete taxonomy is available at https://github.com/BBN-E/LearnIt/blob/master/inputs/domains/CORD_19/ontology/covid_event_ontology.yaml.
Event classification: a sequence tagging model is trained to predict BIO tags of event types such that it identifies the event trigger span as well as the event type. Figure 2(b) shows an example.
Event argument extraction: a second sequence tagging model is trained to predict BIO tags of argument role types, such that it identifies token spans of event arguments as well as their argument role types, with respect to a trigger that has already been identified in the event classification stage and marked in the input sentence with "<t> ... </t>". Figure 2(c) shows an example. We run these two models in a pipeline: the event classification model is applied first to find event triggers and classify their types, then the event argument extraction model is applied to find location and time arguments for each event mention.
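To make the two-stage tagging scheme concrete, the following sketch shows how a BIO tag sequence can be decoded into labeled spans and how an identified trigger can be wrapped in "<t> ... </t>" markers before the second stage. It covers only the decoding logic, not the BERT tagger; the helper names decode_bio and mark_trigger and the example sentence are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def decode_bio(tags: List[str]) -> List[Span]:
    """Convert BIO tags (e.g. "B-Lockdown", "I-Lockdown", "O") into spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # the open span keeps growing
        else:
            # "O", or an I- tag that does not continue the open span:
            # close the open span (if any) and drop the stray tag.
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def mark_trigger(tokens: List[str], start: int, end: int) -> List[str]:
    """Wrap the trigger tokens[start:end] in <t> ... </t> markers."""
    return tokens[:start] + ["<t>"] + tokens[start:end] + ["</t>"] + tokens[end:]


tokens = ["The", "state", "ordered", "a", "lockdown", "in", "March"]
event_tags = ["O", "O", "O", "O", "B-Lockdown", "O", "O"]
for start, end, event_type in decode_bio(event_tags):
    # Input that a second-stage argument tagger would receive.
    print(event_type, " ".join(mark_trigger(tokens, start, end)))
```

The same decode_bio routine applies to the second stage, where the tags carry argument role labels such as location or time instead of event types.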
Training data curation. We use LearnIt rapid customization for event extraction (Chan et al., 2019) to curate a dataset for training the event classification model. Our developer spent about 13 minutes per event type to find, expand, and filter potential event triggers in a held-out 10% of the Aylien coronavirus news corpus. Statistics of the curated data set are shown in Table 1 (we only show the top-10 most frequent event types for brevity). In total, there are 11814 mentions in 7159 sentences.
To train the argument extraction model, we use the related event-argument annotation from the ACE 2005 dataset (Doddington et al., 2004). We focus on location and time arguments and ignore other roles. At decoding time, after extracting the argument mentions for events, we apply the AWAKE (Boschee et al., 2014) entity linking system to resolve each location argument to a canonical geolocation, and use SERIF (Boschee et al., 2005) to resolve each time argument to a canonical time, which we then convert to month granularity. This allows us to perform analyses of events targeting a specific geolocation or month of interest.

Extracting Temporal and Causal Relations
We develop two approaches for extracting temporal and causal relations: a pattern-based approach and a neural network model. We take the union of the outputs from both approaches to maximize recall. The list of causal and temporal relations extracted by the systems is shown in Table 2. Our extractors extract relations at the subtype level. However, we decided to merge the subtypes into types because (a) a user survey shows that users prefer a simplified definition of causality that only includes "event X causes (positively impacts) event Y" and "X mitigates (reduces/prevents) Y", since finer-grained distinctions at the subtype level are difficult and less useful, and (b) merging the subtypes into types improves accuracy to near or above 0.8 as shown in Table 4, compared to slightly below 0.7 at the subtype level, where the extraction approaches struggle to differentiate between the subtypes.
Pattern-based relation extraction. We applied the temporal and causal relation extraction patterns from LearnIt (Min et al., 2020). A pattern is either a lexical pattern, which is a sequence of words between a pair of events, e.g., "X leads to Y", or a proposition pattern, which is the (nested) predicate-argument structure that connects the pair of events. For example, "verb:cause[subject=X] [object=Y]" is the proposition counterpart of the lexical pattern "X causes Y".
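As a rough illustration of how a lexical pattern operates, the sketch below checks whether the words between two event spans match a known connective. The pattern table, the function name match_lexical_pattern, and the relation labels attached to each pattern are assumptions for this example; the actual LearnIt pattern inventory and matching logic are richer and are not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern table: the word sequence between event X and event Y,
# mapped to a relation type. Only the two connectives quoted in the text
# are included.
LEXICAL_PATTERNS: Dict[Tuple[str, ...], str] = {
    ("leads", "to"): "Causes",
    ("causes",): "Causes",
}


def match_lexical_pattern(
    tokens: List[str],
    x_span: Tuple[int, int],
    y_span: Tuple[int, int],
) -> Optional[str]:
    """Return the relation type if the words between X and Y match a pattern.

    Spans are (start, end_exclusive) token offsets; X must precede Y.
    """
    x_end = x_span[1]
    y_start = y_span[0]
    if x_end > y_start:
        return None  # overlapping spans or Y before X: no "X ... Y" reading
    between = tuple(token.lower() for token in tokens[x_end:y_start])
    return LEXICAL_PATTERNS.get(between)


tokens = ["Lockdown", "leads", "to", "unemployment"]
print(match_lexical_pattern(tokens, (0, 1), (3, 4)))
```

A proposition pattern would instead be matched against a predicate-argument parse, which makes it robust to intervening modifiers that break an exact word-sequence match.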
Neural relation extraction. We developed a mention pooling (Baldini Soares et al., 2019) neural model for causal and temporal relation extraction. Figure 3 shows the model architecture. Taking as input a sentence in which a pair of event mention spans are marked, the model first encodes the sentence with BERT (Devlin et al., 2019). For each of the left and right event mentions, it then uses average pooling over the BERT contextualized vectors of the words in the span to obtain fixed-dimension vectors $V_1$ and $V_2$ as the span representations. It then concatenates $V_1$ and $V_2$ with the element-wise absolute difference $|V_1 - V_2|$ to generate the pair representation $V = [V_1; V_2; |V_1 - V_2|]$. $V$ is passed into a linear layer followed by a softmax layer to make the relation prediction. The model is trained with a blended dataset consisting of the Entities, Events, Simple and Complex Cause Assertion Annotation datasets 6 released by LDC 7, and 1.5K temporal relation instances generated by applying the LearnIt temporal relation extraction patterns to 10,000 sampled Gigaword (Parker et al., 2011) articles.
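The pair representation described above can be sketched in plain Python. Toy token vectors stand in for BERT outputs, the classifier weights are arbitrary illustrative numbers rather than trained parameters, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average pooling over the token vectors of one event mention span."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def pair_representation(
    token_vectors: Sequence[Sequence[float]],
    span1: Tuple[int, int],
    span2: Tuple[int, int],
) -> List[float]:
    """Build V = [V1; V2; |V1 - V2|] from two (start, end_exclusive) spans."""
    v1 = mean_pool(token_vectors[span1[0]:span1[1]])
    v2 = mean_pool(token_vectors[span2[0]:span2[1]])
    diff = [abs(a - b) for a, b in zip(v1, v2)]
    return v1 + v2 + diff


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(pair_vec: Sequence[float],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> List[float]:
    """Linear layer followed by softmax: one probability per relation label."""
    logits = [sum(w * x for w, x in zip(row, pair_vec)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy 2-dimensional token vectors (illustrative values only).
token_vectors = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0], [0.0, 4.0]]
v = pair_representation(token_vectors, (0, 2), (3, 4))
print(v)
```

With a hidden size of d, the pair representation has 3d dimensions, so the linear layer maps from 3d to the number of relation labels.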

Constructing a TCAG
We aggregate all extracted events and causal and temporal relations across the corpus to construct a TCAG. The TCAG is shown in an interactive visualization, in which each node is an event type and each edge is a causal or temporal relation 8. We use a simple approach to aggregate events: by default, all event mentions sharing the same type are grouped into a single node named by the type. The UI lets the user selectively focus on a specific location and/or time, in which case it shows only the part of the TCAG involving event mentions and causal relations between pairs of events for the location and/or time of interest.
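A minimal sketch of this aggregation is given below. The EventMention and RelationMention classes, the build_tcag function, and the rule that both events of a relation must match the selected location and month are assumptions made for illustration; the paper does not specify the data model or the exact filtering rule.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class EventMention:
    event_type: str
    location: Optional[str] = None  # canonical geolocation, if extracted
    month: Optional[str] = None     # e.g. "2020-03", if extracted


@dataclass(frozen=True)
class RelationMention:
    source: EventMention
    relation: str                   # e.g. "Causes", "Mitigates", "Before"
    target: EventMention


def build_tcag(
    events: Iterable[EventMention],
    relations: Iterable[RelationMention],
    location: Optional[str] = None,
    month: Optional[str] = None,
) -> Tuple[Counter, Counter]:
    """Group mentions by event type; return (node counts, edge counts)."""

    def keep(ev: EventMention) -> bool:
        return ((location is None or ev.location == location)
                and (month is None or ev.month == month))

    nodes: Counter = Counter(ev.event_type for ev in events if keep(ev))
    edges: Counter = Counter(
        (r.source.event_type, r.relation, r.target.event_type)
        for r in relations
        if keep(r.source) and keep(r.target)  # assumed filtering rule
    )
    return nodes, edges


ny_lockdown = EventMention("Lockdown", "New York", "2020-03")
ny_crisis = EventMention("EconomicCrisis", "New York", "2020-03")
ca_lockdown = EventMention("Lockdown", "California", "2020-03")
ca_crisis = EventMention("EconomicCrisis", "California", "2020-03")
events = [ny_lockdown, ny_crisis, ca_lockdown, ca_crisis]
relations = [
    RelationMention(ny_lockdown, "Causes", ny_crisis),
    RelationMention(ca_lockdown, "Causes", ca_crisis),
]
nodes, edges = build_tcag(events, relations, location="New York")
print(nodes, edges)
```

The resulting counts could drive node sizes and edge thicknesses in a visualization, and they serialize naturally to a JSON structure.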

Measuring Event Popularity through Time
The TCAG only provides a qualitative analysis of the temporal and causal relations between the COVID-related events. It will be more informative if we can measure the popularity of events through time to enable trend analysis (e.g., does lockdown go up or down between January and May, 2020?) and correlation analysis (e.g., will a stricter lockdown improve or deteriorate the economy?).
6 The catalog IDs of the LDC datasets are LDC2019E48, LDC2019E61, LDC2019E70, LDC2019E82, LDC2019E83.
7 www.ldc.upenn.edu
8 is_a relations are also added as dashed edges in the TCAG.
In order to support these analyses, we produce a timeseries of a popularity score for each event type over time (a.k.a., event timeline). Extending our prior work (Min and Zhao, 2019), we define the popularity score for event type e at time t as
$$\frac{N_{e,t}}{c\,M_t}$$
in which $N_{e,t}$ is the frequency of event e at month t.
We calculate the moving average centered at each t with a sliding window of T = 3 months to reduce noise. $M_t$ is the total number of articles published in month t. c = 1/500 is a normalizing constant. The raw event frequency counts can be inflated due to the increasing level of media activity. Therefore, we divide the raw counts by $c\,M_t$ to normalize the counts so that they are comparable across different months.
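The popularity score can be sketched as follows. Two details are assumptions of this sketch rather than statements from the text: the moving average is applied to the raw monthly counts before normalization, and the window is truncated at the first and last months. The function names and the toy counts are hypothetical.

```python
from typing import List, Sequence


def moving_average(counts: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at the series ends."""
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return smoothed


def popularity_scores(
    event_counts: Sequence[float],    # N_{e,t}: mentions of event e per month
    article_counts: Sequence[float],  # M_t: articles per month (non-zero)
    c: float = 1 / 500,
    window: int = 3,
) -> List[float]:
    """Smoothed N_{e,t} divided by c * M_t, one score per month."""
    smoothed = moving_average(event_counts, window)
    return [n / (c * m) for n, m in zip(smoothed, article_counts)]


# Toy monthly counts for one event type over five months.
lockdown_counts = [100, 400, 900, 1200, 1500]
articles_per_month = [50_000, 150_000, 300_000, 350_000, 350_000]
print(popularity_scores(lockdown_counts, articles_per_month))
```

With c = 1/500, the score can be read as the smoothed number of event mentions per 500 articles published in that month.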

System Demonstration
Datasets. We run Excavator on the following two corpora to produce a TCAG for COVID-19. The first corpus is 1.2 million articles from the Aylien Coronavirus News Dataset, which contains 1.6 million COVID-related articles published between November 2019 and July 2020 from ∼440 news sources. We only kept the articles published between January and May 2020, since the corpus contains fewer articles in other months. The second corpus is the COVID-19 Open Research Dataset (Wang et al., 2020). It contains coronavirus-related research from PubMed's PMC corpus, a corpus maintained by the WHO, and bioRxiv and medRxiv pre-prints. As of 11/08/2020, it contains over 300,000 scholarly articles. We combine these two corpora because news and research articles are complementary: news articles are rich in real-world events and are up to date, while analytical articles contain more causal relationships. Therefore, combining them is likely to lead to a more comprehensive analysis and new insights.
Overall statistics of extractions. Excavator extracted 6.2 million event mentions of 59 types. Table 3 shows the event types that appear more than 50,000 times. We randomly sampled 100 event    Table 4.

TCAG Visualization
We developed an interactive visualization of the TCAG. Figure 4 shows a small part of the TCAG centered on the event Lockdown. Each node represents an event type in our COVID event taxonomy for which Excavator is able to extract events and track their popularity scores (§6) through time. The three types of relational edges (Causes, Mitigates and Before) are shown in different colors. The size of the nodes and the thickness of the edges indicate, on a log scale, the relative frequency of the event types and relations, respectively. For example, Figure 4 shows that Death is mentioned more frequently than Lockdown, and the causal relation {Lockdown, Causes, EconomicCrisis} appears more frequently than {Lockdown, Mitigates ("reduces"), AccessToHealthcare}. To support analysis focusing on a single event, we color the focused event in blue, events that cause or precede the focused event in orange, and events that the focused event causes or precedes in green.
Event popularity timeseries visualization. For each node (event) in the TCAG visualization, we show its event popularity timeseries visualization on the side. Figure 5 shows 3 screenshots of the event popularity timeseries (§6) visualization between January and May 2020 for Lockdown, EconomicCrisis and COVID-19 respectively.
Figure 4: A screenshot of a partial TCAG centered on Lockdown. Green, pink, and purple edges show Cause, Mitigate and Before relations, respectively. Blue, orange and green nodes show the focused node and nodes with incoming and outgoing edges (with respect to the focused node), respectively.
11 Estimated by manually reviewing 40 instances per type

Recommended Use Cases
We describe 3 recommended use cases below. More details are in our demonstration video.
Use case 1: causal and temporal analysis. We can get a panoramic view of the underlying causal and temporal dynamics between events related to COVID from the overall TCAG. We can start by analyzing the causal or temporal relations centered at an event of interest. For example, Figure 4 shows a diverse range of effects and consequences of Lockdown, such as EconomicCrisis (economic), Shortage (supply-chain), FearOrPanic (mental), etc. Interestingly, the graph also reveals surprises such as {Lockdown, Causes, Death}: the UI shows supporting evidence such as "lockdown exacerbates deaths and chronic health problems associated with poverty, ...". Furthermore, the TCAG shows that Lockdown mitigates DiseaseSpread but also has a negative impact on the Economy, which informs decision makers that they need to understand the economic trade-offs when implementing a Lockdown policy. We can also analyze longer-distance causal pathways consisting of two or more causal/temporal edges. For example, our demo video shows that COVID-19 causes or precedes (Before) Lockdown, and that Lockdown causes or precedes EconomicCrisis. This helps us understand details about how COVID causes EconomicCrisis.
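The multi-edge pathway analysis described above amounts to enumerating paths in a directed graph. The sketch below does this with a depth-limited search over a toy edge list; the function name find_pathways, the max_hops limit, and the example edges are assumptions for illustration and do not describe how the Excavator UI computes pathways.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, str]  # (source event type, relation, target event type)


def find_pathways(
    edges: Sequence[Edge],
    source: str,
    target: str,
    max_hops: int = 3,
) -> List[List[Edge]]:
    """Enumerate cycle-free paths from source to target with at most max_hops edges."""
    outgoing: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for src, rel, dst in edges:
        outgoing[src].append((rel, dst))

    results: List[List[Edge]] = []

    def extend(node: str, path: List[Edge], visited: frozenset) -> None:
        if node == target and path:
            results.append(list(path))
            return
        if len(path) == max_hops:
            return
        for rel, nxt in outgoing.get(node, []):
            if nxt in visited:
                continue  # never revisit an event type on the same path
            path.append((node, rel, nxt))
            extend(nxt, path, visited | {nxt})
            path.pop()

    extend(source, [], frozenset({source}))
    return results


# Toy edges loosely based on the relations discussed in the text.
toy_edges = [
    ("COVID-19", "Causes", "Lockdown"),
    ("Lockdown", "Causes", "EconomicCrisis"),
    ("Lockdown", "Mitigates", "DiseaseSpread"),
]
for pathway in find_pathways(toy_edges, "COVID-19", "EconomicCrisis"):
    print(" -> ".join(f"{s} [{r}] {t}" for s, r, t in pathway))
```

Limiting the number of hops keeps the enumeration tractable on a dense graph and reflects that long chains of extracted relations become progressively less reliable.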
Use case 2: trend and correlation analysis. We can inspect the event timeline for a node or an edge to perform a trend analysis and a correlation analysis, respectively. Figure 5 shows screenshots of the event popularity timeseries between January and May 2020 for Lockdown, EconomicCrisis and COVID-19. First, the user can click on a single event to perform a trend analysis: the popularity of Lockdown goes up continuously, indicating an upward trend in implementing lockdown policies in more geographic regions. The user can also click on an edge to perform a correlation analysis for a pair of events: when the user clicks on the edge {Lockdown, Causes, EconomicCrisis}, the UI shows a strong correlation between the two upward curves. For another edge, "Lockdown mitigates COVID-19", the UI shows a negative correlation near the end: as Lockdown rises, COVID-19 slightly falls towards the end.
Use case 3: analyses targeted at geolocations. The event timeline visualization also allows the user to see the timeline for individual geolocations, such as each U.S. state, instead of the aggregate for the entire U.S. Figure 6 is a screenshot showing the 10 timelines for Lockdown for the top-10 most frequently mentioned U.S. states. The screenshot shows that the curves for California and New York go much higher than those of other states. This roughly matches the stricter lockdown policies implemented in the two states during this time period, compared with other states. Such targeted analysis is made possible because our events have location and time arguments. We can also make the TCAG only show events and relations for a specific state, if a user selects a state of interest in the UI.
Constructing Causal Graphs from Text. Eidos (Sharp et al., 2019) uses a rule-based approach to extract causal relations and build a causal analysis graph, which has limited coverage of events related to COVID-19. LearnIt (Min et al., 2020) enables rapid customization of causal relation extractors, but it does not focus on causal relations involving COVID-related events. This work also differs from these two in that we extract event arguments and temporal relations, and track event popularity.

Conclusion and Future Work
We present Excavator, a machine reading system that automatically constructs a Temporal and Causal Analysis Graph for COVID-19 by reading open-source text documents such as news and scientific publications. Our next steps are to integrate Modal Dependency Parsing (Yao et al., 2021) for event factuality assessment, and cross-lingual transfer learning (Nguyen et al., 2021) to make Excavator applicable to more languages.