The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction

Event schemas encode knowledge of stereotypical structures of events and their connections. As events unfold, schemas serve as crucial scaffolding. Previous work on event schema induction focuses either on atomic events or on linear temporal event sequences, ignoring the interplay between events via arguments and argument relations. We introduce a new concept of Temporal Complex Event Schema: a graph-based schema representation that encompasses events, arguments, temporal connections and argument relations. In addition, we propose a Temporal Event Graph Model that predicts event instances following the temporal complex event schema. To build and evaluate such schemas, we release a new schema learning corpus containing 6,399 documents accompanied by event graphs, together with manually constructed gold-standard schemas. Intrinsic evaluations by schema matching and instance graph perplexity demonstrate the superior quality of our probabilistic graph schema library compared to linear representations. Extrinsic evaluation on schema-guided future event prediction further demonstrates the predictive power of our event graph model, significantly outperforming human schemas and baselines by more than 17.8% on HITS@1.


Introduction
The current automated event understanding task has been overly simplified to be local and sequential. Real-world events, such as disease outbreaks and terrorist attacks, have multiple actors, complex timelines, intertwined relations and multiple possible outcomes. Understanding such events requires knowledge in the form of a library of event schemas, capturing the progress of time, and performing global inference for event prediction. For example, regarding the 2019 protest in Hong Kong International Airport, a typical question from analysts would be "How long will the flights be canceled?" This requires an event understanding system to match events to schema representations and reason about what might happen next. The airport protest schema would be triggered by "protest" and "flight cancellation", and evidence about the protesters (e.g., the number of protesters, the instruments being used, etc.) will suggest a CEO resignation event, a flight rescheduling event, or continued flight cancellation events, each with its own probability. Comprehending such a news story requires following a timeline, identifying key events and tracking characters. We refer to such a "story" as a complex event, e.g., the Kabul ambulance bombing event. Its complexity comes from the inclusion of multiple atomic events (and their arguments), relations and temporal order. A complex event schema can be used to define the typical structure of a particular type of complex event, e.g., car-bombing. This leads us to the new task that we address in this paper: temporal complex event schema induction. Figure 1 shows an example schema about car-bombing with multiple temporal dependencies between events. Namely, the occurrence of one event may depend on multiple events. For example, the ASSEMBLE event happens after buying both the bomb materials and the vehicle. Also, there may be multiple events following an event, such as the multiple consequences of the ATTACK event in Figure 1.
That is to say, "the future is not one-dimensional". Our automatically induced probabilistic complex event schema can be used to forecast event abstractions into the future and thus provide a comprehensive understanding of evolving situations, events, and trends.
For each type of complex event, we aim to induce a schema library that is probabilistic, temporally organized and semantically coherent. Low-level atomic event schemas are abundant, and can be part of multiple, sparsely occurring, higher-level schemas. We propose a Temporal Event Graph Model, an auto-regressive graph generation model, to reach this goal. Given a currently extracted event graph, we generate the next event type node with its potential arguments, such as the ARREST event in Figure 2, and then propagate edge-aware information following temporal orders. After that, we employ a copy mechanism to generate coreferential arguments (for example, the DETAINEE argument is the ATTACKER of the previous ATTACK event) and build relation edges for them, e.g., the PART-WHOLE relation between the PLACE arguments. Finally, temporal dependencies are determined with argument connections considered, such as the temporal edge showing that ARREST is after ATTACK.
Our generative model serves as both a schema library and a predictive model. Specifically, we can probe the model to generate event graphs unconditionally to obtain a set of schemas. We can also pass partially instantiated graphs to the model and "grow" the graph either forward or backward in time to predict missing events, arguments or relations, both from the past and in the future. We propose a set of schema matching metrics to evaluate the induced schemas by comparing with human-created schemas and show the power of the probabilistic schema in the task of future event prediction as an extrinsic evaluation, to predict event types that are likely to happen next.
We make the following novel contributions:
• This is the first work to induce probabilistic temporal graph schemas for complex events across documents, which capture temporal dynamics and connections among individual events through their coreferential or related arguments.
• This is the first application of graph generation methods to induce event schemas.
• This is the first work to use complex event schemas for event type prediction, and also produce multiple hypotheses with probabilities.
• We have proposed a comprehensive set of metrics for both intrinsic and extrinsic evaluations.
• We release a new data set of 6,399 documents with gold-standard schemas annotated manually.
Table 1: List of symbols
Symbol            Meaning
G ∈ 𝒢             Instance graph of a complex event
S ∈ 𝒮             Schema graph of a complex event type
e ∈ E             Event node in an instance graph
v ∈ V             Entity node in an instance graph
⟨e_i, e_l⟩        Temporal ordering edge between events e_i and e_l, indicating e_i is before e_l
⟨e_i, a, v_j⟩     Argument edge, indicating v_j plays argument role a in the event e_i
⟨v_j, r, v_k⟩     Relation edge between entities v_j and v_k, where r is the relation type
A(e)              Argument role set of event e, defined by the IE ontology
Φ_E               The type set of events
Φ_V               The type set of entities
φ(·)              A mapping function from a node to its type
G_{<i}            Subgraph of G containing events before e_i and their arguments

Problem Formulation
From a set of documents describing a complex event, we construct an instance graph G which contains event nodes E and entity nodes (argument nodes) V. There are three types of edges in this graph: (1) event-event edges ⟨e_i, e_l⟩ connecting events that have direct temporal relations; (2) event-entity edges ⟨e_i, a, v_j⟩ connecting arguments to the event; and (3) entity-entity edges ⟨v_j, r, v_k⟩ indicating relations between entities. We can construct instance graphs by applying Information Extraction (IE) techniques to an input text corpus. In these graphs, relation edges are undirected, whereas temporal edges between events are directed, going from the earlier event to the later one.
For each complex event type, given a set of instance graphs 𝒢, the goal of schema induction is to generate a schema library 𝒮. In each schema graph S, the nodes are abstracted to the types of events and entities. Figure 1 is an example of a schema for the complex event type car-bombing. Schema graphs can be regarded as a summary abstraction of instance graphs, capturing the recurring structures.
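The three edge types above map naturally onto a small typed container. The following is a minimal sketch, not the authors' data structure: the class name `InstanceGraph`, its field names, the node ids and the entity type labels are all illustrative assumptions. Relation edges are stored as unordered pairs because they are undirected, while temporal edges keep their direction.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceGraph:
    """Typed event graph; every name here is a hypothetical illustration."""
    event_types: dict = field(default_factory=dict)   # event id -> event type
    entity_types: dict = field(default_factory=dict)  # entity id -> entity type
    temporal: set = field(default_factory=set)        # (e_i, e_l): e_i is before e_l
    arguments: set = field(default_factory=set)       # (e_i, role a, v_j)
    relations: set = field(default_factory=set)       # (frozenset({v_j, v_k}), r)

    def add_relation(self, v_j, r, v_k):
        # Undirected: the pair is stored without order.
        self.relations.add((frozenset((v_j, v_k)), r))

    def has_relation(self, v_j, r, v_k):
        return (frozenset((v_j, v_k)), r) in self.relations


g = InstanceGraph()
g.event_types.update({"e1": "ATTACK", "e2": "ARREST"})
g.entity_types.update({"v1": "PERSON", "v2": "LOCATION", "v3": "LOCATION"})
g.temporal.add(("e1", "e2"))                      # ATTACK happens before ARREST
g.arguments.update({("e1", "ATTACKER", "v1"), ("e2", "DETAINEE", "v1")})
g.add_relation("v3", "PART-WHOLE", "v2")
```

In this toy graph the ATTACKER and the DETAINEE are the same entity node `v1`, which is how a shared (coreferential) argument connects two events.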

Instance Graph Construction
To induce schemas for a complex event type, such as car-bombing, we construct a set of instance graphs, where each instance graph is about one complex event, such as Kabul ambulance bombing.
We first identify a cluster of documents that describes the same complex event. In this paper, we treat all documents linked to a single Wikipedia page as belonging to the same complex event, detailed in §4.1.
We use OneIE, a state-of-the-art Information Extraction system, to extract entities, relations and events, and then perform cross-document entity (Pan et al., 2015, 2017) and event coreference resolution (Lai et al., 2021) over the document cluster of each complex event. We further conduct event-event temporal relation extraction (Ning et al., 2019; Wen et al., 2021b) to determine the order of event pairs. We run the entire pipeline following Wen et al. (2021a), and the detailed extraction performance is reported in that paper.
After extraction, we construct one instance graph for each complex event, where coreferential events or entities are merged. We consider the isolated events as irrelevant nodes in schema induction, so they are excluded from the instance graphs during graph construction. Considering schema graphs focus on type-level abstraction, we use type label and node index to represent each node, ignoring the mention level information in these instance graphs.

Temporal Event Graph Model Overview
Given the instance graphs 𝒢, we regard the schema as the hidden knowledge that guides the generation of these graphs. To this end, we propose a temporal event graph model that maximizes the probability of the instance graphs, ∏_{G∈𝒢} p(G). At each step, based on the previous graph G_{<i}, we predict one event node e_i with its arguments to generate the next graph G_i. We factorize the probability of generating the new nodes and edges into the following steps. As shown in Figure 2, an event node e_i is generated first according to the probability p(e_i | G_{<i}). We then add argument nodes based on the IE ontology. We also predict the relation ⟨v_j, r, v_k⟩ between each newly generated node v_j and the existing nodes v_k ∈ G_{<i}. After knowing the shared and related arguments, we add a final step to predict the temporal relations between the new event e_i and the existing events e_l ∈ G_{<i}.
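The generation steps just described suggest a factorization of roughly the following shape. This is a sketch assembled from the step descriptions and the symbols in Table 1, not a formula recovered from the text; in particular, the exact conditioning sets and the range of the index i are assumptions.

```latex
p(G) = \prod_{i} p(G_i \mid G_{<i}), \qquad
p(G_i \mid G_{<i}) =
  p(e_i \mid G_{<i})
  \prod_{a_j \in A(e_i)} \Big[
    p(\langle e_i, a_j, v_j \rangle \mid e_i, a_j)
    \prod_{v_k \in G_{<i}} p(\langle v_j, r, v_k \rangle \mid v_j, v_k)
  \Big]
  \prod_{e_l \in G_{<i}} p(\langle e_i, e_l \rangle \mid e_i, e_l)
```

Read left to right, the four factors correspond to event generation, argument generation, relation edge generation and temporal ordering prediction.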
In the traditional graph generation setting, the order of node generation can be arbitrary. However, in our instance graphs, event nodes are connected through temporal relations. We order events as a directed acyclic graph (DAG). Considering each event may have multiple events both "before" and "after", we obtain the generation order by traversing the graph using Breadth-First Search.
We also add dummy START/END event nodes to indicate the start and end of graph generation. At the beginning of the generation process, the graph G_0 has a single start event node e_[SOG]. We generate e_[EOG] to signal the end of the graph.
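A Breadth-First traversal of the temporal DAG, with the dummy start and end nodes added, can be sketched as follows. The function name, the tie-breaking by input order, and the toy event ids are assumptions made for illustration; the text does not specify how ties are broken.

```python
from collections import deque


def bfs_generation_order(events, before_edges):
    """Return events in BFS order over temporal edges (u, v), meaning u is before v.

    A dummy "[SOG]" node precedes every event without a predecessor, and
    "[EOG]" is appended last. Ties follow input order (an assumption).
    """
    successors = {e: [] for e in events}
    has_predecessor = set()
    for u, v in before_edges:
        successors[u].append(v)
        has_predecessor.add(v)

    order = ["[SOG]"]
    queue = deque(e for e in events if e not in has_predecessor)
    visited = set(queue)
    while queue:
        e = queue.popleft()
        order.append(e)
        for nxt in successors[e]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    order.append("[EOG]")
    return order


# Toy car-bombing graph; event ids are illustrative labels only.
events = ["BUY_MATERIALS", "BUY_VEHICLE", "ASSEMBLE", "ATTACK", "DIE", "INJURE"]
edges = [("BUY_MATERIALS", "ASSEMBLE"), ("BUY_VEHICLE", "ASSEMBLE"),
         ("ASSEMBLE", "ATTACK"), ("ATTACK", "DIE"), ("ATTACK", "INJURE")]
order = bfs_generation_order(events, edges)
```

Both purchase events are visited before ASSEMBLE, and both consequences of ATTACK are visited after it, so an event with several predecessors or successors still receives a single position in the generation sequence.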

Event Generation
To determine the event type of the newly generated event node e_i, we apply graph pooling over all events to obtain the current graph representation g_i. We use bold to denote the latent representations of nodes and edges, which are initialized as zeros and updated at each generation step via message passing (§3.4). We adopt a mean-pooling operation in this paper. The event type is then predicted through a fully connected layer.
Once we know the event type of e_i, we add all of its arguments in A(e_i) defined in the IE ontology as new entity nodes. For example, in Figure 2, the new event e_i is an ARREST event, so we add three argument nodes for DETAINEE, JAILOR, and PLACE respectively. The edges between these arguments and event e_i are also added into the graph.
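The pooling and classification step can be illustrated with a small pure-Python sketch. It assumes a single linear layer followed by a softmax over candidate event types (including the end-of-graph type); the weights, dimensions and type names below are toy values, not trained parameters.

```python
import math


def mean_pool(event_vectors, dim):
    """Average the event representations; zeros if no event exists yet."""
    if not event_vectors:
        return [0.0] * dim
    n = len(event_vectors)
    return [sum(vec[d] for vec in event_vectors) / n for d in range(dim)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def predict_event_type(event_vectors, weights, bias, type_names):
    """Fully connected layer over the pooled graph vector, then softmax."""
    dim = len(weights[0])
    g = mean_pool(event_vectors, dim)
    logits = [sum(w * x for w, x in zip(row, g)) + b
              for row, b in zip(weights, bias)]
    return dict(zip(type_names, softmax(logits)))


# Toy numbers: two existing events with 2-dimensional representations.
events_so_far = [[1.0, 0.0], [0.0, 1.0]]
W = [[2.0, 2.0], [0.0, 0.0], [-2.0, -2.0]]  # one row per candidate type
b = [0.0, 0.0, 0.0]
dist = predict_event_type(events_so_far, W, b, ["ARREST", "DIE", "[EOG]"])
```

Including `[EOG]` among the candidate types lets the same classifier decide when to stop generating events.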

Edge-Aware Graph Neural Network
We use a Graph Neural Network (GNN) (Kipf and Welling, 2017) to update node embeddings following the graph structure. Before we run the GNN on the graph, we first add virtual edges between the newly generated event and all previous events, and between new entities and previous entities, shown as dashed lines in Figure 2. The virtual edges enable the representations of new nodes to aggregate the messages from previous nodes, which has been proven effective in (Liao et al., 2019).
To capture rich semantics of edge types, we pass edge-aware messages during graph propagation. An intuitive way is to encode different edge types with different convolutional filters, which is similar to RGCN (Schlichtkrull et al., 2018). However, the number of RGCN parameters grows rapidly with the number of edge types and easily becomes unmanageable given the large number of relation types and argument roles in the IE ontology. Instead, we learn a vector representation for each relation type r and argument role a. The message passed through each argument edge ⟨e_i, a, v_j⟩ is computed from the representations of e_i and v_j together with the role vector a, combined by concatenation. The message between two entities v_j and v_k is computed similarly, using the relation vector r. Because the direction of a temporal edge is important, we parametrize the message over this edge with two separate weight matrices, one for the outgoing vertex and one for the incoming vertex. We aggregate the messages using edge-aware attention following Liao et al. (2019), where the attention weight is produced by a sigmoid function σ applied to an MLP that contains two hidden layers with ReLU nonlinearities.
The event node representation e_i is then updated using the messages from its local neighbors N(e_i), and entity node representations are updated in the same way.
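To make the idea of edge-aware messages concrete, here is a minimal pure-Python sketch in the spirit of Liao et al. (2019). Several choices are assumptions rather than the paper's formulation: the message uses the difference of the two node vectors concatenated with the edge vector, a single linear layer stands in for the attention MLP, and the update is a simple residual sum.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, xi) for xi in x]


def edge_message(h_src, h_dst, edge_vec, W):
    """Message along a typed edge: ReLU(W [(h_src - h_dst) ; edge_vec])."""
    diff = [a - b for a, b in zip(h_src, h_dst)]
    return relu(matvec(W, diff + edge_vec))  # list '+' concatenates


def gate(message, w_att):
    """Scalar sigmoid gate; one linear layer stands in for the MLP."""
    z = sum(w * m for w, m in zip(w_att, message))
    return 1.0 / (1.0 + math.exp(-z))


def update_node(h, neighbor_msgs, w_att):
    """Add the gated sum of incoming messages to the node representation."""
    out = list(h)
    for m in neighbor_msgs:
        a = gate(m, w_att)
        out = [o + a * mi for o, mi in zip(out, m)]
    return out


# Toy example: an event passes a message to one of its argument entities.
h_event, h_entity = [1.0, 0.0], [0.0, 0.0]
role_vec = [1.0]                       # learned vector for the argument role
W_arg = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
msg = edge_message(h_event, h_entity, role_vec, W_arg)
new_entity = update_node(h_entity, [msg], w_att=[0.0, 0.0])
```

For temporal edges, the same `edge_message` function could be called with one weight matrix for the "before" direction and another for the "after" direction, which is one way to realize the two separate matrices mentioned above.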

Coreferential Argument Generation
After updating the node representations, we detect the entity type of each argument, and also predict whether the argument is coreferential with existing entities. Inspired by the copy mechanism (Gu et al., 2016), we classify each argument node v_j as either a new entity with entity type φ(v_j), or an existing entity node in the previous graph G_{<i}. For example, in Figure 2, the DETAINEE should be classified as the existing ATTACKER node, while the JAILOR node is classified as PERSON. Namely, the probability of an argument edge combines a generation probability and a copy probability. The generation probability p(⟨e_i, a_j, v_j⟩, g | e_i, a_j) classifies the new node to its entity type φ(v_j). The copy probability p(⟨e_i, a_j, v_j⟩, c | e_i, a_j) selects the coreferential entity v from the entities in the existing graph, denoted by V_{<i}. Both probabilities share the normalization term Z. If the model decides to copy, we merge the coreferential entities in the graph.
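The shared normalization term can be illustrated with a short sketch: scores for "generate a new entity of type t" and "copy existing entity v" are exponentiated and divided by one common Z, so the two modes compete in a single distribution. The function name, the raw scores and the entity id are hypothetical; how the scores are produced from node representations is left out.

```python
import math


def argument_distribution(type_scores, copy_scores):
    """Joint softmax over generating a typed new entity and copying an entity.

    type_scores: {entity_type: score}; copy_scores: {existing_entity_id: score}.
    Both modes share one normalizer Z, so all probabilities sum to 1.
    """
    z = (sum(math.exp(s) for s in type_scores.values())
         + sum(math.exp(s) for s in copy_scores.values()))
    p_gen = {t: math.exp(s) / z for t, s in type_scores.items()}
    p_copy = {v: math.exp(s) / z for v, s in copy_scores.items()}
    return p_gen, p_copy


# Toy scores for the DETAINEE argument of a new ARREST event.
p_gen, p_copy = argument_distribution(
    {"PERSON": 0.0, "ORGANIZATION": 0.0},
    {"attacker_1": math.log(2.0)},
)
best_copy = max(p_copy, key=p_copy.get)
```

With these toy scores the copy option receives half of the probability mass, so the DETAINEE would be merged with the existing attacker node rather than created as a new PERSON.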

Entity Relational Edge Generation
In this phase, we determine which virtual edges to keep and assign relation types to them, such as the PART-WHOLE relation in Figure 2. We model the relation edge generation probability as a categorical distribution over relation types, and add [O] (OTHER) to the type set R to represent that there is no relation edge. The distribution is computed by an MLP with two hidden layers and ReLU activation functions.

Event Temporal Ordering Prediction
To predict the temporal dependencies between the new event and the existing events, we connect them through temporal edges, as shown in Figure 2. These edges are critical for message passing in predicting the next event. We build temporal edges in the last phase of generation, since this step relies on the shared and related arguments. Considering that temporal edges are interdependent, we model the generation probability as a mixture of Bernoulli distributions following Liao et al. (2019), where B is the number of mixture components. When B = 1, the distribution degenerates to a factorized Bernoulli, which assumes that each potential temporal edge is independent of the others conditioned on the existing graph.
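A mixture of Bernoulli distributions over a set of candidate temporal edges can be written in a few lines. The sketch below assumes the mixture weights and per-edge probabilities are already given (in the model they would be predicted from node representations), and the numbers are toy values chosen to show how B > 1 captures dependence between edges.

```python
def mixture_bernoulli_likelihood(edges, mixture_weights, edge_probs):
    """p(edges) = sum_b alpha_b * prod_l theta_{b,l}^{y_l} (1 - theta_{b,l})^{1 - y_l}.

    edges: list of 0/1 labels y_l, one per existing event e_l.
    mixture_weights: alpha_b, summing to 1. edge_probs[b][l]: theta_{b,l}.
    """
    total = 0.0
    for alpha, thetas in zip(mixture_weights, edge_probs):
        component = 1.0
        for y, theta in zip(edges, thetas):
            component *= theta if y == 1 else (1.0 - theta)
        total += alpha * component
    return total


y = [1, 0]  # edge to the first existing event, no edge to the second
# B = 1: factorized Bernoulli, edges are independent.
single = mixture_bernoulli_likelihood(y, [1.0], [[0.8, 0.4]])
# B = 2: each component prefers a different, mutually exclusive pattern.
mixed = mixture_bernoulli_likelihood(y, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
both_on = mixture_bernoulli_likelihood([1, 1], [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
```

In the two-component case, the patterns "only the first edge" and "only the second edge" are each likely while "both edges" is unlikely, a correlation that a single factorized Bernoulli with the same marginals cannot express.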

Training and Schema Decoding
We train the model by minimizing the negative log-likelihood of the training instance graphs. To compose the schema library for each complex event scenario, we construct instance graphs from related documents to learn a graph model, and then obtain the schema using greedy decoding.

Dataset
We conduct experiments on two datasets, covering both the general scenario and a more specific scenario. We adopt the DARPA KAIROS ontology, a newly defined fine-grained ontology for Schema Learning, with 24 entity types, 46 relation types, 67 event types, and 85 argument roles. Our schema induction method does not rely on any specific ontology; only the IE system is trained on a given ontology to create the instance event graphs.
General Schema Learning Corpus: The Schema Learning Corpus, released by LDC (LDC2020E25), includes 82 types of complex events, such as Disease Outbreak, Presentations and Shop Online. Each complex event is associated with a set of source documents. This data set also includes ground-truth schemas created by LDC annotators, which were used for our intrinsic evaluation.
IED Schema Learning Corpus: The same type of complex event may have many variants, depending on the conditions and participants involved. In order to evaluate our model's capability at capturing uncertainty and multiple hypotheses, we decided to dive deeper into one scenario and chose the improvised explosive device (IED) as our case study. We first collected Wikipedia articles that describe 4 types of complex events, i.e., Car-bombing IED, Drone Strikes IED, Suicide IED and General IED. Then we followed Li et al. (2021) and exploited the external links to collect additional news documents with the corresponding complex event type. The ground-truth schemas for this IED corpus were created manually, through a schema curation tool (Mishra et al., 2021). Only one human schema graph was created for each complex event type, resulting in 4 schemas. In detail, for each complex event type, we presented example instance graphs and the ranked event sequences to annotators to create human (ground truth) schemas. The event sequences were generated by traversing the instance graphs, and then sorted by frequency and the number of arguments. Initially we assigned three annotators (IE experts) to each create a version of the schema, and the final schema was then merged through discussion. After that, two annotators (linguists) performed a two-pass revision. Human curation focuses on merging and trimming steps by validating them using the reference instance graphs. Also, temporal dependencies between steps were further refined, and coreferential entities and their relations were added during the curation process. To avoid bias from the event sequences, linguists in the second-round revision were not presented with the event sequences. All annotators were trained and disagreements were resolved through discussion.

Schema Matching Evaluation
We compare the generated schemas with the ground truth schemas based on the overlap between them. The following evaluation metrics were employed:
Event Match: A good schema must contain the events crucial to the complex event scenario. F-score is used to compute the overlap of event nodes.
Event Sequence Match: A good schema is able to track events through a timeline. So we obtain event sequences following temporal order, and evaluate F-score on the overlapping sequences of lengths l = 2 and l = 3.
Event Argument Connection Match: Our complex event graph schema includes entities and their relations and captures how events are connected through arguments, in addition to their temporal order. We categorize these connections into three categories: (1) two events are connected by shared arguments; (2) two events have related arguments, i.e., their arguments are connected through entity relations; (3) there are no direct connections between two events. For every pair of overlapped events, we calculate F-score based on whether these connections are predicted correctly. The human schemas of the General dataset do not contain arguments and the relations between arguments, so we only compute this metric for the IED dataset.
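The event match and event sequence match metrics can be sketched as F-scores over event types and over temporal paths of a fixed length. This is a simplified illustration that identifies each node by its event type and matches items exactly; the function names, the toy schemas and the SENTENCE label are assumptions, and the authors' matching procedure may differ in detail.

```python
from collections import Counter


def f_score(predicted, gold):
    """F1 over two multisets of hashable items."""
    pred, ref = Counter(predicted), Counter(gold)
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def paths(edges, length):
    """All event-type paths with `length` nodes along temporal edges."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    nodes = {n for edge in edges for n in edge}
    result = [(n,) for n in nodes]
    for _ in range(length - 1):
        result = [p + (n,) for p in result for n in succ.get(p[-1], [])]
    return result


# Toy schemas given as temporal edges between event types.
gold = [("ATTACK", "DIE"), ("ATTACK", "INJURE"), ("DIE", "ARREST")]
pred = [("ATTACK", "DIE"), ("DIE", "ARREST"), ("ARREST", "SENTENCE")]

event_match = f_score(paths(pred, 1), paths(gold, 1))
seq_match_l2 = f_score(paths(pred, 2), paths(gold, 2))
seq_match_l3 = f_score(paths(pred, 3), paths(gold, 3))
```

Longer paths are harder to match, which is why the l = 3 score is a stricter test of whether a schema preserves extended temporal structure.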

Instance Graph Perplexity Evaluation
To evaluate our temporal event graph model, we compute the instance graph perplexity by predicting the instance graphs in the test set. We calculate the full perplexity over the entire graph, and the event perplexity using only event nodes, emphasizing the importance of correctly predicting events.
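As a reference point, perplexity over a set of test graphs can be computed from per-graph log-probabilities. The sketch below assumes the common definition with base-2 logarithms averaged over the number of test graphs; the normalization actually used for the reported numbers is not recoverable from the text, so treat both choices as assumptions.

```python
import math


def perplexity(log2_probs):
    """2 ** (-(1/N) * sum_G log2 p(G)) over N test graphs."""
    return 2.0 ** (-sum(log2_probs) / len(log2_probs))


# Toy example: the model assigns probability 0.25 to each of two test graphs.
pp = perplexity([math.log2(0.25), math.log2(0.25)])
```

Event perplexity would use the same function, with each log-probability restricted to the event-generation factors of the graph.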

Schema-Guided Event Prediction
To explore schema-guided probabilistic reasoning and prediction, we perform an extrinsic evaluation of event prediction. Different from traditional event prediction tasks, the temporal event graphs contain arguments with relations, and there are type labels assigned to nodes and edges. We create a graph-based event prediction dataset using our testing graphs. The task aims to predict the ending events of each graph, i.e., events that have no future events after them. An event is predicted correctly if its event type matches one of the ending events in the graph. Considering that there can be multiple ending events in one instance graph, we rank event type prediction scores and adopt MRR (Mean Reciprocal Rank) and HITS@1 as evaluation metrics.
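Because a graph may have several correct ending events, both metrics can be computed from the rank of the first predicted type that matches any gold ending event. The sketch below shows this under that assumption; the function names and the toy rankings are illustrative.

```python
def rank_of_first_hit(ranked_types, gold_types):
    """1-based rank of the first prediction that is a gold ending event."""
    for rank, event_type in enumerate(ranked_types, start=1):
        if event_type in gold_types:
            return rank
    return None


def mrr_and_hits1(examples):
    """examples: list of (ranked event types, set of gold ending-event types)."""
    reciprocal_sum, hits = 0.0, 0
    for ranked, gold in examples:
        rank = rank_of_first_hit(ranked, gold)
        if rank is not None:
            reciprocal_sum += 1.0 / rank
            hits += int(rank == 1)
    n = len(examples)
    return reciprocal_sum / n, hits / n


examples = [
    (["DIE", "INJURE", "ARREST"], {"INJURE", "ATTACK"}),  # first hit at rank 2
    (["ARREST", "DIE"], {"ARREST"}),                      # first hit at rank 1
]
mrr, hits_at_1 = mrr_and_hits1(examples)
```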

Experiment Setting
Baseline 1: Event Language Model (Rudinger et al., 2015;Pichotta and Mooney, 2016) is the state-of-the-art event schema induction method. It learns the probability of temporal event sequences, and the event sequences generated from event language model are considered as schemas.
Baseline 2: Sequential Pattern Mining (Pei et al., 2001) is a classic algorithm for discovering common sequences. We also attach arguments and their relations as extensions to the pattern. Considering that the event language model baseline cannot handle multiple arguments and relations, we add sequential pattern mining for comparison. The frequent patterns mined are considered as schemas.
Reference: Human Schema is added as a baseline in the extrinsic task of event prediction. Since human-created schemas are highly accurate but not probabilistic, we want to evaluate their limits at predicting events in the extrinsic task. We match schemas to instances and fill in the matched type.
Ablation Study: Event Graph Model w/o Argument Generation is included as a variant of our model in which we remove argument generation (§3.5 and §3.6). It learns to generate a graph containing only event nodes with their temporal relations, aiming to verify whether incorporating argument information helps event modeling.

Implementation Details
Training Details. For our event graph model, the representation dimension is 128, and we use a 2-layer GNN. The number of mixture components B in the temporal classifier is 2. The learning rate is 1e-4. To train the event language model baseline, instead of using an LSTM-based architecture following Pichotta and Mooney (2016), we adopt the state-of-the-art auto-regressive language model XLNet. In detail, we first linearize the graph using topological sort, and then train XLNet with a dimension of 128 (the same as our temporal event graph model) and 3 layers. The learning rate is 1e-4. We select the best model on the validation set. Both our model and the event language model baseline are trained on one Tesla V100 GPU with 16GB DRAM. For sequential pattern mining, we perform random walks, starting from every node in the instance graphs and ending at sink nodes, to obtain event type sequences, and then apply PrefixSpan (Pei et al., 2001) to rank sequential patterns.
Evaluation Details. To compose the schema library, we use the first ranked sequence as the schema for these two models. To perform event prediction using baselines, we traverse the input graph to obtain event type sequences, and conduct prediction on all sequences to produce an averaged score. For human schemas, we first linearize them and the input graphs, and find the longest common subsequence between them.
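The longest common subsequence between a linearized human schema and a linearized input graph can be found with the classic dynamic program. The sketch below shows only that matching step on toy sequences; how the matched positions are turned into a predicted event type is not specified in enough detail here to reproduce, so it is left out.

```python
def longest_common_subsequence(a, b):
    """Classic dynamic program; returns one longest common subsequence."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])

    # Backtrack from the bottom-right corner to recover the subsequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


# Toy linearizations; the sequences are illustrative only.
schema_seq = ["ATTACK", "DIE", "ARREST", "TRIALHEARING"]
graph_seq = ["ATTACK", "INJURE", "DIE"]
matched = longest_common_subsequence(schema_seq, graph_seq)
```

Because the match is purely sequential, an event such as INJURE that does not appear in the schema sequence is simply skipped, which hints at why exact sequence matching struggles with variants of a scenario.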

Results and Analysis
Intrinsic Evaluation. In Table 3, the significant gain on event match demonstrates the ability of our graph model to keep salient events. On sequence match, our approach achieves a larger performance gain over the baselines when the path length l is longer. This implies that the proposed model is capable of capturing longer and wider temporal dependencies. In the case of connection match, sequential pattern mining is the only baseline that can predict connections between events. Compared with sequential pattern mining, our generation model performs significantly better since it considers the inter-dependency of arguments and encodes them with graph structures.
Extrinsic Evaluation. On the task of schema-guided event prediction, our graph model obtains significant improvement (see Table 4). The low performance of the human schema demonstrates the importance of probabilistically modeling schemas to support downstream tasks. Take Figure 3 as an example. The human schema produces incorrect event types such as TRIALHEARING, because it matches the sequence ATTACK→DIE→TRIALHEARING and cannot capture the inter-dependencies between sequences. In contrast, our model is able to customize the prediction to the global context of the input graph, and take into account that there is no ARREST event or justice-related event in the input graph. Also, the human schema fails to predict INJURE and ATTACK, because it relies on the exact match of event sequences of lengths l ≥ 2, and cannot handle the variants of sequences. This problem can be solved by our probabilistic schema, via modeling the prediction probability conditioned on the existing graph. For example, even though ATTACK mostly happens before DIE, we learn that ATTACK might repeat after a DIE event if there are multiple ATTACK and DETONATE events in the existing graph, which means the complex event is about a series of conflict events.
Ablation Study. Removing argument generation ("w/o ArgumentGeneration") generally lowers the performance on all evaluation tasks, since it ignores the coreferential arguments and their relations, and relies solely on the overly simplistic temporal order to connect events. This is especially apparent from the instance graph perplexity in Table 3.

Learning Corpus Size. An average of 113 instance graphs is used for each complex event type in the IED scenario, while 383 instance graphs in total are used to learn the schema model in the General scenario. The better performance on the IED dataset in Table 3 suggests that a larger number of instance graphs per complex event type improves schema induction performance.
Effect of Information Extraction Errors. Based on the error analysis for schemas induced in Table 1, the effect of extraction errors can be categorized into: (1) temporal ordering errors: 43.3%; (2) missing events: 34.4%; (3) missing coreferential events: 8.8%; (4) incorrect event type: 7.7%; (5) missing coreferential arguments: 5.5%. However, even on automatically extracted event graphs with extraction errors, our model performs significantly better on event prediction than human-constructed schemas, as shown in Table 4. This demonstrates that our schema induction method is robust and effective in supporting downstream tasks, even when provided only with noisy data containing extraction errors.

Related Work
The definition of a complex event schema separates us from related lines of work, namely schema induction and script learning. Previous work on schema induction aims to characterize event triggers and participants of individual atomic events (Chambers, 2013; Cheung et al., 2013; Nguyen et al., 2015; Sha et al., 2016; Yuan et al., 2018), ignoring inter-event relations. Work on script learning, on the other hand, originally limited attention to event chains with a single protagonist (Chambers and Jurafsky, 2008, 2009; Rudinger et al., 2015; Jans et al., 2012; Granroth-Wilding and Clark, 2016) and was later extended to multiple participants (Pichotta and Mooney, 2014, 2016; Weber et al., 2018). Recent efforts rely on distributed representations encoded from the compositional nature of events (Modi, 2016; Granroth-Wilding and Clark, 2016; Weber et al., 2018, 2020), and language modeling (Rudinger et al., 2015; Pichotta and Mooney, 2016; Peng and Roth, 2016). All of these methods still assume that events follow a linear order in a single chain. They also overlook the relations between participants, which are critical for understanding the complex event. In contrast, we induce a comprehensive event graph schema, capturing both the temporal dependency and the multi-hop argument dependency across events.
Recent work on event graph schema induction only considers the connections between a pair of events. Similarly, existing event prediction tasks are designed to automatically generate a missing event (e.g., a word sequence) given a single prerequisite event or a sequence of prerequisite events (Nguyen et al., 2017; Hu et al., 2017; Li et al., 2018b; Kiyomaru et al., 2019; Lv et al., 2019), or to predict a pre-condition event given the current events (Kwon et al., 2020). In contrast, we leverage the automatically discovered temporal event schema as guidance to forecast future events.
Existing script annotations (Chambers and Jurafsky, 2008, 2010; Wanzare et al., 2016; Mostafazadeh et al., 2016a,b; Kwon et al., 2020) cannot support comprehensive graph schema induction because they lack critical event graph structures, such as argument relations. Furthermore, in real-world applications, complex event schemas are expected to be induced from large-scale historical data, which is not feasible to annotate manually. We propose a data-driven schema induction approach, and choose to use IE systems instead of manual annotation, to induce schemas that are robust and can tolerate extraction errors.

Conclusions and Future Work
We propose a new task to induce temporal complex event schemas, which are capable of representing multiple temporal dependencies between events and their connected arguments. We induce such schemas by learning an event graph model, a deep auto-regressive model, from the automatically extracted instance graphs. Experiments demonstrate the model's effectiveness on both intrinsic evaluation and the downstream task of schema-guided event prediction. These schemas can guide our understanding and ability to make predictions with respect to what might happen next, along with background knowledge including location-specific, participant-specific and temporally ordered event information. In the future, we plan to extend our framework to hierarchical event schema induction, as well as event and argument instance prediction.