Modeling Event-Pair Relations in External Knowledge Graphs for Script Reasoning

Script reasoning infers subsequent events from a given event chain, which involves the ability to understand relations between events. A human-labeled script reasoning dataset is usually of small size with limited event relations, which highlights the necessity to leverage external eventuality knowledge graphs (KG) consisting of numerous triple facts to describe the inferential relation between events. Existing methods adopt a retrieval and integration paradigm to focus merely on the graph triples that have event overlap with a script, but ignore much more supportive triples in the KG with similar inferential patterns, leaving the KG under-exploited. To fully exploit the KG, we propose a knowledge model to learn the inferential relations between events from the whole eventuality KG and then support downstream models by directly capturing the relation between events in a script. We further present a neural script adapter to extend the knowledge model for inferring the associated relations between an event chain and a subsequent event candidate. We evaluate the proposed approach on a popular multi-choice narrative cloze task for script reasoning and achieve new state-of-the-art accuracy, compared with baselines either incorporating external KG or not.


Introduction
Script reasoning (Chambers and Jurafsky, 2008; Li et al., 2018; Lv et al., 2020b) aims at determining the subsequent event or plausible ending for an event chain in a script. For example, a tourism script consists of ["Emily took a plane", "Emily arrived at Oahu", "Emily went to Waimea Bay"], and the subsequent event is more likely to be "Emily surfed" than "Emily skied". Script reasoning has attracted more interest in the natural language processing (NLP) community since it plays essential roles in many real-world applications like storytelling (Swanson and Gordon, 2008).
Figure 1 (caption, partially recovered): ... paradigm (green dotted line) with ours (blue dash-dot line). Although there is no semantic overlap between the precedent event in the script and the events in the KG, which leads to failed retrieval, our approach still provides supportive evidence by exploiting similar inferential relation patterns.
Understanding and inferring the correlation between two events is critical for script reasoning. Taking the tourism script as an example, the key to deciding the subsequent event is inferring that "A person goes to a beach" is more correlated with "The person surfs" than with "The person skis". An immediate idea is to learn event relations from well-labeled training datasets. Unfortunately, because labeling is labor-intensive, high-quality training data for script reasoning is usually small, and it is impractical to learn rich relations from it for large-scale commercial applications. Therefore, it is necessary to leverage external knowledge that implies relations between events.
Recently, Lv et al. (2020b) propose to leverage a large-scale eventuality knowledge graph (KG), ASER (Zhang et al., 2020), for script reasoning via adopting the "retrieval and integration" paradigm. Given an event chain, this paradigm first retrieves relevant triple facts from the eventuality KG and then integrates them into a script reasoning model.
Although such a paradigm is proven effective in entity-centric tasks (Zhang et al., 2019), it is not competent in event-centric script reasoning. The reason is that the retrieval is based on lexical or semantic matching between an event from a script and each event node in the KG. For example, in Figure 1, to determine whether the precedent event "The new toy is not attractive to kids" will result in a subsequent event "It is rejected", this paradigm will try to retrieve graph triples with event nodes talking about "toy is not attractive", etc., which is very likely to fail if the KG contains few related events. Namely, it dramatically narrows the focus to the graph triples with exact event matching, so it cannot fully leverage the external eventuality KG.
However, script reasoning can benefit from leveraging event pairs in the KG with similar relation patterns, rather than only the triples in the KG with similar events. In Figure 1, although events in the four graph triples have no semantic overlap with the precedent event "The new toy is not attractive to kids", all the triples represent the relation that if some attribute of an object is judged negatively, the object might be rejected, and otherwise accepted. This still provides strong supportive evidence for the relation between "The new toy is not ..." and "It is rejected". Therefore, script reasoning can benefit from event pairs with similar inferential relation patterns, beyond the textual contents of the events.
Motivated by this, in this work, we propose a novel paradigm to integrate external knowledge for script reasoning by directly modeling the relation between events from a KG and thus supporting script reasoning in light of similar relation patterns. In particular, we first propose a discriminative knowledge model trained on the graph triples in an external eventuality KG. Taking each event pair in the triples as input, the knowledge model learns to predict whether the two events in the pair are associated and what the inferential relation between them is. After being trained, the knowledge model can directly capture associated and inferential relations between precedent and subsequent events in a script. The relations between events are represented in latent space and can be further integrated into any event-centric neural model. Furthermore, as script reasoning requires associating a sequence of precedent events (i.e., an event chain) with a plausible subsequent event, we propose a neural script adapter, based on a chain-dependent attention module, for extending the trained knowledge model from event level to script level. This leads to a script-adaptive knowledge model that directly represents inferential information between an event chain and a subsequent event candidate as a latent embedding. Lastly, this embedding, coupled with deep text representation from a script-text contextualizing encoder, is used to derive the plausibility score of the candidate.
We conduct empirical studies on a popular task of script reasoning, i.e., multi-choice narrative cloze (Li et al., 2018). Our approach outperforms strong competitors and achieves a new state-of-the-art accuracy, verifying the effectiveness of the script-adaptive knowledge model when integrating inferential relations into script reasoning.

Preliminary
This section begins with a formal task definition of script reasoning, followed by introductions to eventuality KGs and pre-trained language models.
Task Definition. Given an event chain $\mathbf{e} = [e_1, \dots, e_n]$ and a set of subsequent event candidates $E^{(c)}$, a script reasoning model is asked to produce a relatedness score between the event chain and each candidate event so that $\hat{e} = \arg\max_{e_j} P(\mathbf{e}, e_j; \theta), \ \forall e_j \in E^{(c)}$, where $P(\cdot; \theta)$ denotes a $\theta$-parameterized script reasoning model, and $\hat{e}$ denotes the predicted event.
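The selection rule in the task definition can be illustrated with a minimal Python sketch. The names `predict_next_event` and `toy_score` are hypothetical and not from the paper; the word-overlap scorer is only a stand-in for the learned model $P(\cdot; \theta)$ so that the argmax step can be run end to end.

```python
from typing import Callable, Sequence


def predict_next_event(
    chain: Sequence[str],
    candidates: Sequence[str],
    score: Callable[[Sequence[str], str], float],
) -> str:
    """Return the candidate with the highest relatedness score to the chain."""
    return max(candidates, key=lambda cand: score(chain, cand))


def toy_score(chain: Sequence[str], candidate: str) -> float:
    """Hypothetical stand-in scorer: counts candidate words seen in the chain."""
    chain_words = {w.lower() for event in chain for w in event.split()}
    return float(sum(w.lower() in chain_words for w in candidate.split()))


chain = ["Emily went to the beach"]
candidates = ["Emily surfed at the beach", "Bob skied"]
best = predict_next_event(chain, candidates, toy_score)
```

Any scoring function with the same signature can be plugged in, which mirrors how the task definition leaves the model $P$ abstract.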
Eventuality Knowledge Graph. In contrast to canonical KGs with entity-centric factoid triples, an eventuality KG, $G$, typically consists of a set of event-centric triples $(e^{(h)}, r, e^{(t)})$ that describe inferential or co-occurrence relations between events. It represents each event $e$ as free-form text, while defining a closed set $R$ of relations so that $r \in R$ for every triple.

Methodology
In this section, we will elaborate on our approach for the multi-choice narrative cloze (MCNC) task in script reasoning. As shown in Figure 2, we first propose a discriminative knowledge model learning facts from an eventuality graph (§3.1), followed by a novel neural adapter upgrading the knowledge model to script level (§3.2). Lastly, as in Figure 3, we present a representation learning framework to solve the MCNC task (§3.3).
Figure caption (partially recovered): Dash-dot blue rounded rectangles denote parameters optimized towards the objective of the knowledge model, whereas solid blue rounded rectangles denote the script adapter's parameters that will be optimized towards the objective of MCNC.

Discriminative Knowledge Model
To avoid challenging event grounding and satisfy coverage necessity, neural knowledge models (Bosselut et al., 2019b; Hwang et al., 2020) are proposed to memorize eventuality facts from a KG in their parameters during training. They are built upon a pre-trained generative Transformer (e.g., GPT (Radford et al., 2018)) and fine-tuned on triple facts from an eventuality KG via generative objectives of event-based link prediction.
However, such generative knowledge models are not perfectly compatible with capturing event-pair relation facts since they focus more on inferring tail events given a head event and an inferential relation. This is consistent with the goal of link prediction for KG completion. Consequently, to model the inferential relations between events, they have to generate all possible triples for each event by traversing all relations and enlarging the beam-search size (Bosselut et al., 2019a). The generated triples must then be re-encoded into latent space for the integration (Lv et al., 2020b), not to mention that generative models are not always reliable.
Therefore, we present a discriminative objective based on relation classification for knowledge model learning to directly capture such inferential information in latent space. Formally, given a triple $(e^{(h)}, r, e^{(t)}) \in G$, we separately pass the head event $e^{(h)}$ and the tail event $e^{(t)}$ into a text encoder to generate event-level contextualized representations. Following Devlin et al. (2019), we first concatenate the natural language text $w_e$ of each event $e$ with special tokens to obtain $\tilde{w}_e$, where the special tokens could vary with different pre-trained models. Then, we feed the concatenated text $\tilde{w}_e$ into a Transformer encoder, followed by a pooling layer, i.e., $v = \mathrm{Pool}(\mathrm{Transformer}(\tilde{w}_e; \theta))$, where $v$ denotes the resulting event representation, $\mathrm{Transformer}(\cdot; \theta)$ stands for a pre-trained bidirectional Transformer encoder (e.g., BERT (Devlin et al., 2019) and RoBERTa) that produces deep contextualized embeddings, and $\mathrm{Pool}(\cdot)$ denotes using the embedding of [CLS] as the sequence-level representation, following prior works.
Given $v$ of both head and tail events, we apply an interactive concatenation (Bowman et al., 2015; Reimers and Gurevych, 2019) between them to model their inferential relationship, i.e., $\mathbf{r} = \text{Inter-Concat}(v^{(h)}, v^{(t)})$. Here, $\mathbf{r} \in \mathbb{R}^{4d}$ represents the inferential relation between head and tail events, and is built from vector concatenation $[\cdot; \cdot]$ and the element-wise product "×".
Lastly, the relation representation, $\mathbf{r}$, is learned by passing it into a neural classifier to predict the oracle relation in the original triple. In order to enable this knowledge model to represent a null or non-associated relation between events, we define an extra relation category, named dummy relation $r^{(dmy)}$. The classifier outputs $p^{(rc)}$, the probability distribution over $R'$, where $R'$ denotes the union of the well-defined relation set $R$ with the dummy relation category $r^{(dmy)}$. The training data corresponding to $r^{(dmy)}$ is derived from negative sampling in the eventuality KG.
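The pairing and classification steps can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the exact form of the interactive concatenation is not recoverable from the text here, so the sketch assumes the common four-part pattern [u; v; u - v; u × v] from the cited sentence-pair literature, which matches the stated output size of 4d. The single linear layer in `classify_relation` is likewise a hypothetical stand-in for the neural classifier.

```python
import math
from typing import List, Sequence


def inter_concat(v_head: Sequence[float], v_tail: Sequence[float]) -> List[float]:
    """Assumed pattern [u; v; u - v; u * v]; maps two d-dim vectors to 4d dims."""
    diff = [a - b for a, b in zip(v_head, v_tail)]
    prod = [a * b for a, b in zip(v_head, v_tail)]
    return list(v_head) + list(v_tail) + diff + prod


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_relation(
    r: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Hypothetical linear classifier: one weight row per relation in R'
    (the well-defined relations plus the dummy relation)."""
    logits = [sum(w * x for w, x in zip(row, r)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy usage with d = 2 and three relation classes (two real, one dummy).
r = inter_concat([1.0, 2.0], [3.0, 4.0])
probs = classify_relation(r, [[0.0] * 8, [0.0] * 8, [0.0] * 8], [0.0, 0.0, 0.0])
```

In practice the event vectors would come from the pooled Transformer output rather than hand-written lists.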
Training. We use a cross-entropy loss to optimize this discriminative knowledge model, with parameters $\{\theta^{(km)}, \theta^{(rc)}\}$, towards such a dummy-aware relation classification.
Inference. The trained knowledge model can be used in three ways: (1) producing an event representation by $v := \text{Event-Enc}(e; \theta^{(km)}) = \mathrm{Pool}(\mathrm{Transformer}(\tilde{w}_e; \theta^{(km)}))$ (8), (2) generating a relation representation $\mathbf{r}$ by Relation-Model(·) (9), and (3) deriving a confidence score $p^{(ca)}$ by Confid(·) (10) for whether there is an associated relation between two events.
Remark. This discriminative knowledge model learns inferential relations between events in latent space, facilitating event-centric reasoning tasks. It has drawbacks, however, such as being unable to support automatic KG construction, in contrast to generative knowledge models. We therefore argue that generative and discriminative knowledge models are complementary, with different downstream uses.
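The dummy-aware training signal and the confidence score can be sketched as follows. The text does not specify how negatives are sampled or how the confidence is computed, so both are assumptions here: `sample_dummy_pairs` pairs heads with tails that are not linked in the KG, and `association_confidence` takes one minus the probability of the dummy relation. All function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

DUMMY = "dummy"  # label for the extra null / non-associated relation

Triple = Tuple[str, str, str]


def sample_dummy_pairs(triples: List[Triple], k: int, rng: random.Random) -> List[Triple]:
    """Assumed negative sampling: pick head/tail pairs with no edge in the KG."""
    linked = {(h, t) for h, _, t in triples}
    heads = sorted({h for h, _, _ in triples})
    tails = sorted({t for _, _, t in triples})
    pool = [(h, t) for h in heads for t in tails if h != t and (h, t) not in linked]
    picked = rng.sample(pool, min(k, len(pool)))
    return [(h, DUMMY, t) for h, t in picked]


def cross_entropy(probs: Dict[str, float], gold: str) -> float:
    """Negative log-likelihood of the gold relation under the classifier."""
    return -math.log(probs[gold])


def association_confidence(probs: Dict[str, float]) -> float:
    """Assumed confidence: probability mass on any non-dummy relation."""
    return 1.0 - probs[DUMMY]


triples = [("a", "Result", "b"), ("c", "Reason", "d")]
negatives = sample_dummy_pairs(triples, 5, random.Random(0))
loss = cross_entropy({"Result": 0.5, DUMMY: 0.5}, "Result")
confidence = association_confidence({"Result": 0.5, DUMMY: 0.5})
```

A uniformly random corruption can occasionally produce a plausible pair that is simply missing from the KG, so the dummy label should be read as noisy supervision.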

Script-Adaptive Knowledge Model
In multi-choice narrative cloze (MCNC), a script reasoning model is asked to capture the relation between an event chain and a subsequent event candidate, which is beyond the ability of the proposed knowledge model. To handle the MCNC task, we propose a neural adapter for the event encoder in Eq.(8), making it competent in modeling an event chain. Our goal is that, given a subsequent candidate, we extract the most relevant "event" from an event chain to represent the whole chain. As such, the result is still compatible with the high-layer components in our knowledge model.
To this end, we present a chain-dependent attention module, which is based on bidirectional chain contexts $\mathbf{e} = [e_1, \dots, e_n]$ queried by a potential subsequent event $e^{(c)}$. In particular, we first generate an event representation for each event by our trained event encoder, i.e., $V = [v_1, \dots, v_n]$, where $v_i = \text{Event-Enc}(e_i; \theta^{(km)})$. Then the embedded event chain, $V$, position-wise concatenated with the query event representation $v^{(c)}$, is passed into a bidirectional long short-term memory network (Bi-LSTM) to model rich event-contextual information of the event chain. The resulting $U \in \mathbb{R}^{d \times n}$ holds the chain-dependent representations of the chain events, and $\mathrm{MLP}(\cdot; \theta^{(rd)})$ is responsible for reducing dimensionality. Lastly, a self-attention pooling module (Liu et al., 2016; Lin et al., 2017) is applied to $U$ to get a vector representation of the event chain, i.e., $c = U\alpha$, where $\alpha = \mathrm{softmax}(\mathrm{MLP}(U; \theta^{(sa)}))$ (15).
Here, $\alpha \in \mathbb{R}^n$ denotes the probability distribution of the attention mechanism, which is then applied to the chain-dependent event representations $U \in \mathbb{R}^{d \times n}$ by matrix multiplication. As a result, $c$ denotes a chain-dependent event representation extracted from the whole event chain. Intuitively, it can be viewed as the event from the chain $\mathbf{e}$ that is most relevant to the candidate event $e^{(c)}$, as it is derived from an attention module queried by $e^{(c)}$. Hence, the derived $c$ is still compatible with the top layers (e.g., interactive concatenation and neural classifier) in the discriminative knowledge model. Note that the parameters of this neural script adapter, $\theta^{(adp)} = \{\theta^{(bl)}, \theta^{(rd)}, \theta^{(sa)}\}$, will be learned towards the MCNC objective jointly with other neural components in our script reasoning model, which is detailed in the next section (§3.3).
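The attention pooling step, in which the weights α are applied to the chain-dependent representations U, can be sketched in plain Python. This is a simplified sketch: the Bi-LSTM and the dimensionality-reducing MLP are omitted, and the attention scorer is assumed to be a single linear map (`score_weights`, a hypothetical name) rather than the MLP used in the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    U: Sequence[Sequence[float]],
    score_weights: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool n chain-dependent event vectors (each of size d) into one vector.

    U is stored as a list of n event vectors, i.e. the columns of the d x n
    matrix in the text, so the result equals the matrix-vector product U @ alpha.
    """
    logits = [sum(w * x for w, x in zip(score_weights, u)) for u in U]
    alpha = softmax(logits)
    d = len(U[0])
    c = [sum(a * u[k] for a, u in zip(alpha, U)) for k in range(d)]
    return c, alpha


# Toy usage: two events with d = 2; zero scorer weights give uniform attention.
c, alpha = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Because the pooled vector has the same size d as a single event vector, it can be passed to the same pairing layer as an ordinary event representation.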
In summary, we can define a chain-dependent event encoding module that wraps the above procedures to embed an event chain, i.e., $c = \text{Event-Enc}^{(adp)}(\mathbf{e}, e; \theta^{(km)}, \theta^{(adp)})$ (16), where $\mathbf{e} = [e_1, \dots]$ is an event chain and $e$ is a query event. The chain-dependent event representation, $c$, can be used as an argument to Inter-Concat(·, ·) to model the script-level relationship with another event chain or a single event. Thus, the other two inference models in Eq.(9) and Eq.(10) are also adapted to $\text{Relation-Model}^{(adp)}(\cdot)$ and $\text{Confid}^{(adp)}(\cdot)$, respectively.

Script Reasoning Model
Built upon the discriminative knowledge model and its script adapter, we lastly present our script learning model for the multi-choice narrative cloze task. To be specific, given an event chain $\mathbf{e} = [e_1, \dots, e_n]$ and each event $e^{(c)}_j$ from the subsequent candidates $E^{(c)}$, we first pass them into the script-adaptive knowledge model to generate a chain-dependent event representation $c_j$ as defined in Eq.(16): $c_j = \text{Event-Enc}^{(adp)}(\mathbf{e}, e^{(c)}_j; \theta^{(km)}, \theta^{(adp)})$, $\forall e^{(c)}_j \in E^{(c)}$. Based on $c_j$, we can also derive the relation representation $\mathbf{r}_j$ between the event chain and each candidate, as well as the confidence score $p_j$ of the association.
Besides the above rich-relation features from the knowledge model, we also leverage expressively powerful contextualized representations from another pre-trained bidirectional Transformer to fully exploit implicit reasoning knowledge in event texts. Formally, we present a script-text contextualizing encoder that applies the Transformer encoder to a concatenation $w_j$ of the event chain and each subsequent candidate, separated by special tokens: $h_j = \mathrm{Pool}(\mathrm{Transformer}(w_j; \theta^{(kf)}))$.
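How the chain and a candidate might be joined into one encoder input can be sketched as below. The exact layout is not given in the text, so this is an assumption: the chain events are joined with spaces, and BERT-style [CLS] and [SEP] markers are used as defaults. Other pre-trained models use different special tokens, and a real tokenizer would normally insert them itself. The function name `build_script_input` is hypothetical.

```python
from typing import Sequence


def build_script_input(
    chain: Sequence[str],
    candidate: str,
    cls_token: str = "[CLS]",
    sep_token: str = "[SEP]",
) -> str:
    """Assumed layout: [CLS] event_1 ... event_n [SEP] candidate [SEP]."""
    chain_text = " ".join(event.strip() for event in chain)
    return f"{cls_token} {chain_text} {sep_token} {candidate.strip()} {sep_token}"


text = build_script_input(["A went", "A sat"], "A ate")
```

One such string is built per candidate, so a chain with m candidates yields m encoder inputs.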
To integrate the knowledge from both models, we present a knowledge gating module for element-wise addition weighted by the association confidence: $o_j = h_j + p_j \cdot \mathrm{MLP}(\mathbf{r}_j; \theta^{(g)})$ (18), where $o_j \in \mathbb{R}^d$ is the final vector representing the relation between the chain and a candidate from two perspectives, and $\mathrm{MLP}(\cdot; \theta^{(g)})$ is responsible for reducing dimensionality from $4d$ to $d$. Such a gating module leads to flexible knowledge integration, which tends to avoid redundant, non-associated relation features. Finally, an MLP-based scoring module is defined to calculate a plausibility score given the final relation representation, followed by a softmax over the $m = |E^{(c)}|$ candidates to derive the predicted distribution $p^{(sr)}$ over the candidate events $E^{(c)}$ in MCNC.
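The gating and scoring steps can be sketched in plain Python. This is a sketch under stated assumptions: the relation feature is taken as already reduced to size d, the gate is read as adding the confidence-weighted relation feature to the text representation, and the scoring module is simplified to a single linear map (`score_weights`). All names are hypothetical.

```python
import math
from typing import List, Sequence


def gate(h: Sequence[float], reduced_r: Sequence[float], confidence: float) -> List[float]:
    """Element-wise addition of the text vector and the confidence-weighted
    relation feature (assumed reading of the knowledge gating module)."""
    return [hi + confidence * ri for hi, ri in zip(h, reduced_r)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_distribution(
    finals: Sequence[Sequence[float]],
    score_weights: Sequence[float],
) -> List[float]:
    """Score each candidate's final vector, then normalize over candidates."""
    scores = [sum(w * x for w, x in zip(score_weights, o)) for o in finals]
    return softmax(scores)


# Toy usage with d = 2 and two candidates.
o_1 = gate([1.0, 2.0], [10.0, 10.0], 0.5)   # confident association
o_2 = gate([1.0, 2.0], [10.0, 10.0], 0.0)   # relation feature gated out
dist = candidate_distribution([o_1, o_2], [1.0, 1.0])
best_index = max(range(len(dist)), key=dist.__getitem__)
```

With a confidence of zero the relation feature contributes nothing, which is the behaviour the text describes as avoiding non-associated relation features.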
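Inference. We can obtain the most plausible subsequent event from a trained MCNC model by taking the candidate with the highest predicted probability in $p^{(sr)}$ as the prediction $\hat{e}$.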

Experiments
This section begins with a detailed description of our experimental setups on multi-choice narrative cloze (MCNC) task for script reasoning. Then, we conduct quantitative evaluations on the MCNC task, followed by extensive qualitative evaluations, including ablation study, model analysis, case study and error analysis.
Evaluation Metrics. We adopt the official evaluation metric (Li et al., 2018), accuracy (ACC), to measure the performance of the reasoning models.
Table 2: Ablation study of our approach. "w/o chain-dependent attention" denotes replacing the chain-dependent attention module in our script adapter with mean-pooling, "w/o knowledge gating module" denotes removing the confidence score $p^{(ca)}$ of the gating module in Eq.(18), "w/o script-adapter" denotes ablating both the chain-dependent attention and the knowledge gating module, and "w/o external knowledge" denotes removing our script-adaptive knowledge model, equivalent to the RoBERTa-large baseline.
We use the Adam optimizer (Kingma and Ba, 2015) to optimize the cross-entropy loss. The learning rate is set to $1 \times 10^{-5}$. The maximum number of training epochs and the batch size are set to 3 and 32. The maximum sequence length and dropout are set to 64 and 0.1. The weight decay and gradient clipping are set to 0.01 and 1.0. We choose the model with the best result on the development set and report results on the test set based on this model. The knowledge model contains 110M parameters, and our reasoning models contain 127M and 359M parameters for the base and large initializations, respectively. Our experiments are conducted on 4 NVIDIA P40 GPUs, and the training time is around 5 hours with RoBERTa-base and 9 hours with RoBERTa-large.
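For reproduction purposes, the reported hyperparameters can be collected in one place. The sketch below only restates the values given in the text; the class name `TrainingConfig` and all field names are hypothetical and do not correspond to any released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as reported in the text; field names are illustrative."""
    optimizer: str = "adam"
    learning_rate: float = 1e-5
    max_epochs: int = 3
    batch_size: int = 32
    max_seq_length: int = 64
    dropout: float = 0.1
    weight_decay: float = 0.01
    grad_clip_norm: float = 1.0


config = TrainingConfig()
```

Whether gradient clipping is applied by norm or by value is not stated; the field name above assumes clipping by norm.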

Main Evaluation
The experimental results of our approach and previous script reasoning works on the Multi-Choice Narrative Cloze (MCNC) task are shown in Table 1. From the table, we can make two observations. First, using external knowledge, especially external event graph knowledge, increases the accuracy of models. For example, the knowledge infusion approach proposed by Lv et al. (2020b) outperforms the RoBERTa model without any knowledge. Second, our approach is superior to the retrieval and integration approach, RoBERTa + Knwl, and achieves new state-of-the-art accuracy (i.e., 63.62% using the RoBERTa-large text encoder) on this task. This demonstrates the effectiveness of the proposed script-adaptive knowledge model.

Ablation Study
We conduct an ablation study to investigate the effect of each component of our approach, and the results are reported in Table 2. We first investigate the impact of the chain-dependent attention in Eq.(15) by replacing it with mean pooling over all events in the chain, and find that the accuracy of script reasoning decreases by about 1%. Next, we test our approach without the knowledge gating module, which decreases the accuracy by 0.8%. The gap becomes 1.4% if both the chain-dependent attention and the confidence score are ablated. Finally, we compare our approach with the baseline without any external knowledge included, and the accuracy of script reasoning drops by 2.1%, demonstrating the effectiveness of leveraging external event knowledge through the discriminative knowledge model.

Model Analysis
Impact of Knowledge Model. Our approach leverages the external event KG by directly modeling relations between event pairs. Intuitively, the accuracy of the learned knowledge model plays a critical role in script reasoning. Thus, we investigate its impact by assessing the performance of script reasoning with knowledge models of varying accuracy.
The accuracy of knowledge model is evaluated on its dev set from the KG. As shown in Figure 4 (left), we can observe that the accuracy of script reasoning increases with that of the knowledge model, verifying the assumption that integrating external event knowledge via a knowledge model improves the performance of a script reasoning model.

Impact of Event Distance over Event Association. The script reasoning task requires predicting a subsequent event given an event chain. We investigate how the distance between two events impacts their correlation by analyzing the attention scores at various timesteps of precedent events in a script. In particular, for all precedent events that are $i$ steps before the subsequent event, we aggregate their attention scores. The results are plotted in Figure 4 (right). The x-axis represents the distance between two events, and the y-axis represents the correlation estimated by our model. From the figure we can see that an event is most highly correlated with its immediately preceding neighbor. The association drops quickly as the distance increases, and it becomes very small when two events are three steps apart.
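The aggregation of attention scores by event distance can be sketched as below. The text does not say which aggregate is used, so the sketch assumes a simple mean per distance; the function name `mean_attention_by_distance` and the toy attention values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def mean_attention_by_distance(attention_rows: Sequence[Sequence[float]]) -> Dict[int, float]:
    """Average attention weight per distance to the subsequent event.

    Each row holds the attention weights over one script's precedent events,
    ordered from earliest to latest, so the last entry has distance 1.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for alpha in attention_rows:
        n = len(alpha)
        for i, weight in enumerate(alpha):
            distance = n - i
            sums[distance] += weight
            counts[distance] += 1
    return {d: sums[d] / counts[d] for d in sorted(sums)}


# Toy usage: two scripts of different lengths with made-up attention weights.
rows: List[List[float]] = [[0.1, 0.2, 0.7], [0.3, 0.7]]
by_distance = mean_attention_by_distance(rows)
```

Averaging rather than summing keeps distances comparable when scripts have different lengths, since long distances occur in fewer scripts.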

Case Study
As shown in Table 3, we present an example from the test set to compare the retrieval and integration approach with ours. Here the script describes an event chain about accountants, which states that accountants are believed.
To infer the next event, the retrieval and integration paradigm will try to retrieve events with similar lexicon or semantics, e.g. "zakia will believe it", "he is an accountant", etc. However, KG triples containing these events do not capture the relation that if a person is believed, people will consult him/her. Therefore, this approach fails to leverage the KG to make a correct prediction.
In contrast, the KG contains event pairs like "(I need your medical expertise, I need your help on something)" and "(you are the expert, I need answer)", whose relation patterns support the relation between "accountants are believed" and "people ask accountants", although there is little overlap between KG events and script events. The knowledge model learned from the KG event pairs captures such relation patterns and provides strong support for reasoning in this example, which demonstrates the effectiveness of our approach compared with the retrieval and integration approach.

Error Analysis
Lastly, to analyze the limitations of our model, we investigate the misclassified examples on the MCNC test set and summarize two main problems. First, some scripts consist of precedent events that might lead to conflicting results. For example, an event chain, ["He disappointed supporters", "He fulfilled promise"], is likely to be associated with two opposite results. The former might be associated with "He lost campaign", while the latter might result in "They backed up his campaign". Such cases might confuse the reasoning model.
Second, long-distance dependencies between events are difficult to capture. For example, in a tourism script that describes "Emily went to the beach" followed by a long description of the parking problem she met, "Emily went surfing" is a rational subsequent event, but the distance between the two events is so long that it is difficult for a reasoning model to capture the relation.

Related Work
A script (Schank and Abelson, 2013) refers to a kind of structured representation for prototypical sequences of events. Chambers and Jurafsky (2008) formulate a script learning (narrative learning) task and propose statistical models to capture event co-occurrence for subsequent event prediction. Afterwards, the approaches for script reasoning can be categorized into two genres, i.e., event pair modeling (Jans et al., 2012; Pichotta and Mooney, 2014; Granroth-Wilding and Clark, 2016) and event chain modeling (Pichotta and Mooney, 2016; Wang et al., 2017; Lv et al., 2019). However, they still lag far behind humans, as the well-labeled training set is usually of small size. In addition, script reasoning is more challenging than traditional NLP tasks and requires models to reason over unobserved events.
With recent developments of large-scale eventuality knowledge graphs (KG) (e.g., ASER (Zhang et al., 2020) and ATOMIC), an effective remedy is to adopt the "retrieval and integration" schema and integrate the inferential facts retrieved from the KG for script reasoning (Lv et al., 2020b). This paradigm is proven effective in both entity-centric and concept-centric tasks, such as relation extraction (Zhang et al., 2019), named entity recognition, and commonsense reasoning (Lin et al., 2019; Lv et al., 2020a). However, this paradigm is not that compatible with event-centric script reasoning, since script reasoning focuses more on the inferential relation between consecutive events in a script than on the triple facts with exact event matching. What is worse, these eventuality KGs, which consist of free-form events, usually suffer from low knowledge coverage or incompleteness (Zhang et al., 2020; Bosselut et al., 2019b), leading to problematic grounding from an event to the nodes in the KG.
To circumvent the coverage problem, Bosselut et al. (2019b) and Hwang et al. (2020) propose to learn a generative knowledge model on existing triples from an eventuality KG, where the triples can be regarded as a seed of knowledge. Such a model generates subsequent events on demand, prompted with the observed event and an inferential relation, thus avoiding event grounding and satisfying coverage necessity for a broad spectrum of NLP tasks (Shwartz et al., 2020; Majumder et al., 2020; Paul and Frank, 2020; Ding et al., 2019; Ma et al., 2019). However, such generative knowledge models are not perfectly compatible with capturing inferential relations between events because they focus more on inferring tail events than on the relations.
In contrast, our method avoids operating merely on the triples that have lexical or semantic overlap with the targeted script, and instead directly learns the inferential relation patterns on the whole KG. The learned knowledge model can directly capture the relation between events in a script in latent space, benefiting various event-centric reasoning tasks.

Conclusion
In this work, we explore a novel paradigm to integrate an external eventuality knowledge graph into a script reasoning model for the multi-choice narrative cloze task. We first identify a major problem affecting the integration for script reasoning: previous works merely retrieve the graph triples that have semantic overlap with the events in a script, but neglect that triples with similar inferential relation patterns can contribute a lot. We hence propose a knowledge model that learns the patterns on the graph and then provides supportive rich-relation evidence for events in a script. We also present a script adapter to make the knowledge model compatible with script-level reasoning. Built upon these, we finally present a reasoning model and evaluate it on the targeted task. Experimental results demonstrate that the proposed model delivers new state-of-the-art performance, and further analyses provide additional insights.