Complex Event Schema Induction with Knowledge-Enriched Diffusion Model

The concept of a complex event schema pertains to the graph structure that represents real-world knowledge of events and their multi-dimensional relationships. However, previous studies on event schema induction have been hindered by challenges such as error propagation and data quality issues. To tackle these challenges, we propose a knowledge-enriched discrete diffusion model. Specifically, we distill the abundant event scenario knowledge of Large Language Models (LLMs) through an object-oriented Python-style prompt. We incorporate this knowledge into the training data, enhancing its quality. Subsequently, we employ a discrete diffusion process to generate all nodes and links simultaneously in a non-auto-regressive manner to tackle the problem of error propagation. Additionally, we devise an entity relationship prediction module to complete entity relationships between event arguments. Experimental results demonstrate that our approach achieves outstanding performance across a range of evaluation metrics.


Introduction
Event schema induction aims to summarize common patterns and structures from historical events. Current studies mainly induce the atomic schema for each independent event type and its arguments separately (e.g., an "Attack" event with the arguments "Attacker", "Target", "Instrument", and "Place"), without considering the correlation between events (Chambers and Jurafsky, 2008; Chambers, 2013; Nguyen et al., 2015). However, some real-world events are very complex, consisting of multiple events and their relations. For example, in Figure 1, Bombing is a complex event that involves fine-grained events such as Assemble, Detonate, and Injure. Therefore, some researchers attempt to study the complex event schema induction task, which abstracts typical structures for complex events from event data. Figure 1 illustrates an example of the complex event schema induction process for the Bombing scenario. Initially, an information extraction (IE) tool (Du et al., 2022) is utilized to extract instance graphs from raw texts. Subsequently, we induce the event schema based on these extracted instance graphs. The resulting event schema is represented as a graph, where events are interconnected through temporal links (e.g., Damage occurs after Detonate) and their argument relations (e.g., the target of the Detonate event assumes the victim role in the subsequent Injure event).
However, inducing a complex event schema is nontrivial. As shown in Figure 1, it requires the model to summarize the events within instance graphs and to possess a profound understanding of the multi-dimensional relationships between these events. Recently, graph-based methods have been proposed for this task by utilizing graph generation techniques (Li et al., 2021; Jin et al., 2022). For example, Li et al. (2021) propose an auto-regressive generation method that generates the schema following event temporal order. Similarly, Jin et al. (2022) leverage an auto-encoder to encode the global skeleton information and decode the schema graph event by event. Despite these successful efforts, such methods still face two critical challenges. Knowledge Coverage of Instance Graphs: The event schema induction task summarizes instance graphs to obtain the event schema, so the quality of the instance graphs is crucial. However, the instance graphs are extracted via Information Extraction (IE) tools (Rui et al., 2022), whose knowledge coverage is very limited. For example, RESIN (Wen et al., 2021), a representative IE tool, is trained on fixed datasets and can only extract predefined types of entities and events. Moreover, the extraction performance of RESIN is unsatisfactory: it achieves only approximately 64% F1-score for event detection on the ACE dataset. This indicates that IE tools struggle to extract complete instance information, even for predefined event types. Therefore, improving the knowledge coverage of instance graphs is an important problem.
Error Propagation of Auto-regressive Decoding: Previous graph-based approaches generate in an auto-regressive manner (Li et al., 2021; Jin et al., 2022), producing the entire event schema graph node by node, which may lead to error accumulation over time and therefore degrade generation performance. For example, in Figure 1, the model may mistakenly generate "Injure" instead of "Detonate", leading to the omission of subsequent events such as "Damage" and "Investigate" in the generated schema, or to incorrect nodes being generated next. The final generated event schema graph consists of dozens of nodes and edges at a minimum: each instance graph used for training contains an average of 117 event nodes and 246 temporal links according to our statistics on the Suicide-IED dataset (Li et al., 2021). The need to generate so many nodes and edges inevitably exacerbates error accumulation. Thus, it is essential to address the error propagation problem during schema graph generation.
In this paper, we propose a novel method termed the Knowledge-Enriched Diffusion Model (KDM) to address the aforementioned problems. Firstly, to improve the knowledge coverage of instance graphs, we devise an Instance Graph Expansion module. As Large Language Models (LLMs) are trained on vast corpora of text (Touvron et al., 2023; Zhao et al., 2023; Wang et al., 2023) and therefore possess extensive event and entity knowledge of the real world, we leverage LLMs (Chowdhery et al., 2022; Ouyang et al., 2022) as knowledge bases to inject knowledge into instance graphs. The module utilizes a Python-style object-oriented prompt to extract event knowledge from LLMs and adds this knowledge to the instance graphs. Secondly, to tackle the error propagation of auto-regressive decoding, we propose an Event Skeleton Generation module, which utilizes a discrete diffusion model to predict all nodes and links simultaneously in a non-auto-regressive manner, rather than generating them one by one along a time series, which alleviates the error propagation problem (Austin et al., 2021; Yang et al., 2023; Vignac et al., 2022). Finally, we devise an Entity Relation Prediction module, which expands the event skeleton with the corresponding arguments and predicts their relations to obtain a complete schema.
The contributions of our work include: (1) We propose a Knowledge-Enriched discrete Diffusion Model (KDM) for the complex event schema induction task. To the best of our knowledge, we are the first to simultaneously utilize LLMs and diffusion models to accomplish this task. (2) To improve the knowledge coverage of instance graphs, we propose an Instance Graph Expansion module, which distills the event knowledge in LLMs with a Python code-style prompt. To solve the error propagation problem, we design an Event Skeleton Generation module, which predicts all nodes and links simultaneously. (3) We conduct extensive experiments on three widely used datasets. Experimental results indicate that our proposed method outperforms state-of-the-art baselines.

Figure 2: The discrete diffusion process. In the forward process, the noise changes the types of nodes and edges.
Preliminaries and Problem Formulation

Preliminaries
The discrete diffusion model preserves the discrete characteristics of each element in the training data x_0: it gradually corrupts each element x_0^r ∈ x_0 toward the uniform distribution by adding noise, and then reverses the corruption by removing the noise (Austin et al., 2021). Figure 2 shows the process of graph-based discrete diffusion.
The forward process. This process progressively adds noise to x_0 through a transition probability matrix Q_t at step t: q(x_t | x_{t-1}) = x_{t-1} Q_t, where [Q_t]_{ij} indicates the probability of transitioning from x_{t-1} = i to x_t = j. The forward process gradually converts each x_0^r ∈ x_0 to a uniform distribution when T is large enough.
The reverse process. The reverse process p_θ with learnable parameters θ aims to convert the noise distribution x_T back to the original x_0: p_θ(x_{t-1} | x_t) = Σ_{x_0} q(x_{t-1} | x_t, x_0) p_θ(x_0 | x_t), where the posterior q(x_{t-1} | x_t, x_0) follows from the Bayes formula: q(x_{t-1} | x_t, x_0) = q(x_t | x_{t-1}, x_0) q(x_{t-1} | x_0) / q(x_t | x_0). Therefore, the task becomes predicting p_θ(x_0 | x_t) using a neural network.
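To make the forward machinery above concrete, here is a minimal numerical sketch of discrete diffusion with a uniform transition matrix. The function names are illustrative, not from the paper; the final assertion checks the algebraic fact that composing two such transition matrices yields another one with the product of the retention probabilities, which is what makes the t-step marginal q(x_t | x_0) tractable.

```python
import numpy as np

def uniform_Q(alpha, K):
    # Q_t = alpha_t * I + (1 - alpha_t) * 1 1^T / K : with probability alpha_t
    # keep the current category, otherwise resample uniformly over K categories
    return alpha * np.eye(K) + (1.0 - alpha) * np.ones((K, K)) / K

def forward_step(x_onehot, Q, rng):
    # q(x_t | x_{t-1}) = x_{t-1} Q_t: each row is a categorical to sample from
    probs = x_onehot @ Q
    return np.array([rng.choice(len(p), p=p) for p in probs])

rng = np.random.default_rng(0)
K = 4                                 # number of node (or edge) types
x0 = np.eye(K)[[0, 1, 2, 3, 0]]       # five elements as one-hot categories

Q = uniform_Q(0.9, K)
x1 = forward_step(x0, Q, rng)         # one noising step over all elements

# Q_t stays in the same one-parameter family under composition, so the
# t-step marginal q(x_t | x_0) is again uniform_Q with the product of alphas.
assert np.allclose(uniform_Q(0.8, K) @ uniform_Q(0.5, K), uniform_Q(0.4, K))
```

As alpha decays toward 0 over many steps, the product of alphas vanishes and every category becomes equally likely, matching the claim that q(x_t | x_0) converges to the uniform distribution.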

Problem Formulation
In the instance graphs about a specific topic y (e.g., Bombing), nodes represent events and entities, while edges have three types: the temporal link, the argument link and the entity relation link.
An instance graph is denoted as G = (N, E), where the node set N is sampled from a node feature distribution N ∈ R^{n×a} and the edge set E is sampled from an edge feature distribution E ∈ R^{n×n×b}.
Here, c(p) represents the one-hot vector (category scalar) sampled from the probability distribution p. At time t, the instance graph is defined as G_t = (c(N_t), c(E_t)). The objective of this task is to learn an event schema S_y from a set of instance graphs D_y = {G^(1), G^(2), ..., G^(m)} that belong to the same specific topic.

Our Approach
To solve the complex event schema induction task, we propose a Knowledge-Enriched Discrete Diffusion Model, as shown in Figure 3. Our method mainly consists of three modules: (1) Instance Graph Expansion, which expands the instance graphs using complex event knowledge obtained automatically from LLMs. (2) Event Skeleton Generation, which summarizes the event evolution skeleton using a discrete diffusion model. (3) Entity Relation Prediction, which attaches the arguments to the event skeleton and then uses a simple graph transformer to predict the entity relations. We illustrate each component in detail below.

Instance Graph Expansion
In this section, we illustrate how to obtain knowledge about event schemas from LLMs and inject it into instance graphs. Complex event schemas involve intricate graph structures, while LLMs are good at processing unstructured language tasks. To retain the structured information of instance graphs, we need LLMs to handle structured inputs and outputs. Considering the powerful coding capabilities of LLMs, we treat events as Python objects. In detail, events, entities, and their intricate relations correspond to classes, attributes, and instances in the object-oriented paradigm, respectively. This module includes three aspects: event knowledge expansion, temporal relation expansion, and entity relation expansion.
In event knowledge expansion, we select frequently occurring event sequences from the training instance graphs and write them as Python classes. Then we ask the LLM to enrich the Python code. In this way, we obtain new classes that represent new events highly correlated with the scenario. We filter out new events that occur less frequently than a hyperparameter K and are not in the predefined event categories. In temporal relation expansion, we write the obtained events as multiple-choice questions to establish their temporal relations with existing event sequences. In entity relation expansion, we obtain the argument connections between the new and existing events by encoding the new events as Python code and instantiating the classes. By effectively leveraging the complex event knowledge contained in LLMs, our approach enhances the event schema generation process. For details and examples, refer to Appendix E.
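The object-oriented encoding can be pictured with a toy sketch. The class names, argument roles, and entity strings below are hypothetical illustrations of the idea, not the paper's actual prompt or ontology: events become classes, roles become attributes, and an instantiation ties roles to concrete entities, expressing an argument link between two events.

```python
class Event:
    """Base class for events in the Bombing scenario (illustrative)."""

class Detonate(Event):
    def __init__(self, attacker, target, instrument, place):
        self.attacker = attacker      # role: Attacker
        self.target = target          # role: Target
        self.instrument = instrument  # role: Instrument (e.g. the explosive)
        self.place = place            # role: Place

class Injure(Event):
    def __init__(self, victim, place):
        self.victim = victim          # role: Victim
        self.place = place

# An argument link: the Target of Detonate plays the Victim role in Injure.
detonate = Detonate("bomber", "pedestrians", "IED", "street")
injure = Injure(victim=detonate.target, place=detonate.place)
```

The LLM is then asked to continue such code with new Event subclasses (e.g. an Evacuate or Investigate class) that plausibly belong to the same scenario, and its structured output is parsed back into graph nodes and links.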

Event Skeleton Generation
In this section, we introduce the forward and reverse processes of discrete diffusion on the instance graphs produced by the Instance Graph Expansion module. We adopt the diffusion framework of Vignac et al. (2022) and improve upon it. Here, we denote the distribution of G at time t as G_t.
The forward diffusion process. In this process, we apply noise separately to each node and edge by multiplying the node and edge distributions with transition probability matrices, obtaining the graph G_t from the previous graph G_{t-1}: q(G_t | G_{t-1}) = (N_{t-1} Q_t^N, E_{t-1} Q_t^E), where Q_t = α_t I + (1 − α_t) 1 1^T / K, 1 is a column vector of all ones, and α_t decays from 1 to 0 (Austin et al., 2021). This formulation ensures that the distribution q(G_t | G_0) converges to the uniform distribution when t becomes sufficiently large.
Next, we sample the node and edge types from these probability distributions to obtain a discrete graph G_t = (c(N_t), c(E_t)).

The reverse diffusion process. We aim to remove the noise from the graphs using a parameterized reverse process p_θ. Following the formulation of Austin et al. (2021), we express the posterior as p_θ(G_{t-1} | G_t) = Σ_{G_0} q(G_{t-1} | G_t, G_0) p_θ(G_0 | G_t). To predict the clean graph distribution G_p^t = p_θ(G_0 | G_t) at time t given the noisy input G_t, we train a graph transformer ϕ_θ that outputs the clean graph representation G_p^t = ϕ_θ(G_t, t). Our model ϕ_θ adopts the transformer structure (Vaswani et al., 2017). Previous graph transformer models (Ying et al., 2021) are not appropriate for encoding directed graphs based on time series, because the relative position information between nodes is lost during the noise-adding process. For instance, the self-attention module cannot differentiate two "Transport" events that occur in different time periods. To address this issue, we encode the depth information of event nodes as a fixed-feature embedding n_dep. Before inputting the graph into the transformer, we add the depth feature to the corresponding node feature.
The depth fixed-feature embedding follows the sinusoidal positional encoding: n_dep[2k] = sin(d_n · w_k) and n_dep[2k+1] = cos(d_n · w_k), where w_k = 1/10000^{2k/D}, d_n is the average depth of node n, D is the embedding dimension, and k indexes the embedding dimensions.
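A minimal sketch of this depth encoding, written against the reconstructed formula above (the exact normalization in the extracted text is ambiguous, so the dimension handling here is an assumption):

```python
import numpy as np

def depth_embedding(depth, dim):
    """Sinusoidal encoding of a node's (average) depth in the event graph.

    For k = 0..dim/2-1: w_k = 1 / 10000**(2k/dim),
    emb[2k] = sin(depth * w_k), emb[2k+1] = cos(depth * w_k).
    """
    k = np.arange(dim // 2)
    w = 1.0 / (10000 ** (2 * k / dim))
    emb = np.empty(dim)
    emb[0::2] = np.sin(depth * w)
    emb[1::2] = np.cos(depth * w)
    return emb

# Two events of the same type at different depths now get distinct features,
# which is what lets the transformer tell them apart after noising.
e1, e2 = depth_embedding(2, 16), depth_embedding(7, 16)
assert not np.allclose(e1, e2)
```

Each (sin, cos) pair has unit norm, so adding this embedding perturbs every node feature by a bounded, depth-dependent offset.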
Inspired by Vignac et al. (2022), our transformer model comprises several layers, each consisting of a self-attention module and a feed-forward network. For layer l, the self-attention module takes as input the time features t, node features N_l^t, and edge features E_l^t, and updates their representations, with the edge features modulating the attention scores through pairwise multiplication ⊙.
To optimize our model, we use cross-entropy losses on nodes and edges, with the edge term weighted by λ: L = L_CE(N_0, N_p^t) + λ L_CE(E_0, E_p^t). Once we obtain the clean graph distribution G_p^t, we can infer the node distribution p_θ(n_{t-1} | n_t) and edge distribution p_θ(e_{t-1} | e_t) by marginalizing over node and edge types: p_θ(n_{t-1} | n_t) = Σ_{n_0=1}^{K_n} q(n_{t-1} | n_t, n_0) p_θ(n_0 | n_t), and analogously for edges, where K_n is the number of node types and K_e is the number of edge types. Before the next reverse step, we obtain the discrete graph G_{t-1} from its distribution by probability sampling.
Our model obtains the final event schema G through a T-step reverse process in a non-auto-regressive manner. For further algorithm and derivation details, please refer to Appendix C.
Conditional Generation. Previous approaches (Li et al., 2020, 2021; Jin et al., 2022) need to train separate models for each scenario to ensure accurate generation. To generate event schemas for various scenarios with a single model and improve generalization, we also propose a conditional diffusion model, named KDMall, as a supplement.
We incorporate the category information y of the instance graphs as an additional attribute to control the training process of the model (Ho and Salimans, 2022; Dhariwal and Nichol, 2021). This allows us to influence the category of the generated schema: p_θ(G_{t-1} | G_t, y). Therefore, we only need to encode the category information y into the neural network. We simply concatenate it with the time features, enabling the conditional diffusion model to generate event schemas of different categories.
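A minimal sketch of how the scenario label might be concatenated onto the timestep features. The sinusoidal time encoding and one-hot label are assumptions for illustration; the paper does not spell out the exact encodings.

```python
import numpy as np

def time_features(t, dim):
    # sinusoidal encoding of the diffusion timestep (illustrative choice)
    k = np.arange(dim // 2)
    w = 1.0 / (10000 ** (2 * k / dim))
    emb = np.empty(dim)
    emb[0::2], emb[1::2] = np.sin(t * w), np.cos(t * w)
    return emb

def conditional_features(t, scenario_id, num_scenarios, dim):
    # concatenate a one-hot scenario label y onto the timestep features,
    # so a single denoiser can be conditioned on different scenarios
    y = np.eye(num_scenarios)[scenario_id]
    return np.concatenate([time_features(t, dim), y])

# e.g. scenario 2 of {General-IED, Mass-Car-Bombing-IED, Suicide-IED}
feat = conditional_features(t=100, scenario_id=2, num_scenarios=3, dim=16)
```

The denoiser then receives `feat` wherever it previously received only the time features, which is the only change the conditional variant needs.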

Entity Relation Prediction
In this module, we develop a simple architecture that combines a graph transformer for obtaining node representations with an MLP layer for relation prediction. The module takes the event skeleton, expanded with event argument roles, as input and generates the complete event schema by predicting the relations between entities. Previous models have primarily focused on entity types during classification, neglecting the significance of events and event roles; we address this limitation by explicitly aggregating them together. We initialize the node features using the BERT model (Devlin et al., 2018). Specifically, for each event or entity node n_i, BERT(n_i) represents its type embedding encoded by BERT. For entity nodes, n_i^e indicates the event node that entity node n_i belongs to, and n_i^r is a fixed embedding representing the role played by entity node n_i in event n_i^e. The embedding n_i^r uses the same sinusoidal encoding as the depth embedding.
Our transformer encoder is the same as the model used for Event Skeleton Generation, except that it lacks the time feature. The graph transformer outputs n̂_i corresponding to the input n_i, which is then passed to the MLP predictor.
The predicted relation type r_ij between entity nodes n_i and n_j is then computed as r_ij = MLP([n̂_i ; n̂_j]). It is worth noting that the classification problem is highly unbalanced. To address this issue, we set different weights for different categories in the loss function: L = −Σ_{i,j} H(r̂_ij) log p(r̂_ij | n̂_i, n̂_j), where r̂_ij denotes the true relation between entities i and j, and H(·) is a scalar function that assigns balanced weights to different relationships, with each relationship corresponding to a specific value.
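Using the weights reported in Appendix B (0.1 for "No-Relation", 0.9 otherwise), the class-weighted loss can be sketched as follows. This is an illustrative NumPy re-implementation, not the paper's code.

```python
import numpy as np

def weighted_relation_loss(logits, labels, no_rel_id, w_no_rel=0.1, w_rel=0.9):
    """Class-weighted cross-entropy for entity-relation prediction.

    Mirrors the H(.) scheme: "No-Relation" pairs get a small weight so the
    dominant empty class does not swamp the loss.
    logits: (num_pairs, num_rel_types); labels: (num_pairs,) integer classes.
    """
    # numerically stable log-softmax over relation types
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    weights = np.where(labels == no_rel_id, w_no_rel, w_rel)
    return (weights * nll).mean()

# Uniform logits: each pair contributes log(3), down-weighted for No-Relation.
loss = weighted_relation_loss(np.zeros((2, 3)), np.array([0, 1]), no_rel_id=0)
```

Down-weighting the majority class this way is a standard remedy for long-tailed label distributions; the ablation in Figure 4 ("w/o weight") measures its contribution.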

Datasets
We conduct experiments using the IED Schema Learning Corpus released by Li et al. (2021). The dataset follows the DARPA KAIROS ontology. The corpus focuses on three sub-types of complex events related to Improvised Explosive Devices (IEDs): General-IED, Mass-Car-Bombing-IED, and Suicide-IED. However, the test data in the corpus has data quality issues, since it is also extracted through IE tools. To address this, we manually modify the test data, generating golden test event schemas based on the modified data. Additionally, to ensure an objective evaluation of our model's effectiveness, we also record the test results on the original, unmodified data, which are provided in Appendix D.

Baselines
In this work, we compare the proposed event schema induction model with the following baselines: the Frequency-Based Sampling (FBS) model, which constructs the event schema according to the frequency distributions of temporal links in the training data. At each timestamp, FBS samples a pair of event types according to their frequency and adds the sampled edge to the schema graph. The procedure is repeated until FBS detects a cycle in the schema graph after adding a new edge.
The Double Graph Auto-encoder Model (DoubleGAE) (Jin et al., 2022), the state-of-the-art schema induction model, designs a variational directed acyclic graph auto-encoder to extract the event skeleton, and then uses another GCN-based auto-encoder to reconstruct entity-entity relations.
Large Language Models (LLMs) have strong understanding and generation abilities. We ask a large language model (ChatGPT) to directly generate the event schema and use it as a baseline.

Evaluation Metrics
To evaluate the quality of the generated schema, we compare it with the test instance graphs to see how well the schema matches real-world instance graphs. The following evaluation metrics are employed: (1) Event type match: we calculate the F1 score between the event types present in the schema graph and those in the test instance graphs.
(2) Event sequence match: a good schema should track events along a timeline, so we calculate the F1 score between the event sequences of length 2 or 3 present in the schema graph and those in the test instance graphs.
(3) Node/edge type distribution: we compare the Kullback-Leibler (KL) divergence of the node and edge type distributions between the schema graph and each test instance graph.
(4) Event Argument Connection Match (CM): a complex event graph schema includes entities and their relations, representing how events are connected through arguments. Because the data has a serious long-tail issue, we calculate the macro F1-score over all pairwise relations between entities.
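A sketch of how the type-match and distribution metrics above can be computed. The set-based F1 and the direction of the KL divergence are assumptions where the paper leaves the details unspecified.

```python
import numpy as np

def event_type_f1(schema_types, instance_types):
    """F1 between the event-type sets of the schema and a test graph."""
    s, g = set(schema_types), set(instance_types)
    if not s or not g:
        return 0.0
    hit = len(s & g)
    p, r = hit / len(s), hit / len(g)
    return 2 * p * r / (p + r) if p + r else 0.0

def type_kl(schema_counts, instance_counts, eps=1e-8):
    """KL divergence between node (or edge) type count distributions,
    taken here as KL(instance || schema)."""
    p = np.asarray(instance_counts, float); p = p / p.sum()
    q = np.asarray(schema_counts, float); q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

f1 = event_type_f1({"Attack", "Injure", "Evacuate"}, {"Attack", "Injure", "Die"})
```

The sequence-match metric works the same way as `event_type_f1`, but over sets of length-2 or length-3 event paths rather than single event types.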

Overall Results
As shown in Table 1, the results demonstrate the effectiveness of KDM in capturing important events and their relationships. Specifically, our approach outperforms the baseline methods in terms of event sequence matching, particularly for longer path lengths (l=3).
These improvements can be attributed to the discrete diffusion process employed in our model, which allows it to simultaneously predict the categories of all nodes and edges, making it well-suited for graph generation. Additionally, the Transformer architecture effectively utilizes global features through the self-attention mechanism, resulting in improved prediction accuracy. Furthermore, our model shows remarkable improvements in the Connection Match evaluation, indicating the advantage of our graph transformer over the GCN graph auto-encoder in DoubleGAE.

Conditional Generation Results
Building upon the aforementioned diffusion model, and in order to improve the model's generalization ability, we present an extension in the form of a conditional diffusion model as a supplement. This model enables the generation of event schemas for various scenarios using a single model.
As shown in Table 2, when comparing KDMall with our model trained on a specific dataset, we find that KDMall shows improved generalization capabilities and a better understanding of event relationships, particularly in the "General-IED" scenario. On the other datasets, KDMall demonstrates results comparable to the model trained on a single dataset, indicating the potential of our conditional generation process. The incorporation of diverse training data enables the model to learn common patterns and associations across different scenarios, leading to improved performance and broader applicability (Sastry et al., 2023; Kim et al., 2022).

Ablation Experiment
To demonstrate the effectiveness of our approach, we conduct ablation studies on the "Suicide-IED" dataset. (1) IGE Module Ablation Experiment: We conduct experiments as shown in Table 3. JSON is a prevalent format for representing structured data, so we encode the data in JSON format and instruct the LLMs to perform expansion, while keeping the rest of the process consistent with the Python-prompt approach. As shown in the table, the results obtained with Python prompts are noticeably better than those achieved with JSON prompts. Moreover, after filtering, the event types generated by the Python prompt are significantly more numerous than those generated by the JSON prompt. This observation underscores the effectiveness of the Python-prompt approach. (2) Diffusion Model Ablation Experiment: In Table 4, comparing our KDM model with a variant that removes the Instance Graph Expansion module, our model achieves a 4.1% increase in node matching accuracy, proving the effectiveness of the Instance Graph Expansion module. Additionally, by incorporating depth information, we observe a notable 5.9% improvement in sequence matching. These results demonstrate that the inclusion of depth information enhances our model's ability to capture the structural characteristics of graphs, proving the effectiveness of adding depth features.
(3) Entity Predictor Ablation Experiment: In Figure 4, compared to not setting weight hyperparameters, our model achieves a significant 5.57% improvement in the macro F1 index, demonstrating that our weight scalar function effectively addresses the long-tail data problem. Moreover, the results of "w/o RE" and "w/o IGE" highlight the effectiveness of adding role and event features and of the Instance Graph Expansion module.

Case Study
In Figure 5, we observe that our Instance Graph Expansion module successfully generates a schema that encompasses a broader range of events and exhibits more comprehensive temporal relationships within complex events. This outcome supports the effectiveness of leveraging object-oriented coding to distill knowledge from LLMs. Additionally, we provide a case study showcasing the diffusion process on the "Suicide-IED" dataset in Figure 6 in the Appendix.

Related Work

Event Schema An event schema is a comprehensive graphical pattern composed of temporal and multi-hop argument relationships (Li et al., 2020; Jin et al., 2022). Event schema induction effectively combines atomic schema induction (Chambers, 2013; Yuan et al., 2018; Du and Ji, 2022; Wang et al., 2021) and script learning (Rudinger et al., 2015; Granroth-Wilding and Clark, 2016; Weber et al., 2018). Event schema induction has broad application significance. For example, event schemas facilitate the analysis and prediction of future events, aiding the development of reaction plans for relevant scenarios (Li et al., 2021; Dror et al., 2023; Pan et al., 2021). Event schemas can also serve as guidance in information extraction, helping people understand the internal logic of events (Wen et al., 2021).
Diffusion models Diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020) have achieved impressive results in image, text, and audio generation (Rombach et al., 2022; Shen et al., 2023; Li et al., 2022; Gong et al., 2022; Kong et al., 2020; Yuan et al., 2022). Recently, Vignac et al. (2022) have shown their great potential in the graph generation field. Previous graph diffusion models embedded graphs in a continuous space by adding Gaussian noise to the node and edge features (Niu et al., 2020; Jo et al., 2022). However, this approach destroys the graph's sparsity and makes it hard to capture node connections (Vignac et al., 2022). Discrete diffusion models (Austin et al., 2021; Yang et al., 2023; Vignac et al., 2022; Johnson et al., 2021) overcome this problem by utilizing a Markov process that operates independently on each node and edge.

Conclusion
In this work, we identify the limitations of previous works and propose a Knowledge-Enriched discrete Diffusion Model. To enhance the quality and coherence of the generated schemas, we harness the rich knowledge present in LLMs by utilizing them for Instance Graph Expansion.
Our model leverages a discrete diffusion process to learn and generate event skeletons, while incorporating an entity relationship predictor to predict the relationships between event arguments. Additionally, we propose a conditional diffusion model for generating schemas for multiple diverse topics. We achieve the best results across multiple evaluation metrics.

Limitations
We only consider the temporal relationships between events and do not consider the hierarchical structure of the event schema, which may result in imperfect generated schemas. Due to the limited availability of datasets, our conditional diffusion model KDMall has only been trained and tested jointly on three highly related explosive-event scenarios; more categories and larger quantities of data are required to comprehensively test the model's abilities.

Ethics Statement
We use a discrete diffusion model to generate event skeletons and design an entity relationship predictor. At the same time, we fully explore the rich knowledge latent in LLMs for knowledge expansion. Our work improves the effectiveness of event schema induction, helping people better summarize the logic and ontological knowledge of events.

A Data Preprocessing
In the data preprocessing stage, we first constructed an instance graph for each complex event by merging coreferential events and entities. Isolated events were excluded from the instance graphs during construction. Specifically, we followed the cleaning strategy outlined in Jin et al. (2022): we deleted links with the same start and end types, as well as event-event links such as (DIE, INJURE), (ARRESTJAILDETAIN, ATTACK), (ENDPOSITION, STARTPOSITION), (DEFEAT, EXCHANGEBUYSELL), (SENTENCE, DIE), (ENDPOSITION, SENTENCE), and (THREATENCOERCE, RELEASEPAROLE) from the instance graphs. The maximum number of graph nodes m is set to 50.

B Training And Evaluation Details
In our event skeleton induction process, we utilize a 12-layer Transformer model. Additionally, we employ a 3-layer Transformer as our entity relation predictor. To balance the trade-off between nodes and edges, we set λ to 3. The learning rate is set to 1e-4, and the number of diffusion training epochs is set to 2500. The scalar function H(r_ij) is set to 0.1 if r_ij indicates "No-Relation"; otherwise it is set to 0.9. We conduct evaluations using 500 randomly generated event schemas for each performance metric. The node number is sampled from a range of 25 to 35. We choose the model checkpoint from the last epoch for evaluation.
In the Instance Graph Expansion process, we select the top 10 most frequently occurring event sequences from the training data as inputs for ChatGPT. Each event sequence is input to ChatGPT 10 times to obtain the final result. Furthermore, we use a hyperparameter K of 3 to filter out events generated by ChatGPT that occur less frequently.
In the Entity Relation Prediction module, each event has a predetermined set of argument roles. For example, the "Injure" event may have the argument role "Victim" limited to entity types "PER" and "AML". We count the occurrences of entity categories for each role in all instance graphs. The entity category with the highest occurrence for the corresponding role is then inserted into the event skeleton.
To modify the test data, we made the following modifications:
1. Merge the same path: for all subsequent nodes of each event node, merge event nodes with the same type, starting from the START node and merging in the order of the BFS algorithm.
2. Supplementary event nodes: based on human judgment, randomly add possible missing events that may occur in the schema.

C Conditional Discrete Diffusion Model
Transition probability matrix in the forward process. In the discrete diffusion model, a transition probability matrix Q is defined to corrupt the data at each step: Q_t = α_t I + (1 − α_t) 1 1^T / K, where 1 is a column vector of all ones and α_t decays from 1 to 0, ensuring that the node and edge features sampled at time T follow a uniform distribution (Hoogeboom et al., 2021; Yang et al., 2023). Letting β_t = (1 − α_t)/K, the transition matrix can be represented as [Q_t]_{ij} = α_t + β_t if i = j, and β_t otherwise. We can then calculate q(x_t | x_0) in closed form: q(x_t | x_0) = x_0 Q̄_t, where Q̄_t = Q_1 Q_2 · · · Q_t = ᾱ_t I + (1 − ᾱ_t) 1 1^T / K and ᾱ_t = Π_{s≤t} α_s. As t becomes large enough, ᾱ_t approaches 0 and the graph distribution G_t converges to the uniform distribution.
Reverse discrete diffusion process. We convert the noise G_T back into G; the joint probability has a Markovian structure (Vignac et al., 2022): p_θ(G_{0:T}) = p(G_T) Π_{t=1}^{T} p_θ(G_{t-1} | G_t), where p_θ is the reverse process with learnable parameters θ. For each discrete element x in the graph G_{0:T}, the posterior probability is p_θ(x_{t-1} | x_t) = Σ_{x_0} q(x_{t-1} | x_t, x_0) p_θ(x_0 | x_t). The posterior q(x_{t-1} | x_t, x_0) can be derived with the Bayes formula (Austin et al., 2021): q(x_{t-1} | x_t, x_0) = q(x_t | x_{t-1}, x_0) q(x_{t-1} | x_0) / q(x_t | x_0) ∝ x_t (Q_t)^T ⊙ x_0 Q̄_{t-1}. To train the discrete diffusion process, we minimize the negative log-likelihood of the model's predicted distribution via the variational lower bound (VLB): −log p_θ(G_0) ≤ L_VLB = E_q [ D_KL(q(G_T | G_0) ∥ p(G_T)) + Σ_{t=2}^{T} D_KL(q(G_{t-1} | G_t, G_0) ∥ p_θ(G_{t-1} | G_t)) − log p_θ(G_0 | G_1) ]. Please note that G_T is sampled conditioned on the node number distribution G_n and the corresponding depth distribution G_d, so the probability p_θ(G_T) can be expressed as p_θ(G_T) = p_θ(G_T | G_n, G_d) p_θ(G_n, G_d).
The terms L_T and L_{t-1} represent the Kullback-Leibler (KL) divergences between graph categorical distributions, while L_0 represents the predicted probability of the graph G_0 given the noisy graph G_1. Algorithms 1 and 2 present the training and generation algorithms of KDM.

D Supplement Experiment
As presented in Table 5, we evaluate our model on the original testing data used by Jin et al. (2022) and observe that our model consistently outperforms the baselines, especially in sequence match. This result highlights the strong capability of our discrete diffusion model in generating high-quality event schemas. Interestingly, we

Figure 1: An example of the schema induction process for the complex event "Bombing".

Figure 3: The model structure. The data passes through the Instance Graph Expansion module, the Event Skeleton Generation module, and the Entity Relation Prediction module in sequence to obtain the final schema.
Figure 4: The entity predictor ablation experiment. "w/o RE" denotes training without the fixed role embedding and event embedding features; "w/o weight" denotes training without the hyperparameter weights; "w/o IGE" denotes training without the Instance Graph Expansion module.

Table 1: Schema matching score (%), calculated by checking the intersection of the induced schemas and the manually checked test schemas.

Table 3: Results of different prompts for the IGE module on the Suicide-IED dataset. EN is the number of effective events generated by the LLM after filtering, which are used for Instance Graph Expansion.

Table 4: The diffusion model ablation experiment on the Suicide-IED dataset. "w/o depth" denotes training the model without depth features in the graph transformer; "w/o IGE" denotes training the model on the dataset without the Instance Graph Expansion module.