Learning Neural Ordinary Equations for Forecasting Future Links on Temporal Knowledge Graphs

There has been increasing interest in inferring future links on temporal knowledge graphs (KGs). While links on temporal KGs vary continuously over time, existing approaches model temporal KGs in discrete state spaces. To this end, we propose a novel continuum model by extending the idea of neural ordinary differential equations (ODEs) to multi-relational graph convolutional networks. The proposed model preserves the continuous nature of dynamic multi-relational graph data and encodes both temporal and structural information into continuous-time dynamic embeddings. In addition, a novel graph transition layer is applied to capture the transitions on the dynamic graph, i.e., edge formation and dissolution. We perform extensive experiments on five benchmark datasets for temporal KG reasoning, showing our model's superior performance on the future link forecasting task.


Introduction
Reasoning on relational data has long been considered an essential subject in artificial intelligence, with wide applications including decision support and question answering. Recently, reasoning on knowledge graphs has gained increasing interest (Ren and Leskovec, 2020; Das et al., 2018). A knowledge graph (KG) is a graph-structured knowledge base that stores factual information. KGs represent facts in the form of triples (s, r, o), e.g., (Bob, livesIn, New York), in which s (subject) and o (object) denote nodes (entities), and r denotes the edge type (relation) between s and o. Knowledge graphs are commonly static and store facts in their current state. In reality, however, the relations between entities often change over time. For example, if Bob moves to California, the triple (Bob, livesIn, New York) becomes invalid. To this end, temporal knowledge graphs (tKG) were introduced.
A tKG represents a temporal fact as a quadruple (s, r, o, t) by extending a static triple with a timestamp t, describing that this fact is valid at time t. In recent years, several sizable temporal knowledge graphs, such as ICEWS (Boschee et al., 2015), have been developed; they provide widespread availability of such data and enable reasoning on temporal KGs. While much work (García-Durán et al., 2018; Goel et al., 2020; Lacroix et al., 2020) focuses on the temporal KG completion task and predicts missing links at observed timestamps, recent work (Jin et al., 2019; Trivedi et al., 2017) has paid attention to forecasting future links on temporal KGs. In this work, we focus on the temporal KG forecasting task, which is more challenging than the completion task.

Most existing work (Jin et al., 2019; Zhu et al., 2020) models temporal KGs in a discrete-time domain, taking snapshots of temporal KGs sampled at regularly-spaced timestamps. Thus, these approaches cannot model irregular time intervals, which convey essential information for analyzing dynamics on temporal KGs; e.g., the dwelling time of a user on a website becoming shorter indicates that the user's interest in the website decreases. Know-Evolve (Trivedi et al., 2017) uses a neural point process to model continuous-time temporal KGs. However, Know-Evolve does not take the graph's structural information into account, thus losing the power of modeling temporal topological information. Also, Know-Evolve is a transductive method that cannot handle unseen nodes. In this paper, we present a graph neural-based approach to learn dynamic representations of entities and relations on temporal KGs. Specifically, we propose a graph neural ordinary differential equation to model the graph dynamics in the continuous-time domain.
Inspired by neural ordinary differential equations (NODEs) (Chen et al., 2018), we extend the idea of continuous-depth models to encode the continuous dynamics of temporal KGs. To apply NODEs to temporal KG reasoning, we employ a NODE coupled with multi-relational graph convolutional (MGCN) layers. MGCN layers are used to capture the structural information of multi-relational graph data, while the NODE learns the evolution of temporal KGs over time. Specifically, we integrate the hidden representations over time using an ODE solver and output the continuous-time dynamic representations of entities and relations. Unlike many existing temporal KG models that learn the dynamics by employing recurrent model structures with discrete depth, our model lets the time domain coincide with the depth of a neural network and takes advantage of the NODE to steer the latent entity features smoothly between two timestamps.

Besides, existing work simply uses the adjacency tensors from previous snapshots of the tKG to predict its linkage structure at a future time. Usually, most edges do not change between two observations, while only a few edges have formed or dissolved since the last observation. However, the dissolution and formation of this small number of edges often contain valuable temporal information and are more critical than unchanged edges for learning the graph dynamics. For example, suppose we know an edge with the label economicallyCooperateWith between two countries x and y at time t, but this edge dissolves at t + ∆t1. Additionally, another edge with the label banTradesWith between these two countries is formed at t + ∆t2 (∆t2 > ∆t1). Intuitively, the dissolution of (x, economicallyCooperateWith, y) is an essential indicator of the quadruple (x, banTradesWith, y, t + ∆t2). Thus, it should get more attention from the model. However, suppose we only feed the adjacency tensors of different observation snapshots into the model. In that case, we do not know whether the model can effectively capture the changes in the adjacency tensors and put more attention on the evolving part of the graph. To let the model focus on the graph's transitions, we propose a graph transition layer that takes as input a graph transition tensor containing edge formation and dissolution information and uses graph convolutions to process the transition information explicitly.
In this work, we propose a model to perform Temporal Knowledge Graph Forecasting with Neural Ordinary Equations (TANGO). The main contributions are summarized as follows: • We propose a continuous-depth multi-relational graph neural network for forecasting future links on temporal KGs by defining a multi-relational graph neural ordinary differential equation. The ODE enables our model to learn continuous-time representations of entities and relations. We are the first to show that the neural ODE framework can be extended to modeling dynamic multi-relational graphs.
• We propose a graph transition layer to model the edge formation and dissolution of temporal KGs, which effectively improves our model's performance.
• We propose two new tasks, i.e., inductive link prediction and long horizontal link forecasting, for temporal KG models. They evaluate a model's potential by testing its performance on previously unseen entities and on predicting links that happen in the farther future.
• We apply our model to forecast future links on five benchmark temporal knowledge graph datasets, showing its state-of-the-art performance.

Graph Convolutional Networks
Graph convolutional networks (GCNs) have shown great success in capturing structural dependencies of graph data. GCNs come in two classes: i) spectral methods (Kipf and Welling, 2016; Defferrard et al., 2016) and ii) spatial methods (Niepert et al., 2016; Gilmer et al., 2017).

Neural Ordinary Differential Equations

A neural ordinary differential equation (NODE) (Chen et al., 2018) describes the evolution of a hidden state in continuous time:

dz(t)/dt = f(z(t), t, θ),    (1)

where z(t) denotes the hidden state of a dynamic system at time t, and f denotes a function parameterized by a neural network to describe the derivative of the hidden state regarding time. θ represents the parameters of the neural network. The output of a NODE framework is calculated using an ODE solver coupled with an initial value:

z(t1) = z(t0) + ∫_{t0}^{t1} f(z(t), t, θ) dt = ODESolve(z(t0), f, t0, t1, θ).    (2)

Here, t0 is the initial time point, and t1 is the output time point. z(t1) and z(t0) represent the hidden states at t1 and t0, respectively. Thus, the NODE can output the hidden state of a dynamic system at any time point and deal with continuous-time data, which is extremely useful in modeling continuous-time dynamic systems. Moreover, to reduce the memory cost of backpropagation, Chen et al. (2018) introduced the adjoint sensitivity method into NODEs. The adjoint is defined as a(t) = ∂L/∂z(t), where L denotes the loss. The gradient of L with regard to the network parameters θ can be computed directly from the adjoint and an ODE solver:

dL/dθ = −∫_{t1}^{t0} a(t)ᵀ ∂f(z(t), t, θ)/∂θ dt.    (3)

In other words, the adjoint sensitivity method solves an augmented ODE backward in time and computes the gradients without backpropagating through the operations of the solver.
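The ODESolve operation described above can be made concrete with a small fixed-grid solver. The following NumPy sketch (our illustration, not the paper's implementation) integrates dz/dt = f(z, t) with classical Runge–Kutta (RK4) steps:

```python
import numpy as np

def ode_solve(z0, f, t0, t1, n_steps=100):
    """Fixed-grid RK4 approximation of z(t1) = z(t0) + int_{t0}^{t1} f(z, t) dt."""
    z, t = np.asarray(z0, dtype=float), t0
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        k1 = f(z, t)
        k2 = f(z + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(z + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(z + h * k3, t + h)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return z

# Example: dz/dt = -z, so z(1) should be close to z(0) * exp(-1).
z1 = ode_solve(np.array([1.0]), lambda z, t: -z, 0.0, 1.0)
```

In a NODE, f would be a neural network with parameters θ; here it is a plain closure, which is enough to illustrate how the output hidden state is obtained by numerical integration rather than by a fixed stack of layers.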

Temporal Knowledge Graph Reasoning
Let V and R represent a finite set of entities and relations, respectively. A temporal knowledge graph (tKG) G is a multi-relational graph whose edges evolve over time. At any time point, a snapshot G(t) contains all edges valid at t. Note that the time intervals between neighboring snapshots may be irregular. A quadruple q = (s, r, o, t) describes a labeled, timestamped edge at time t, where r ∈ R represents the relation between a subject entity s ∈ V and an object entity o ∈ V.
Formally, we define the tKG forecasting task as follows. Let (s_q, r_q, o_q, t_q) denote a target quadruple and F represent the set of all ground-truth quadruples. Given a query (s_q, r_q, ?, t_q) derived from the target quadruple and a set of observed events O = {(s, r, o, t_i) ∈ F | t_i < t_q}, the tKG forecasting task predicts the missing object entity o_q based on the observed past events. Specifically, we consider all entities in the set V as candidates and rank them by their scores to form a true quadruple together with the given subject-relation pair (s_q, r_q) at time t_q. In this work, we add reciprocal relations for every quadruple, i.e., adding (o, r^{-1}, s, t) for every (s, r, o, t). Hence, the restriction to predicting object entities does not lead to a loss of generality.
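As a toy illustration of this query setup, the sketch below adds reciprocal quadruples and ranks candidate objects for a query (s, r, ?, t). The entity/relation names are hypothetical, and appending `^-1` to a relation name is our own convention for reciprocal relations:

```python
# Hypothetical toy quadruples (s, r, o, t).
quads = [("Bob", "livesIn", "NewYork", 1), ("Bob", "worksAt", "Acme", 2)]

def add_reciprocal(quads):
    """For every (s, r, o, t), also add (o, r^{-1}, s, t) so that predicting
    objects only still covers subject-side queries."""
    out = []
    for s, r, o, t in quads:
        out.append((s, r, o, t))
        out.append((o, r + "^-1", s, t))
    return out

def rank_candidates(score_fn, s, r, t, entities, true_o):
    """Score every candidate object and return the rank of the true object."""
    scores = {e: score_fn(s, r, e, t) for e in entities}
    ordered = sorted(entities, key=lambda e: scores[e], reverse=True)
    return ordered.index(true_o) + 1

qs = add_reciprocal(quads)
```

Any score function (e.g., the decoders discussed later) can be plugged in as `score_fn`; the rank of the ground-truth entity then feeds directly into metrics such as MRR and Hits@k.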
Extensive studies have been done on the temporal KG completion task (Leblay and Chekol, 2018; García-Durán et al., 2018; Goel et al., 2020; Han et al., 2020a). Besides, a line of work (Trivedi et al., 2017; Jin et al., 2019; Deng et al., 2020; Zhu et al., 2020) has been proposed for the tKG forecasting task and can generalize to unseen timestamps. Specifically, Trivedi et al. (2017) and Han et al. (2020b) take advantage of temporal point processes to model the temporal KG as event sequences and learn evolving entity representations.

Our Model
Our model is designed to model time-evolving multi-relational graph data by learning continuous-time representations of entities. It consists of a neural ODE-based encoder and a decoder based on classic KG score functions. As shown in Figure 1b, the input of the network is fed into two parallel modules before entering the ODE solver. The upper module is a multi-relational graph convolutional layer that captures the graph's structural information according to an observation at time t. The lower module is a graph transition layer that explicitly takes as input the edge transition tensor of the current observation, which records which edges have been added and removed since the last observation. The graph transition layer focuses on modeling the graph transition between neighboring observations to improve the prediction of link formation and dissolution. For the decoder, we compare two score functions, i.e., DistMult (Yang et al., 2014) and TuckER (Balazevic et al., 2019). In principle, the decoder can be any score function.

Figure 1: In addition to f_MGCN, a graph transition layer f_trans is employed to model the edge formation and dissolution.

Neural ODE for Temporal KG
The temporal dynamics of a time-evolving multi-relational graph can be characterized by the following neural ordinary differential equation:

dH(t)/dt = f_TANGO(H(t), G(t), T(t), t) = f_MGCN(H(t), G(t), t) + w · f_trans(H(t), T(t), t),    (4)

where H ∈ R^{(|V|+2|R|)×d} denotes the hidden representations of entities and relations. f_TANGO represents the neural network that parameterizes the derivatives of the hidden representations. Besides, f_MGCN denotes stacked multi-relational graph convolutional layers, f_trans represents the graph transition layer, and G(t) denotes the snapshot of the temporal KG at time t. T(t) contains the information on edge formation and dissolution since the last observation. w is a hyperparameter controlling how much the model learns from edge formation and dissolution. We set H(t = 0) = Emb(V, R), where Emb(V, R) denotes the learnable initial embeddings of entities and relations on the temporal KG. Thus, given a time window ∆t, the representation evolution performed by the neural ODE assumes the following form:

H(t + ∆t) = H(t) + ∫_t^{t+∆t} f_TANGO(H(τ), G(τ), T(τ), τ) dτ.    (5)

In this way, we use the neural ODE to learn the dynamics of continuous-time temporal KGs.
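A minimal sketch of this evolution, assuming a plain explicit-Euler solver and treating f_MGCN and f_trans as black-box closures (their graph inputs G(t) and T(t) are folded into the closures for brevity; none of this is the paper's actual solver):

```python
import numpy as np

def f_tango(H, f_mgcn, f_trans, w):
    """dH/dt as the weighted sum of the structural term and the transition term."""
    return f_mgcn(H) + w * f_trans(H)

def evolve(H, f_mgcn, f_trans, w, dt, n_steps=10):
    """Explicit-Euler stand-in for the ODE solver:
    H(t + dt) = H(t) + integral of dH/dt over the window."""
    h = dt / n_steps
    for _ in range(n_steps):
        H = H + h * f_tango(H, f_mgcn, f_trans, w)
    return H
```

With a linear derivative such as f_mgcn(H) = -H and f_trans(H) = 0, this recovers exponential decay of the hidden state, which is a convenient sanity check for any solver implementation.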

Multi-Relational Graph Convolutional Layer
We define the multi-relational graph convolutional layer as

h_o^{l+1}(t) = σ( (1/|N_o(t)|) ∑_{(s,r)∈N_o(t)} W^l (h_s^l(t) * h_r) + δ · h_o^l(t) ),    (6)

where h_o^{l+1}(t) denotes the hidden representation of the object o at the (l+1)-th layer, N_o(t) = {(s, r) | (s, r, o) ∈ G(t)} denotes the set of neighboring subject-relation pairs of o at time t, W^l represents the weight matrix on the l-th layer, and * denotes element-wise multiplication. h_s^l(t) means the hidden representation of the subject s at the l-th layer. h_s^{l=0}(t) = h_s(t) is obtained by the ODE solver that integrates Equation 4 until t. δ is a learnable weight. In this work, we assume that the relation representations do not evolve, and thus, h_r is time-invariant. We use ReLU(·) as the activation function σ(·). From the view of the whole tKG, we use H(t) to represent the hidden representations of all entities and relations on the tKG. Besides, we use f_MGCN to denote the network consisting of multiple multi-relational graph convolutional layers (Equation 6).
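One such layer can be sketched in NumPy as follows. Averaging messages over incoming edges is our assumption for the normalization; edges are given as (s, r, o) index triples, and δ scales the self-connection:

```python
import numpy as np

def mgcn_layer(h_ent, h_rel, edges, W, delta):
    """One multi-relational GCN layer (sketch): each object o aggregates
    W @ (h_s * h_r) over incoming edges (s, r, o), normalized by in-degree,
    plus a delta-weighted self-connection, followed by ReLU."""
    n, _ = h_ent.shape
    msg = np.zeros_like(h_ent)
    deg = np.zeros(n)
    for s, r, o in edges:
        msg[o] += W @ (h_ent[s] * h_rel[r])   # element-wise product, then linear map
        deg[o] += 1
    deg = np.maximum(deg, 1.0)                # avoid division by zero for isolated nodes
    out = msg / deg[:, None] + delta * h_ent
    return np.maximum(out, 0.0)               # ReLU
```

Stacking several such layers (sharing the time-invariant relation embeddings h_rel) gives one forward evaluation of f_MGCN at a snapshot G(t).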

Graph Transition Layer
To let the model focus on the graph's transitions, we define a transition tensor for tKGs and use graph convolutions to capture the information of edge formation and dissolution. Given two graph snapshots G(t − ∆t) and G(t) at time t − ∆t and t, respectively, the graph transition tensor T(t) is defined as

T(t) = A(t) − A(t − ∆t),    (7)

where A(t) ∈ {0, 1}^{|V|×|R|×|V|} is a three-way adjacency tensor whose entries are set such that

A_sro(t) = 1, if (s, r, o) ∈ G(t); A_sro(t) = 0, otherwise.    (8)

Intuitively, T(t) ∈ {−1, 0, 1}^{|V|×|R|×|V|} contains the information of the edges' formation and dissolution since the last observation G(t − ∆t). Specifically, T_sro(t) = −1 means that the triple (s, r, o) disappears at t, and T_sro(t) = 1 means that the triple (s, r, o) is formed at t. For all unchanged edges, the corresponding entries in T(t) are equal to 0. Additionally, we use graph convolutions, analogous to Equation 6 but weighted by the entries of T(t), to extract the information provided by the graph transition tensor:

h_o^{trans}(t) = σ( ∑_{(s,r)∈N_o^T(t)} T_sro(t) · W_trans (h_s(t) * h_r) ),    (9)

where N_o^T(t) denotes the subject-relation pairs whose edges to o have changed since the last observation. By employing this graph transition layer, we can better model the dynamics of temporal KGs. We use f_trans to denote Equation 9. By combining the multi-relational graph convolutional layers f_MGCN with the graph transition layer f_trans, we get our final network that parameterizes the derivatives of the hidden representations H(t), as shown in Figure 1b.
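Since T(t) is extremely sparse in practice (most edges are unchanged and contribute zeros), it can be represented by its nonzero entries only. A minimal sketch, storing triples as dictionary keys:

```python
def transition_tensor(edges_prev, edges_now):
    """Sparse T(t) = A(t) - A(t - dt): +1 for newly formed triples,
    -1 for dissolved ones; unchanged triples are omitted (implicitly zero)."""
    prev, now = set(edges_prev), set(edges_now)
    T = {}
    for e in now - prev:
        T[e] = 1      # edge formed since the last observation
    for e in prev - now:
        T[e] = -1     # edge dissolved since the last observation
    return T
```

For the running example in the introduction, the dissolution of the cooperation edge and the formation of the trade-ban edge each become a single nonzero entry, which is exactly the signal the graph transition layer consumes.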

Learning and Inference
TANGO is an autoregressive model that forecasts the entity representation at time t by utilizing the graph information before t. To answer a link forecasting query (s, r, ?, t), TANGO takes three steps. First, TANGO computes the hidden representations H(t) of entities and relations at the time t. Then TANGO uses a score function to compute the scores of all quadruples {(s, r, o, t)|o ∈ V} accompanied with candidate entities. Finally, TANGO chooses the object with the highest score as its prediction.
Representation inference The representation inference procedure is done by an ODE solver: H(t) = ODESolver(H(t − ∆t), f_TANGO, t − ∆t, t, Θ_TANGO, G). Adaptive ODE solvers may incur massive time consumption in our work. To keep the training time tractable, we use fixed-grid ODE solvers coupled with the Interpolated Reverse Dynamic Method (IRDM) proposed by Daulbaev et al. (2020). IRDM uses Barycentric Lagrange interpolation (Berrut and Trefethen, 2004) on a Chebyshev grid (Tyrtyshnikov, 2012) to approximate the solution of the hidden states in the reverse mode of the NODE. Thus, IRDM can lower the time cost of backpropagation while maintaining good learning accuracy. Additional information about representation inference is provided in Appendix A.

Table 1: Score Functions. h_s, h_r, h_o denote the representations of the subject entity s, the object entity o, and the relation r, respectively. d denotes the hidden dimension of the representations. W ∈ R^{d×d×d} is the core tensor specified in (Balazevic et al., 2019). As defined in (Tucker, 1964), ×1, ×2, ×3 are three operators indicating the tensor product in three different modes.

Distmult: score(s, r, o) = (h_s * h_r) · h_o
TuckER: score(s, r, o) = W ×1 h_s ×2 h_r ×3 h_o
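The interpolation scheme IRDM relies on can be sketched directly. The following NumPy code implements standard barycentric Lagrange interpolation on Chebyshev nodes (a generic illustration of the numerical ingredient, not the IRDM implementation itself):

```python
import numpy as np

def chebyshev_nodes(a, b, n):
    """n Chebyshev (first-kind) nodes mapped from [-1, 1] to [a, b]."""
    k = np.arange(n)
    x = np.cos((2 * k + 1) * np.pi / (2 * n))
    return 0.5 * (a + b) + 0.5 * (b - a) * x

def barycentric_interp(x_nodes, y_nodes, x):
    """Barycentric Lagrange interpolation (Berrut and Trefethen, 2004)."""
    n = len(x_nodes)
    w = np.ones(n)
    for j in range(n):                        # barycentric weights
        for k in range(n):
            if k != j:
                w[j] /= (x_nodes[j] - x_nodes[k])
    diff = x - x_nodes
    if np.any(diff == 0):                     # query coincides with a node
        return y_nodes[np.argmin(np.abs(diff))]
    terms = w / diff
    return np.sum(terms * y_nodes) / np.sum(terms)
```

Evaluating the hidden-state trajectory at a handful of Chebyshev points and interpolating in between is what lets the reverse pass avoid re-integrating (or storing) the full forward trajectory.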
Score function Given the entity and relation representations at the query time t_q, one can compute the score of every triple at t_q. In our work, we adopt two popular knowledge graph embedding models, i.e., Distmult (Yang et al., 2014) and TuckER (Balazevic et al., 2019). Given a triple (s, r, o), its score is computed as shown in Table 1.
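For reference, the standard forms of these two score functions (as defined in their original papers, not anything specific to TANGO) take only a few lines of NumPy; `W` below is the TuckER core tensor:

```python
import numpy as np

def distmult_score(h_s, h_r, h_o):
    """DistMult: sum over the element-wise triple product of the embeddings."""
    return np.sum(h_s * h_r * h_o)

def tucker_score(h_s, h_r, h_o, W):
    """TuckER: core tensor W contracted with the three embeddings along its
    three modes (W x_1 h_s x_2 h_r x_3 h_o)."""
    return np.einsum('ijk,i,j,k->', W, h_s, h_r, h_o)
```

DistMult is the special case of TuckER where the core tensor is superdiagonal with ones, which is easy to verify numerically with a small example.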
Parameter Learning For parameter learning, we employ the cross-entropy loss:

L = −∑_{(s,r,o,t)∈F} log f(o|s, r, t, V),    (10)

where f(o|s, r, t, V) = exp(score(h_s(t), h_r, h_o(t))) / ∑_{e∈V} exp(score(h_s(t), h_r, h_e(t))). e ∈ V represents an object candidate, and score(·) is the score function. F summarizes the valid quadruples of the given tKG.
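The per-query loss term is a log-softmax over candidate scores; a numerically stable generic sketch (not the authors' code):

```python
import numpy as np

def softmax_ce_loss(scores, true_idx):
    """Cross-entropy over all candidate objects: -log softmax(scores)[true_idx]."""
    scores = np.asarray(scores, dtype=float)
    scores = scores - scores.max()            # shift for numerical stability
    log_probs = scores - np.log(np.sum(np.exp(scores)))
    return -log_probs[true_idx]
```

Subtracting the maximum score before exponentiating leaves the softmax unchanged but prevents overflow when scores are large, which matters once |V| grows and score magnitudes drift during training.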

Experimental Setup
We evaluate our model by performing future link prediction on five tKG datasets. We compare TANGO's performance with several existing methods and evaluate its potential with inductive link prediction and long horizontal link forecasting. Besides, an ablation study is conducted to show the effectiveness of our graph transition layer.

Evaluation Metrics
We use two metrics to evaluate the model performance on extrapolated link prediction, namely Mean Reciprocal Rank (MRR) and Hits@1/3/10. MRR is the mean of the reciprocal values of the actual missing entities' ranks averaged by all the queries, while Hits@1/3/10 denotes the proportion of the actual missing entities ranked within the top 1/3/10. The filtering settings have been implemented differently by various authors. We report results based on two common implementations: i) time-aware (Han et al., 2021) and ii) time-unaware filtering (Jin et al., 2019). We provide a detailed evaluation protocol in Appendix B.
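Both metrics are straightforward to compute once the rank of each query's true entity is known; a minimal sketch:

```python
def mrr(ranks):
    """Mean reciprocal rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at(ranks, k):
    """Fraction of queries whose true entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

Under a filtered setting, the ranks fed into these functions are computed after removing competing candidates that also form valid quadruples, which is where the time-aware and time-unaware variants differ.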

Baseline Methods
We compare our model performance with nine baselines. We take three static KG models as the static baselines, i.e., Distmult (Yang et al., 2014), TuckER (Balazevic et al., 2019), and COMPGCN (Vashishth et al., 2020). Besides, we compare with six temporal KG models, i.e., TTransE (Leblay and Chekol, 2018), TA-Distmult (García-Durán et al., 2018), CyGNet (Zhu et al., 2020), DE-SimplE (Goel et al., 2020), TNTComplEx (Lacroix et al., 2020), and RE-Net (Jin et al., 2019). We provide implementation details of baselines and TANGO in Appendix C.

Time-aware filtered Results
We run TANGO five times and report the averaged results. The time-aware filtered results are presented in Table 2. As explained in Appendix B, we take the time-aware filtered setting as the fairest evaluation setting. Results demonstrate that TANGO outperforms all the static baselines on every dataset. This implies the importance of utilizing temporal information in tKG datasets. The comparison between Distmult and TANGO-Distmult shows the superiority of our NODE-based encoder, which can also be observed in the comparison between TuckER and TANGO-TuckER. Additionally, TANGO achieves much better results than COMPGCN, indicating our method's strength in incorporating temporal features into tKG representation learning.

Figure 2: Time-aware filtered MRR of TANGO with or without the graph transition layer on subsets of ICEWS05-15 and WIKI. We split the graph snapshots into two groups, where the transition tensor's norm ||T(t)||_{L1} of each graph snapshot in the first group is larger than that of all graph snapshots in the second group. Since the graph transition layer is tailored to graph changes, we show the results of the first group here. The corresponding results of the ablation study on the whole test sets are presented in Figure 8 in the appendix.
Similarly, TANGO outperforms all the tKG baselines as well. Unlike TTransE and TA-Distmult, RE-Net uses a recurrent neural encoder to capture temporal information, which greatly improves its performance and makes it the strongest baseline. Our model TANGO implements a NODE-based encoder in the recurrent style to capture temporal dependencies. It consistently outperforms RE-Net on all datasets because TANGO explicitly encodes time information into hidden representations, while RE-Net only considers the temporal order between events. Additionally, we provide the raw and time-unaware filtered results in Table 5.

Ablation Study
To evaluate the effectiveness of our graph transition layer, we conduct an ablation study on two datasets, i.e., ICEWS05-15 and WIKI. We choose these two datasets as representatives of the two types of tKG datasets. ICEWS05-15 contains events that last shortly and happen multiple times, e.g., Obama visited Japan. In contrast, the events in the WIKI dataset last much longer and do not occur periodically, e.g., Eliran Danin played for Beitar Jerusalem FC between 2003 and 2010. The improvement in time-aware filtered MRR brought by the graph transition layer is illustrated in Figure 2, showing that the graph transition layer can effectively boost the model performance by incorporating the edge formation and dissolution information.

Time Cost Analysis
Keeping training time short while achieving strong performance is an important aspect of model evaluation. We report in Figure 3 the total training time of our model and the baselines on ICEWS05-15. We see that static KG reasoning methods generally require less training time than temporal methods. Though the total training time of TTransE is short, its performance is low, as reported in the former sections. TA-Distmult consumes more time than our model and is also beaten by TANGO in performance. RE-Net is the strongest baseline in performance; however, it requires almost ten times the total training time of TANGO. TANGO ensures a short training time while maintaining state-of-the-art performance for future link prediction, which shows its superiority.

Long Horizontal Link Forecasting
Given a sequence of observed graph snapshots until time t, the future link prediction task infers the quadruples happening at t + ∆t. ∆t is usually small in standard settings, e.g., one day (Trivedi et al., 2017; Jin et al., 2019; Zhu et al., 2020). However, in some scenarios, the graph information right before the query time is likely missing. This motivates evaluating temporal KG models by predicting links in the farther future. In other words, given the same input, the model should predict the links happening at t + ∆T, where ∆T ≫ ∆t. Based on this idea, we define a new evaluation task, i.e., long horizontal link forecasting.

To perform long horizontal link forecasting, we adjust the integration length according to how far into the future we want to predict. As described in Figure 5, the integration length between neighboring timestamps is short for the first k steps, e.g., integration from (t − t_k) to (t − t_k + ∆t). However, for the last step, i.e., integration from t to t + ∆T, the integration length becomes significantly larger according to how far into the future we want to predict. The larger ∆T is, the longer the length of the last integration step.

Figure 5: Given observed snapshots {G(t − t_k), ..., G(t)}, whose length is k, test quadruples at t + ∆T are to be predicted.
We report the results corresponding to different ∆T on ICEWS05-15 and compare our model with the strongest baseline, RE-Net. In Figure 4, we observe that our model outperforms RE-Net in long horizontal link forecasting. The gap between the performances of the two models diminishes as ∆T increases. This trend can be explained as follows. Our model employs an ODE solver to integrate the graph's hidden states over time. Since TANGO takes the time information into account and integrates the ODE in the continuous-time domain, its performance is better than that of RE-Net, which is a discrete-time model. However, TANGO assumes that the dynamics it learned at t also hold at t + ∆T. This assumption holds when ∆T is small. As ∆T increases, the underlying dynamics at t + ∆T deviate from the dynamics at t. Thus, TANGO's performance degrades accordingly, and its advantage over RE-Net vanishes.

Inductive Link Prediction
New graph nodes might emerge as time evolves in many real-world applications, e.g., new users and items. Thus, a good model requires strong generalization power to deal with unseen nodes. We propose a new task, i.e., inductive link prediction, to validate a model's potential in predicting links regarding unseen entities at a future time. A test quadruple is selected for inductive prediction if either its subject or its object, or both, have not been observed in the training set. For example, in the test set of ICEWS05-15, we have the quadruple (Raheel Sharif, express intent to meet or negotiate, Chaudhry Nisar Ali Khan, 2014-12-29). The entity Raheel Sharif does not appear in the training set, indicating that the aforementioned quadruple contains an entity that the model does not observe during training. We call the evaluation on this kind of test quadruples the inductive link prediction analysis.
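The selection rule can be sketched as a simple filter over the test set:

```python
def inductive_test_quads(train_quads, test_quads):
    """Select test quadruples whose subject or object (or both) never
    appears in the training set."""
    seen = set()
    for s, r, o, t in train_quads:
        seen.add(s)
        seen.add(o)
    return [q for q in test_quads if q[0] not in seen or q[2] not in seen]
```

Note that relations are deliberately not checked here: the task targets unseen entities, and a model can only score such quadruples if its encoder does not depend on entity-specific parameters learned during training.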
We perform the future link prediction on these inductive link prediction quadruples, and the results are shown in Table 3. We compare our model with the strongest baseline RE-Net on ICEWS05-15. We also report the results achieved by TANGO without the graph transition layer to show the performance boost brought by it. As shown in Table 3, TANGO-TuckER achieves the best results across all metrics. Both TANGO-TuckER and TANGO-Distmult can beat RE-Net, showing the strength of our model in inductive link prediction. The results achieved by the TANGO models are much better than their variants without the graph transition layers, which proves that the proposed graph transition layer plays an essential role in inductive link prediction.

Conclusions
We propose a novel representation method, TANGO, for forecasting future links on temporal knowledge graphs (tKGs). We propose a multi-relational graph convolutional layer to capture structural dependencies on tKGs and learn continuous dynamic representations using graph neural ordinary differential equations. Notably, our model is the first to show that the neural ODE can be extended to modeling dynamic multi-relational graphs. Besides, we couple our model with the graph transition layer to explicitly capture the information provided by edge formation and dissolution. According to the experimental results, TANGO achieves state-of-the-art performance on five benchmark datasets for tKGs. We also propose two new tasks to evaluate the potential of link forecasting models, namely inductive link prediction and long horizontal link forecasting. TANGO performs well in both tasks and shows its great potential.

Figure 6: Illustration of the inference procedure. The shaded purple area represents the whole architecture of TANGO. It is a neural ODE equipped with a GNN-based module f_TANGO. Dashed arrows denote the input and the output path of the graph's hidden state. Red solid arrows indicate the continuous hidden state flows learned by TANGO. Black solid lines represent that TANGO calls the functions set_graph and set_trans. The corresponding graph snapshots G and transition tensors T are input into f_TANGO for learning temporal dynamics.

B Evaluation Metrics
We report the results in three settings, namely raw, time-unaware filtered, and time-aware filtered. For time-unaware filtered results, we follow the filtered evaluation constraint applied in (Bordes et al., 2013; Jin et al., 2019), where we remove from the list of corrupted triplets all the triplets that appear either in the training, validation, or test set, except the test triplet of interest.

C Implementation Details

Additionally, we use the implementations of TTransE and TA-Distmult provided in (Jin et al., 2019). For TA-Distmult, the vocabulary of temporal tokens consists of year, month, and day for all the datasets. We use the released code to implement DE-SimplE 5, TNTComplEx 6, and CyGNet 7. All the baselines are trained with the Adam optimizer (Kingma and Ba, 2017), and the batch size is set to 512.

5 https://github.com/BorealisAI/de-simple
6 https://github.com/facebookresearch/tkbc
7 https://github.com/CunchaoZ/CyGNet

D Datasets

Table 9: Dataset statistics.

We follow the data preprocessing method and the dataset split strategy proposed in (Jin et al., 2019). Specifically, we split each dataset except ICEWS14 in chronological order into three parts, i.e., 80%/10%/10% (training/validation/test). For ICEWS14, we split it into a training set and a testing set with 50%/50% since ICEWS14 is not provided with a validation set. As explained in (Jin et al., 2019), the difference between the first type (ICEWS) and the second type (WIKI and YAGO) of tKG datasets is that the first-type datasets contain events that often last shortly and happen multiple times, e.g., Obama visited Japan four times. In contrast, the events in the second-type datasets last much longer and do not occur periodically, e.g., Eliran Danin played for Beitar Jerusalem FC between 2003 and 2010.

E Impact of Past History Length
As mentioned in Appendix A, TANGO utilizes the previous histories between (t − t_k) and t to forecast a link at t, where t_k is a hyperparameter. Figure 7 shows the performance with various lengths of past histories along with the corresponding training time. When TANGO uses longer histories, MRR gets higher. However, a long history requires more forward inferences. The choice of history length is a trade-off between performance and computational cost. We observe that the gain in MRR relative to the training time is not significant when the history length is four or more. Thus, a history length of four is chosen in our experiments.

F Analysis on Temporal KGs with Irregular Time Intervals
Most existing tKG reasoning models cannot properly deal with temporal KGs with irregular time intervals, while TANGO models them much better due to the nature of the neural ODE. We verify this via experiments on a new dataset, which we call ICEWS05-15_continuous. We sample the timestamps in ICEWS05-15 and keep the time intervals between every two neighboring sampled timestamps in a range from 1 to 4. We only keep the temporal KG snapshots at the sampled timestamps and extract a new subset. ICEWS05-15_continuous fits the setting where observations are taken non-periodically in continuous time. The dataset statistics of ICEWS05-15_continuous are reported in Table 11. We train our model and baseline methods on it and evaluate them with time-aware filtered MRR. As shown in Table 10, we validate that TANGO performs well on temporal KGs with irregular time intervals.