MetaTKG: Learning Evolutionary Meta-Knowledge for Temporal Knowledge Graph Reasoning

Reasoning over Temporal Knowledge Graphs (TKGs) aims to predict future facts based on given history. One of the key challenges for prediction is to learn the evolution of facts. Most existing works focus on exploring evolutionary information in history to obtain effective temporal embeddings for entities and relations, but they ignore the variation in evolution patterns of facts, which makes them struggle to adapt to future data with different evolution patterns. Moreover, new entities continue to emerge along with the evolution of facts over time. Since existing models rely heavily on historical information to learn embeddings for entities, they perform poorly on entities with little historical information. To tackle these issues, we propose a novel Temporal Meta-learning framework for TKG reasoning, MetaTKG for brevity. Specifically, our method regards TKG prediction as many temporal meta-tasks, and utilizes the designed Temporal Meta-learner to learn evolutionary meta-knowledge from these meta-tasks. The proposed method aims to guide the backbones to learn to adapt quickly to future data and to deal with entities with little historical information via the learned meta-knowledge. In particular, in the temporal meta-learner, we design a Gating Integration module to adaptively establish temporal correlations between meta-tasks. Extensive experiments on four widely used datasets and three backbones demonstrate that our method can greatly improve the performance.


Introduction
Temporal Knowledge Graphs (TKGs) (Boschee et al., 2015) are of great practical value as an effective way to represent real-world time-evolving facts. In TKGs, facts evolve continuously, and their evolution patterns can change during this process. This brings up the issue that most models struggle to adapt to future data with different evolution patterns. For example, models trained on facts before COVID-19 are hard to adapt to facts that happen under the circumstances of COVID-19, because the evolution pattern of facts changes with the outbreak of COVID-19. Thus, it is important to learn variations in evolution patterns so as to guide models to quickly adapt to future data with diverse evolution patterns. However, existing TKG reasoning models ignore the learning of evolution patterns. From Figure 1a, we can see that the performance of existing TKG models drops over time, because models trained on historical data learn old patterns that are not applicable to future data (You et al., 2021).
Moreover, as facts evolve constantly over time, new entities continue to emerge during the evolution of facts. Most of these new entities have little historical information and are thus hard to learn. As shown in Figure 1b, a very large number of future entities appear fewer than 50 times during evolution, and the performance of existing methods on such entities is much worse than on others. Hence, improving the performance on entities with little historical information is necessary for prediction. However, most existing methods fail to address this issue, because they rely heavily on sufficient historical information to obtain effective temporal embeddings of entities.
To deal with the aforementioned challenges, we propose a novel Temporal meta-learning framework for TKG reasoning, MetaTKG for brevity. MetaTKG is plug-and-play and can be easily applied to most existing backbones for TKG prediction. Specifically, MetaTKG regards TKG prediction as many temporal meta-tasks for training, and utilizes a Temporal Meta-Learner to learn the evolutionary meta-knowledge from these meta-tasks. In particular, each task consists of two KGs with adjacent timestamps in the TKG. In this way, the temporal meta-learner can learn the variation in evolution patterns between two temporally adjacent KGs, which can guide the backbones to learn to adapt quickly to future data with different evolution patterns. Besides, from the learning process of each task, the backbones can derive the meta-knowledge learned from entities with sufficient history, and use it to learn how to obtain effective embeddings for entities with little historical information. Moreover, we specially design a Gating Integration module in the temporal meta-learner according to the characteristics of TKGs, to adaptively establish the temporal correlations between tasks during the learning process.
To summarize, the major contributions can be listed as follows:
• We illustrate and analyze the critical importance of learning the variations in evolution patterns of facts, and of handling new entities with little historical information, for TKG reasoning.
• We propose a novel meta-learning framework to learn the evolutionary meta-knowledge for TKG prediction, which can be easily plugged into most existing TKG backbones.
• We conduct extensive experiments on four commonly used TKG benchmarks and three backbones, which demonstrate that MetaTKG can greatly improve the performance.

Related Work
There are mainly two settings in TKG reasoning: interpolation and extrapolation. In this paper, we focus on the latter setting, which aims to predict future facts based on the given history. We also briefly introduce the applications of meta-learning in KGs and TKGs in this section.

TKG Reasoning with the extrapolation setting. Several early attempts, such as GHNN (Han et al., 2020), Know-Evolve (Trivedi et al., 2017), and DyRep (Trivedi et al., 2019), build a temporal point process to obtain continuous-time dynamics for TKG reasoning, but they merely focus on continuous-time TKGs. Recently, RE-NET (Jin et al., 2020) models TKGs as sequences, utilizing R-GCNs and RNNs to capture structural and global temporal information, respectively. Following RE-NET, RE-GCN (Li et al., 2021) captures more complex structural dependencies of entities and relations, and utilizes static information to enrich the embeddings of entities. Besides, CyGNet (Zhu et al., 2021) proposes a copy-generation mechanism, which exploits recurrence patterns in history to obtain useful historical information. TANGO (Han et al., 2021b) applies Neural Ordinary Differential Equations (NODEs) to TKGs to capture continuous temporal information. CEN (Li et al., 2022) considers the length diversity of history and explores the optimal history length of each dataset for prediction. Moreover, xERTE (Han et al., 2021a) designs an inference graph to select important entities for prediction, which visualizes the reasoning process. And TITer (Sun et al., 2021) presents a path-search model based on Reinforcement Learning for TKG prediction. However, all of these models are unable to adapt well to new data with diverse evolution patterns, and fail to learn entities with little historical information.
Meta-Learning in KGs. Meta-learning is regarded as "learning to learn" (Vilalta and Drissi, 2002), which aims to transfer meta-knowledge so as to make models rapidly adapt to new tasks with a few examples. Meta-learning has been widely used in various fields, and its effectiveness has also been verified in KGs. GMatching (Xiong et al., 2018) proposes the problem of few-shot relations in KGs, and applies metric-based meta-learning to solve it by transferring meta-knowledge from the background information to few-shot relations. Afterward, MetaR (Chen et al., 2019) presents a solution for few-shot relations in KGs following the meta-learning framework, which is independent of the background knowledge compared to GMatching. Recently, some works following GMatching have emerged to solve the problem of few-shot relations in KGs, such as FSRL (Zhang et al., 2020), FAAN (Zhang et al., 2020), and GANA (Niu et al., 2021). These works design different neighbor aggregation modules to improve representation learning. However, all of these works are difficult to adapt to TKGs. OAT (Mirtaheri et al., 2021) is the first attempt to utilize meta-learning to solve the problem of one-shot relations on TKGs, and it is evaluated on a newly constructed one-shot TKG dataset. OAT generates temporal embeddings of entities and relations with a transformer-based encoder (Vaswani et al., 2017). OAT's use of meta-learning is similar to that in static KGs, dividing tasks by relations.
Different from the meta-learning works mentioned above, our proposed method focuses on entities with little historical information in TKGs. Moreover, our model does not need to be run on special datasets, which makes it of more practical value compared to models that only center on few-shot relations. Besides, though OAT utilizes meta-learning, the way it divides tasks still prevents it from learning the variation in evolution patterns.

Preliminaries
In this section, we formulate TKGs and the problem of TKG reasoning, and introduce the notations used in this work.

Definition 1 (Temporal Knowledge Graph). Let E and R represent the sets of entities and relations, respectively. A Temporal Knowledge Graph (TKG) G can be defined as a temporal sequence of KGs with different timestamps, i.e., G = {G_1, G_2, ..., G_n}. Each G_t ∈ G contains the facts that occur at time t, and a fact is described as a quadruple (e_s, r, e_o, t), in which e_s, e_o ∈ E and r ∈ R.

Definition 2 (TKG Reasoning). The task of TKG reasoning with the extrapolation setting can be categorized into entity prediction and relation prediction.
The entity prediction task aims to predict the missing object entity of (e_s, r, ?, t+1) or the missing subject entity of (?, r, e_o, t+1). Similarly, the relation prediction task aims to predict the missing relation of (e_s, ?, e_o, t+1). In this paper, we mainly evaluate our models on the entity prediction task.

Definition 3 (Backbones for TKG Reasoning). We denote a TKG reasoning backbone as a parametrized function f_θ with parameters θ.

Definition 4 (Meta-task). In the traditional setting of meta-learning (Finn et al., 2017), the training process is based on a set of meta-tasks. We denote a meta-task as T_t. Each meta-task consists of a support set T_t^s and a query set T_t^q, which can be denoted as T_t = {T_t^s, T_t^q}. During the training process of each task T_t, the backbone is first trained on T_t^s. Then the backbone is trained on T_t^q with the feedback from the loss of T_t^s.
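As a concrete illustration of Definition 1, the sketch below groups (e_s, r, e_o, t) quadruples into the snapshot sequence G = {G_1, ..., G_n}; all function and variable names are our own illustrative assumptions, not part of any released code.

```python
# Minimal sketch of the TKG data model: a TKG is a time-ordered
# sequence of KG snapshots, each holding the (subject, relation,
# object) triples that occur at its timestamp.
from collections import defaultdict

def build_tkg(quadruples):
    """Group (s, r, o, t) quadruples into snapshots G_1..G_n keyed by t."""
    snapshots = defaultdict(list)
    for s, r, o, t in quadruples:
        snapshots[t].append((s, r, o))
    # Return the snapshots sorted by timestamp: G = {G_1, ..., G_n}.
    return [snapshots[t] for t in sorted(snapshots)]

quads = [("A", "visits", "B", 1), ("B", "sanctions", "C", 1),
         ("A", "meets", "C", 2)]
tkg = build_tkg(quads)
```

Each element of `tkg` then plays the role of one G_t in the definitions above.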

Proposed Approach
In this section, we present the proposed MetaTKG in detail. The framework of our model is shown in Figure 2. In MetaTKG, TKG prediction is regarded as many temporal meta-tasks for training.
The temporal meta-learner in our model is built to learn evolutionary meta-knowledge from these meta-tasks. We aim to guide the backbones to learn to adapt quickly to future data and to deal with entities with little historical information via the learned meta-knowledge. Moreover, in the temporal meta-learner, the Gating Integration module shown in the middle part of Figure 2 is specially designed to adaptively capture the temporal correlations between tasks.

Temporal Meta-tasks
Figure 2: An illustration of MetaTKG. The TKG is first divided into many temporal meta-tasks (§4.1) for training. Then the temporal meta-learner (§4.2) learns the evolutionary meta-knowledge from these tasks. In particular, according to the temporal characteristics of TKGs, we design a Gating Integration module (§4.2.1) to establish the temporal correlations among these tasks. During the learning process of each task, the knowledge gained from former tasks is fused and then transferred to the latter task. Finally, the meta-knowledge obtained from different tasks guides the backbone to learn to quickly adapt to new data for prediction over TKGs.

In TKGs, the temporal correlation between G_{t-1} and G_t implies the evolutionary changes. According to this, we can learn the variation in evolution patterns by learning the evolutionary information in G_{t-1} and G_t. Thus, in our model, we regard TKG prediction as many temporal meta-tasks T_t for training. Each task is designed to consist of two KGs in G with adjacent timestamps, which is more applicable to the TKG scenario. Formally, a temporal meta-task T_t can be denoted as:

T_t = {T_t^s, T_t^q} = {G_{t-1}, G_t},   (1)

where G_{t-1} is the support set T_t^s and G_t is the query set T_t^q. That is, our training data can be described as T_train = {T_t}_{t=2}^{k}, where each task T_t corresponds to an individual entity prediction task. Similarly, the validation and testing data are also composed of temporal meta-tasks, denoted as T_valid = {T_t}_{t=k+1}^{m} and T_test = {T_t}_{t=m+1}^{n}, respectively.
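The meta-task construction described above is a simple pairing of adjacent snapshots; the sketch below illustrates it with invented names (a minimal assumption of how such tasks could be stored, not the paper's code).

```python
# Build temporal meta-tasks T_2..T_n from a snapshot sequence:
# each task pairs G_{t-1} (support set) with G_t (query set).
def make_meta_tasks(snapshots):
    """Turn [G_1, ..., G_n] into a list of {support, query} tasks."""
    return [{"support": snapshots[t - 1], "query": snapshots[t]}
            for t in range(1, len(snapshots))]

snapshots = [["G1 facts"], ["G2 facts"], ["G3 facts"]]
tasks = make_meta_tasks(snapshots)
# tasks[1] is T_3: support G_2, query G_3.
```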

Temporal Meta-learner
To learn the evolutionary meta-knowledge from the temporal meta-tasks, inspired by the learning ability of meta-learning in time-series settings (You et al., 2021; Xie et al., 2022), we design a novel temporal meta-learner according to the characteristics of TKGs. The goal of the temporal meta-learner is to guide the backbones to quickly adapt to future data via the learned meta-knowledge.
Specifically, during the learning process of each T_t, we first train the backbone on the support set T_t^s, and the updated parameter is computed using one gradient update. Formally, this process can be defined as:

θ̃_t = θ_t^s − α ∇_{θ_t^s} L_{T_t^s}(f_{θ_t^s}),   (2)

where θ̃_t is the parameter updated on the support set T_t^s, and θ_t^s represents the initial parameter for training the backbone on T_t^s. α is a hyper-parameter controlling the step size.
After obtaining θ̃_t updated on T_t^s, which carries the feedback from the loss of T_t^s, we train the backbone on the query set T_t^q by:

θ_t = θ_t^q − β ∇_{θ_t^q} L_{T_t^q}(f_{θ̃_t}),   (3)

where θ_t is the parameter updated on the query set of T_t, and θ_t^q represents the initial parameter for training the backbone on T_t^q. β is a hyper-parameter controlling the step size. Note that we learn one meta-task at a time; the initial parameter for learning each T_t is the parameter updated by previously learned meta-tasks, as further described in §4.2.1.
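The two-step update in Eqs. (2)-(3) can be sketched with a scalar toy problem. A quadratic loss 0.5·(θ − target)² stands in for the backbone's loss, and all names are illustrative assumptions; the full method evaluates the query loss with the support-adapted parameter, which this one-dimensional toy collapses for brevity.

```python
# Toy version of the support/query updates: one gradient step on the
# support set (Eq. 2), then one on the query set (Eq. 3).
def inner_update(theta, target, lr):
    """One gradient step on the toy loss 0.5 * (theta - target)**2."""
    return theta - lr * (theta - target)

def learn_meta_task(theta_s_init, theta_q_init, support_target,
                    query_target, alpha=0.1, beta=0.1):
    # Eq. (2): adapt on the support set T_t^s starting from theta_t^s.
    theta_tilde = inner_update(theta_s_init, support_target, alpha)
    # Eq. (3): update on the query set T_t^q starting from theta_t^q;
    # theta_tilde is the support feedback used by the full method.
    theta_t = inner_update(theta_q_init, query_target, beta)
    return theta_tilde, theta_t

adapted = learn_meta_task(0.0, 0.0, 1.0, 2.0)
```

With step size 0.1, each update moves the parameter one tenth of the way toward the corresponding target.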
By utilizing such a training strategy, the evolutionary meta-knowledge learned from each T_t can guide the backbones to gradually learn to face new data (the query set T_t^q) with evolution patterns different from the old data (the support set T_t^s). In this way, by continuously learning these tasks one by one, the backbones learn to quickly adapt to new data through the meta-knowledge accumulated from different tasks. Meanwhile, the meta-knowledge also guides the backbones to learn entities with little historical information using the learning experience from other entities.

Gating Integration
Due to the temporal characteristics of TKGs, meta-tasks in the TKG scenario are temporally correlated. That is, the meta-knowledge learned from former meta-tasks is helpful for learning the next one. Thus, in the learning process of meta-tasks, the temporal correlation between them must be considered. Since such correlation is actually generated by the temporal correlation between KGs with adjacent timestamps in TKGs, the key to establishing the correlation between meta-tasks is to associate temporally adjacent KGs in different tasks.

Considering the importance of establishing these temporal correlations, we specially design a gating integration module to effectively build up the temporal correlations between tasks.
Specifically, for the support set of each task, we fuse the parameter vectors updated by tasks T_{t-1} and T_{t-2}, taking the fused one as the initial parameter of task T_t for learning. Formally, the initial parameter θ_t^s in Eq. (2) can be calculated as:

θ_t^s = σ(g^s) ⊙ θ_{t-1} + (1 − σ(g^s)) ⊙ θ_{t-2},   (4)

where g^s is a learnable gate vector that balances the information of θ_{t-1} and θ_{t-2}, σ(·) is the sigmoid function projecting each element into [0, 1], and ⊙ denotes element-wise multiplication.
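A minimal sketch of this gating fusion, assuming the reconstructed form θ_t^s = σ(g^s) ⊙ θ_{t-1} + (1 − σ(g^s)) ⊙ θ_{t-2}; names are illustrative, not the paper's implementation.

```python
# Element-wise sigmoid-gated fusion of the parameters updated by the
# two previous meta-tasks.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_init(theta_prev, theta_prev2, gate):
    """Fuse theta_{t-1} and theta_{t-2} with a per-element gate g^s."""
    return [sigmoid(g) * a + (1.0 - sigmoid(g)) * b
            for g, a, b in zip(gate, theta_prev, theta_prev2)]

init = gated_init([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])
```

With g = 0, σ(g) = 0.5, so each fused entry is the mean of the two previous parameters; training then moves the gate toward whichever task's knowledge is more useful.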
It is important to note that the gate g^s is updated with the loss of the support set T_t^s in Eq. (2). Formally:

g^s ← g^s − α ∇_{g^s} L_{T_t^s}(f_{θ_t^s}).   (5)

By initializing θ_t^s with the gating module, we build up temporal correlations for the support set of each task. As shown in the gating integration module of Figure 2, θ_t^s, which operates on G_{t-1} (the support set of T_t), can contain knowledge in θ_{t-2} learned from G_{t-2} (the query set of T_{t-2}). Thus, the temporal correlations can be established by temporally associating G_{t-1} and G_{t-2} across different tasks. More discussion on the temporal correlation for T_t^s can be found in Appendix A.

Different from the support set, we simply take the parameter θ_{t-1} updated by T_{t-1} as the initial parameter θ_t^q in Eq. (3) for learning T_t^q:

θ_t^q = θ_{t-1}.   (6)

Note that the query set T_t^q = G_t of T_t is temporally adjacent to the query set T_{t-1}^q = G_{t-1} of task T_{t-1}. Thus, by simply initializing θ_t^q with θ_{t-1}, which contains knowledge learned from G_{t-1} (the query set of T_{t-1}), we can establish the temporal correlation for T_t^q without gating, by associating G_t and G_{t-1} from different tasks.

Component-specific gating parameter
In particular, the gating parameter g^s in the gating integration module is designed to be component-specific, because parameters in different components should be updated with different frequencies. For example, the parameters of entity embeddings are updated only when the corresponding entities appear, the parameters of relation embeddings are updated more frequently, and the other parameters in the model need to be updated in every meta-task. Thus, we assign different g^s values to the entity embeddings, the relation embeddings, and the other parameters in the model, respectively.
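The component-specific design can be sketched as one gate per named parameter group; the group names and gate values below are illustrative assumptions used only to show the mechanism.

```python
# One sigmoid gate per parameter group, so entity embeddings, relation
# embeddings, and the remaining parameters fuse past knowledge at
# different rates.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse_by_component(prev, prev2, gates):
    """Apply a separate gate g^s to each named parameter group."""
    fused = {}
    for name in prev:
        g = sigmoid(gates[name])
        fused[name] = [g * a + (1.0 - g) * b
                       for a, b in zip(prev[name], prev2[name])]
    return fused

params_prev = {"entity_emb": [1.0], "relation_emb": [1.0], "other": [1.0]}
params_prev2 = {"entity_emb": [0.0], "relation_emb": [0.0], "other": [0.0]}
# A strongly negative gate favors theta_{t-2}; a strongly positive one
# favors theta_{t-1}; zero mixes them evenly.
gates = {"entity_emb": -10.0, "relation_emb": 0.0, "other": 10.0}
fused = fuse_by_component(params_prev, params_prev2, gates)
```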

Model Evaluation
After training the backbones on all the tasks in T_train, we obtain the final updated parameter θ_train for testing. The process of testing is similar to training. In particular, to ensure consistency in the time series, we update the parameter θ_train on T_valid before testing. Moreover, to enhance the fast-adaptation ability of the backbones plugged with our method, we conduct multi-step gradient updates in the testing phase (Finn et al., 2017). Algorithm 1 provides the pseudo-code of the overall framework.
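Multi-step test-time adaptation can be illustrated with the same scalar toy as before: the trained parameter takes several gradient steps on a stand-in quadratic loss before prediction. This is a sketch under those toy assumptions, not the actual test procedure.

```python
# Run several gradient updates on a toy loss 0.5 * (theta - target)**2
# before predicting, mimicking multi-step test-time adaptation.
def adapt(theta, target, lr=0.1, steps=5):
    """Return theta after `steps` gradient updates toward `target`."""
    for _ in range(steps):
        theta = theta - lr * (theta - target)
    return theta

one_step = adapt(0.0, 1.0, steps=1)
multi_step = adapt(0.0, 1.0, steps=5)
# More steps move the parameter closer to the new data's optimum,
# at the cost of extra compute (and, per §5.6, a risk of over-fitting).
```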

Experiment
In this section, we conduct experiments to evaluate MetaTKG on four typical temporal knowledge graph datasets and three backbones for TKG prediction. The implementation details can be found in Appendix B. We then answer the following questions through experimental results and analyses.
• Q1: How does the proposed MetaTKG perform when plugged into existing TKG reasoning models for the entity prediction task?
• Q2: How does MetaTKG perform on predicting facts in different time periods?
• Q3: How does MetaTKG perform on predicting entities with little historical information?
• Q4: How do the gating integration module and the component-specific gating parameter contribute to the performance?
• Q5: How does the number of gradient update steps affect MetaTKG?
The details of the four datasets are presented in Table 1, where the time gap represents the time granularity between two temporally adjacent facts.

Backbones
Since MetaTKG is plug-and-play, we plug it into the following state-of-the-art TKG reasoning models to evaluate its effectiveness. RE-NET (Jin et al., 2020) deals with TKGs as KG sequences. It utilizes an R-GCN to capture the structural dependencies of entities and relations within each KG, and then adopts an RNN to associate KGs with different timestamps, capturing the temporal dependencies of entities and relations.
RE-GCN (Li et al., 2021) proposes a recurrent evolution module based on relational GNNs to obtain embeddings that contain dynamic information for entities and relations. Additionally, RE-GCN designs a static module that utilizes the static properties of entities to enrich the embeddings for prediction.
CEN (Li et al., 2022) takes the issue of history-length diversity in TKGs into consideration. It utilizes an R-GCN-based encoder to learn the embeddings of entities with different history lengths, and adopts a CNN-based decoder to choose the optimal history length of each dataset for prediction.

Evaluation Metrics
To evaluate our model, we adopt the widely used metrics MRR and Hits@{1, 3, 10} (Jin et al., 2020; Li et al., 2021). To ensure a fair comparison among models, we unify the setting that the ground-truth history is utilized during multi-step inference for all models. Without loss of generality (Li et al., 2021), we only report the experimental results under the raw setting.
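These metrics can be computed directly from the 1-based rank of the ground-truth entity among all candidates; the sketch below shows the standard definitions with illustrative ranks.

```python
# MRR averages the reciprocal rank of the ground-truth entity; Hits@k
# is the fraction of queries whose ground truth ranks within the top k.
def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 2, 4, 10]  # illustrative 1-based ranks of the true entity
```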

Performance Comparison (RQ1)
Since our model utilizes multi-step updates in the testing phase, we also fine-tune all backbone models with multi-step gradient updates for a fair comparison. The performances of the backbones plugged with MetaTKG, the original backbones, and the fine-tuned backbones on the entity prediction task are shown in Table 2.

Table 2: Performance comparison of MetaTKG when plugged into different backbones on the four datasets (ICEWS14, ICEWS18, ICEWS05-15, and WIKI) in terms of MRR (%), Hit@1 (%), and Hit@10 (%) (all results are under raw metrics). The highest performance is highlighted in bold. Backbones marked with * are the fine-tuned ones. ∆Improve and ∆Improve* indicate the relative improvements over the original backbones and the fine-tuned backbones in percentage, respectively.

From the results in Table 2, we have the following observations. Our proposed method provides significant improvements in the performance of the backbones under most metrics on all datasets, which verifies its effectiveness. For one thing, our method greatly enhances the performance of the original backbones, indicating that it can effectively help the backbones learn to adapt quickly to future data with various evolution patterns and alleviate the issue of learning entities with little historical information. For another, though fine-tuning can benefit the performance of the backbones on new data to some degree, the backbones plugged with our model still significantly outperform the fine-tuned ones. This further illustrates the advantages of our method in learning variations in diverse evolution patterns.
Observing the relative improvements over the original backbones, we find that the improvements on ICEWS05-15 are greater than those on ICEWS14, ICEWS18, and WIKI.

It is worth noting that ICEWS05-15 contains many more time slices than the other datasets and its facts last only a short time (Han et al., 2021b), which implies that these facts may exhibit more diverse evolution patterns and evolve much faster over such a long time span. Thus, we attribute the greater improvements on ICEWS05-15 to the problem arising from diverse evolution patterns being more serious there than on the other datasets. Moreover, though the fine-tuned backbones achieve large improvements on ICEWS05-15, our model still outperforms the fine-tuned ones.

Performance Comparison on Predicting Facts in Different Time Periods (RQ2)

To verify the effectiveness of our model in solving the problem brought by diverse evolution patterns, we evaluate the performance of the backbones in different time periods on ICEWS14 and ICEWS18. Specifically, we divide ICEWS14 and ICEWS18 into four time periods in chronological order, respectively.
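The chronological split used here can be sketched as partitioning the sorted test timestamps into equal contiguous chunks; the timestamps below are illustrative.

```python
# Partition a timeline into n_periods contiguous chronological chunks.
def split_periods(timestamps, n_periods=4):
    """Split sorted timestamps into n_periods roughly equal chunks."""
    ts = sorted(timestamps)
    size = -(-len(ts) // n_periods)  # ceiling division
    return [ts[i:i + size] for i in range(0, len(ts), size)]

periods = split_periods(range(1, 13))
```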
In Figure 3, we present the relative improvements of the backbone plugged with MetaTKG in each period over the original backbone and the fine-tuned one. From the results shown in Figure 3, we make the following observations.

Compared with the original backbone, the relative improvement increases over time periods. Combined with Figure 1a, we can infer that facts in later periods are more difficult to predict because their distributions differ more from the training set. Our model helps the backbones stay effective over time: by training on different temporal meta-tasks, the backbones obtain meta-knowledge that guides them to quickly adapt to future data using the knowledge from old data. Moreover, compared with the fine-tuned backbone, we can make the same observation. This illustrates that our method is much more effective in learning the variations in diverse evolution patterns, which further verifies the effectiveness of MetaTKG.

Performance Comparison on Predicting Entities with Little Historical Information (RQ3)

We also conduct an experiment to verify the effectiveness of our model in solving the issue brought by entities with little historical information. Specifically, we divide the test sets of ICEWS14 and ICEWS18 into several groups according to the number of historical interactions of entities, respectively. Different groups contain entities with different numbers of historical interactions. From the results in Figure 4, we have the following observation.
Compared with both the original backbone and the fine-tuned one, the relative improvement in the group [0, 50] is much higher than in the other groups. This indicates that both the original backbone and the fine-tuned one perform poorly on entities with little historical information, and that our model is indeed effective in solving the issue of entity sparsity. We believe the reason is that during the training process of meta-tasks, the backbone learns to utilize the knowledge gained from entities with adequate historical information when facing entities with little historical information.
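The grouping used in this analysis can be sketched as bucketing test entities by their historical interaction counts; only the first bucket edge, [0, 50], appears in the text, so the remaining edges below are illustrative assumptions.

```python
# Bucket entities by how many historical interactions they have, so
# performance can be compared across sparsity levels.
def bucket_entities(history_counts, edges=(50, 200, 1000)):
    """Map each entity to a bucket label based on its history count."""
    buckets = {}
    for entity, count in history_counts.items():
        label = "[0, 50]"
        if count > edges[2]:
            label = f"({edges[2]}, inf)"
        elif count > edges[1]:
            label = f"({edges[1]}, {edges[2]}]"
        elif count > edges[0]:
            label = f"({edges[0]}, {edges[1]}]"
        buckets[entity] = label
    return buckets

groups = bucket_entities({"A": 10, "B": 120, "C": 5000})
```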

Ablation Studies (RQ4)
We conduct two experiments to investigate the superiority of the gating integration module in MetaTKG and the effectiveness of the component-specific design for the gating parameter.
Ablation Study for the Gating Module. To verify the effectiveness of the gating integration module, we compare the performance of the backbones plugged with MetaTKG and with MetaTKG-G, a variant that removes the gating module. We show the results of all backbones plugged with the two models in Table 3 and obtain the following findings.
Firstly, all backbones plugged with MetaTKG outperform those with MetaTKG-G on most evaluation metrics, which confirms that the gating integration module can effectively enhance performance on the entity prediction task. This illustrates the importance of capturing temporal correlations in the TKG scenario with the gating module. Secondly, across the results of all datasets for each backbone, the relative improvement on WIKI is lower compared with the other datasets. It is worth noting that facts in WIKI last much longer and do not occur periodically (Han et al., 2021b), which implies that the temporal correlations between adjacent KGs in WIKI are weaker, so the gating module brings smaller gains.

Ablation Study for the Component-specific Gating Parameter. The gating parameter is the key to determining the performance of the gating integration module. In this experiment, we compare the performance of the backbones plugged with MetaTKG and with MetaTKG-C, in which the gating parameter g^s is set to the same value for the entity embeddings, the relation embeddings, and the other parameters in the model. We show the results of all backbones plugged with the two models in Table 4.
Through the relative improvements, we find that assigning different g^s values to the entity embeddings, the relation embeddings, and the other parameters in the model achieves superior performance on all datasets with different backbones. This demonstrates the robustness and effectiveness of our g^s design.
Analysis on the Effect of the Number of Gradient Update Steps (RQ5)

In this section, we study how multi-step gradient updates affect our model. The performance of the backbones plugged with MetaTKG under different numbers of update steps is shown in Figure 5. We find that multi-step updates can enhance the performance of the backbones on most datasets. However, compared with the other three datasets, multi-step updates seem to be less effective on ICEWS05-15. From Table 1, we find that the number of time slices in ICEWS05-15 is much larger than in the other datasets, while the total number of facts is not, which indicates that the number of facts per time slice in ICEWS05-15 is relatively small. Thus, we attribute the aforementioned observation to the multi-step updates making the model susceptible to over-fitting when predicting each time slice.

Conclusion
In this paper, we have proposed MetaTKG, a novel temporal meta-learning framework for TKG reasoning, which can easily serve as a plug-and-play module for most existing TKG prediction models. MetaTKG regards TKG prediction as temporal meta-tasks, and utilizes a Temporal Meta-Learner to learn the evolutionary meta-knowledge from these tasks. MetaTKG aims to guide the backbones to adapt quickly to future data and to enhance the performance on entities with little historical information via the learned meta-knowledge. In particular, in the temporal meta-learner, we develop a Gating Integration module to establish temporal correlations between tasks. Extensive experiments on four benchmarks and three state-of-the-art backbones for TKG prediction demonstrate the effectiveness and superiority of MetaTKG.

Limitations
In this section, we discuss the limitations of our model. Specifically, we utilize multi-step gradient updates in the testing phase to enhance the fast-adaptation ability of the backbones plugged with our model. In this way, the effectiveness of our model improves significantly compared with one gradient update, but the GPU resource requirement becomes larger than for the original backbones. Moreover, since the model acts on the data of one time slice at a time for TKG prediction, multi-step gradient updates tend to cause over-fitting on datasets in which the number of facts per time slice is small, such as ICEWS05-15.

Figure 1: (a) The y-axis is the performance (MRR (%)) of the existing TKG models RE-NET and RE-GCN on predicting facts in different time periods. (b) The left y-axis is the performance of RE-NET on predicting entities with different numbers of historical interactions, and the right y-axis is the number of entities in each group. The results of both (a) and (b) are obtained on the widely used TKG dataset ICEWS14.

Figure 3: The relative improvements (MRR %) of backbones plugged with MetaTKG over the original and the fine-tuned ones on predicting facts in different periods. A backbone with * denotes the relative improvement (%) over the fine-tuned one.

Figure 5: Effect of different numbers of gradient update steps in the testing phase. The y-axis is the MRR value, and the x-axis represents the different numbers of gradient update steps.
Algorithm 1: Training procedure.
Input: T_train = {T_t}_{t=2}^{k}; initial parameters θ, g^s.
Output: the trained parameters θ_train, g^s.
1: Initialize θ and g^s randomly;
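The overall loop of Algorithm 1 can be sketched under the reconstructed updates: tasks are learned sequentially, support initializations come from the gated fusion of the two previous tasks, and query initializations come from the previous task. Scalar parameters and a quadratic toy loss stand in for the backbone, so this is an illustrative assumption, not the paper's implementation.

```python
# Toy end-to-end training loop over temporal meta-tasks.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def step(theta, target, lr):
    """One gradient step on the toy loss 0.5 * (theta - target)**2."""
    return theta - lr * (theta - target)

def train(tasks, theta0=0.0, g=0.0, alpha=0.1, beta=0.1):
    """tasks: list of (support_target, query_target) toy pairs."""
    history = [theta0, theta0]  # stands for theta_{t-2}, theta_{t-1}
    for support_t, query_t in tasks:
        # Gated fusion of the two previously updated parameters (Eq. 4).
        w = sigmoid(g)
        theta_s = w * history[-1] + (1.0 - w) * history[-2]
        theta_tilde = step(theta_s, support_t, alpha)  # support update (Eq. 2)
        # theta_tilde carries the support feedback; this scalar toy
        # keeps the gate g fixed instead of updating it (Eq. 5).
        theta_q = history[-1]                          # query init (Eq. 6)
        theta_t = step(theta_q, query_t, beta)         # query update (Eq. 3)
        history.append(theta_t)
    return history[-1]

theta_train = train([(1.0, 1.0), (2.0, 2.0)])
```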

Table 1: The statistics of the datasets.

Table 3: Ablation studies on the gating integration module in terms of MRR (%) under the raw setting.

Table 4: Ablation studies on the component-specific gating parameter in terms of MRR (%) under the raw setting.