Re-Temp: Relation-Aware Temporal Representation Learning for Temporal Knowledge Graph Completion

Temporal Knowledge Graph Completion (TKGC) under the extrapolation setting aims to predict the missing entity of a future fact, a challenge that aligns closely with real-world prediction problems. Existing research mostly encodes entities and relations by applying sequential graph neural networks to recent snapshots. However, these approaches tend to overlook the ability to skip irrelevant snapshots according to the entity-related relation in the query, and they disregard the importance of explicit temporal information. To address this, we propose our model, Re-Temp (Relation-Aware Temporal Representation Learning), which leverages explicit temporal embeddings as input and applies a skip information flow after each timestamp to discard information unnecessary for prediction. Additionally, we introduce a two-phase forward propagation method to prevent information leakage. Through evaluation on six TKGC (extrapolation) datasets, we demonstrate that our model outperforms all eight recent state-of-the-art models by a significant margin.


Introduction
A Knowledge Graph (KG) is a graph-structured database composed of facts represented as triplets of the form (Subject Entity, Relation, Object Entity), such as (Alice, Is a Friend of, Bob). In this graph, entities serve as nodes, and relations are depicted as directed edges connecting the nodes. However, facts in a KG are not static but undergo continuous updates over time. To incorporate temporal information into the KG, Temporal Knowledge Graphs (TKGs) were introduced. TKGs attach temporal information to each fact, extending each triplet with a timestamp into a quadruplet (Subject Entity, Relation, Object Entity, Timestamp). A TKG can be represented as a sequence of snapshots, where each snapshot is a static knowledge graph for one specific timestamp.
Temporal Knowledge Graph Completion (TKGC) aims to predict the missing entity in a query (Subject Entity, Relation, ?, Timestamp) or (?, Relation, Object Entity, Timestamp). TKGC is difficult: even large-scale pre-trained language models such as ChatGPT (OpenAI, 2022) are prone to making factual errors (Borji, 2023). There are two main settings: interpolation and extrapolation. TKGC under the interpolation setting completes facts within the historical time range, while TKGC under the extrapolation setting predicts facts at future timestamps. In this paper, we focus on TKGC in the extrapolation setting, which is more challenging and requires further improvement (Jin et al., 2020).
Enormous attention has been paid to static KGC problems, and numerous models have been employed to encode entities and relations. However, a key question remains: how can a static KGC model be extended to incorporate temporal information for TKGC tasks? Recent works (Jin et al., 2020; Li et al., 2021, 2022a,b) have applied sequential Graph Neural Networks (GNNs) to the previous snapshots to encode entities and relations. They then use a static score function as the decoder to assess the score of each candidate. Sequential GNNs are used because facts in the recent history can be helpful when making predictions about the future. An example is shown in Figure 1: the previous facts (Kim Jong-un, Criticize, United States) three days before and (Kim Jong-un, Make Statement, Donald Trump) one day before may imply (Donald Trump, Threaten with Administrative Sanction, Kim Jong-un) today. Since no explicit timestamp value is used, we call this "implicit temporal information".
However, to effectively encode the timestamp, i.e., the temporal information, two additional considerations arise. First, explicit temporal information is crucial. For instance, the validity score of (Donald Trump, Threaten with Administrative Sanction, Kim Jong-un) may differ between 2018 and 2023: Donald Trump was the president in 2018 but not in 2023, affecting his ability to threaten another nation with administrative sanctions in 2023. The nature of entities can change over time, necessitating explicit temporal information to encode time-dependent factors. Second, not all facts in the recent history are relevant. Given the historical facts (Kim Jong-un, Criticize, United States, 2018-08-01), (Donald Trump, Make a Visit, Switzerland, 2018-08-02) and (Kim Jong-un, Make Statement, Donald Trump, 2018-08-03), when calculating the score of (Donald Trump, Threaten with Administrative Sanction, Kim Jong-un, 2018-08-04), the second quadruplet about visiting Switzerland does not contribute to predicting the relation between Donald Trump and Kim Jong-un, since Switzerland is neutral. In such cases, the model should find a way to skip the irrelevant snapshots based on the entity-related relation in the query. Therefore, an optimal TKGC model should consider (1) explicit temporal information and (2) implicit temporal information, skipping irrelevant snapshots by considering the query.
In this paper, we propose Re-Temp, an innovative TKGC model designed for the extrapolation setting that incorporates relation-aware temporal representation learning. The encoder of Re-Temp utilises an explicit temporal embedding for each entity, combining static and dynamic embeddings. Within the encoder, a sequential GNN is employed to capture implicit temporal information, with a skip information flow applied after each timestamp that takes the entity-related relation in the query into account. The main contributions of this paper can be summarised as follows:
• We introduce Re-Temp, a precise TKGC model that leverages both explicit and implicit temporal information, incorporates a relation-aware skip information flow to exclude irrelevant information, and adopts a two-phase forward propagation method to prevent information leakage.
• We compare Re-Temp against eight state-of-the-art baseline models from recent years on six publicly available TKGC datasets under the extrapolation setting. Our experimental results demonstrate that Re-Temp significantly outperforms all of the baselines.
• We conduct a detailed case study and statistical analysis to illustrate the distinct characteristics of each dataset and provide an explanation based on our experimental findings.

Related Work
KGC models normally adopt an encoder-decoder framework (Hamilton et al., 2017), where the encoder generates the embeddings of entities and relations and a score function serves as the decoder. Most existing works extend static KGC models into TKGC models by introducing temporal information.

TKGC (Interpolation)
To integrate temporal information in the decoder, TTransE (Jiang et al., 2016) extends TransE (Bordes et al., 2013) by adding an extra timestamp embedding, and ConT (Ma et al., 2019) extends Tucker (Balažević et al., 2019) by replacing the learnable weight with the timestamp embedding. Some methods also combine temporal information in the encoder: TA-DistMult (Garcia-Duran et al., 2018) encodes temporal information into the relation embedding using an LSTM, while DE-SimplE (Goel et al., 2020) encodes a diachronic entity embedding with temporal information, with DistMult and SimplE (Yang et al., 2015; Kazemi and Poole, 2018) as the respective decoders. These models produce relatively low performance on TKGC under the extrapolation setting since they are unable to capture unseen temporal information.

Table 1: Summary of TKGC (extrapolation) models and our proposed model. The column 'Temporal' indicates how each approach uses temporal information, and the column 'Query' summarises how each model utilises the query.

Method | Core idea | Temporal | Query
RE-NET (Jin et al., 2020) | estimate the future graph distribution | implicit | N/A
CyGNet (Zhu et al., 2021) | identify facts with repetition | explicit | repetitive queries
xERTE (Han et al., 2020) | sample subgraph according to query | implicit | query-related subgraph
REGCN (Li et al., 2021) | relation-GCN + GRU | implicit | N/A
TANGO (Han et al., 2021) | neural ODE for continuous-time reasoning | implicit | N/A
TITer (Haohai Sun, 2021) | path-based reinforcement learning | implicit | query-related path
CEN (Li et al., 2022a) | ensemble model with different history lengths | implicit | N/A
HiSMatch (Li et al., 2022b) | two separate encoders for entity and query information | implicit | repetitive queries
Re-Temp (Ours) | skip irrelevant information according to entity-related relations | both | query-related skip information flow

TKGC (Extrapolation)
In the last few years, more attention has been paid to TKGC under the extrapolation setting. GNNs are typically used as the encoder: RE-NET (Jin et al., 2020) applies sequential neighbourhood aggregators such as R-GCN (Schlichtkrull et al., 2018) to estimate the distribution of the target-timestamp snapshot, and REGCN (Li et al., 2021) adopts CompGCN (Vashishth et al., 2020) at each timestamp and a GRU for sequential information. CEN (Li et al., 2022a) uses an ensemble of sequential GNNs with different history lengths, TANGO (Han et al., 2021) solves neural ordinary differential equations and feeds the result into a multi-relational GCN, and HiSMatch (Li et al., 2022b) builds two GNN encoders modelling the sequential candidate graph and query-related subgraphs separately, combining the representations from both sides in a matching function. Meanwhile, some methods do not follow the traditional encoder-decoder framework: xERTE (Han et al., 2020) extracts subgraphs according to queries, CyGNet (Zhu et al., 2021) identifies candidates with repetition, and TITer (Haohai Sun, 2021) uses reinforcement learning to search for a temporal evidence chain for prediction. To conclude, RE-NET, REGCN and CEN adopt entity-evolution information, while xERTE, CyGNet and TITer focus on the query. HiSMatch combines these two types of information with two separate encoders. However, none of the previous works encodes sequential and query-related information in one precise encoder. In addition, none of these methods considers explicit temporal information, except for CyGNet, which generates an independent timestamp vector but does not encode it into the entity or relation. Table 1 summarises TKGC (extrapolation) models and highlights the contribution of our proposed model.

Re-Temp
The overall architecture of Re-Temp is shown in Figure 2. Section 3.1 describes the notation of a TKGC task. The input of the model is a combination of static and dynamic entity embeddings carrying explicit temporal information (Section 3.2). The encoder (Section 3.3) uses a sequential multi-relational GNN to learn implicit temporal information; after each timestamp, a relation-aware skip information flow mechanism is applied to retain only the information necessary for prediction. The ConvTransE decoder together with the loss function is introduced in Section 3.4. To avoid information leakage, we apply a two-phase forward propagation method, described in Section 3.5.

Problem Formulation
Let $E$, $R$, $T$ and $F$ denote the sets of entities, relations, timestamps and facts. A temporal knowledge graph $G$ can be treated as a sequence of snapshots $\{G_0, G_1, ..., G_t, ...\}$, where each snapshot contains the facts at one timestamp. Each fact is a quadruplet $(e_s, r, e_o, t)$, where $e_s, e_o \in E$ are the subject and object entities, $r \in R$ is the relation and $t \in T$ is the timestamp. The target of temporal knowledge graph completion under the extrapolation setting is, for a query $q$, to predict $(e_s, r, ?, t_q)$ or $(?, r, e_o, t_q)$ given the previous snapshots $\{G_0, G_1, ..., G_{t_q-1}\}$. Normally, the inverse of each quadruplet is added to the dataset, turning every subject entity prediction problem $(?, r, e_o, t_q)$ into an object entity prediction problem $(e_o, r^{-1}, ?, t_q)$.
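To make this preprocessing concrete, below is a minimal sketch (not the authors' released code) of the standard inverse-quadruplet augmentation and snapshot grouping, assuming relations are indexed $0, ..., |R|-1$ so that $r^{-1}$ can be encoded as $r + |R|$:

```python
from collections import defaultdict

def add_inverse_quadruplets(quads, num_rels):
    """Standard TKGC preprocessing sketch: for every (s, r, o, t),
    add (o, r + num_rels, s, t) so subject prediction becomes
    object prediction with the inverse relation."""
    augmented = list(quads)
    for s, r, o, t in quads:
        augmented.append((o, r + num_rels, s, t))
    return augmented

def group_into_snapshots(quads):
    """Group quadruplets by timestamp into a snapshot sequence."""
    snapshots = defaultdict(list)
    for s, r, o, t in quads:
        snapshots[t].append((s, r, o))
    return [snapshots[t] for t in sorted(snapshots)]

# Toy usage: two facts, two relations before augmentation.
quads = [(0, 0, 1, 0), (1, 1, 2, 1)]
print(group_into_snapshots(add_inverse_quadruplets(quads, num_rels=2)))
```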

Explicit Temporal Representation
For sequential snapshots with history length $k$, let $h^{e_q}_{t_q-k} \in \mathbb{R}^{1 \times d}$ denote the input embedding of the subject entity $e_q$ from query $q$, where $d$ is the input dimension. To encode explicit temporal information, we concatenate two kinds of input embedding: static and dynamic. The static embedding reveals the nature of an entity that does not change through time, while the dynamic part reveals time-dependent information.
Inspired by ATiSE (Xu et al., 2020), the dynamic embedding is decomposed into a trend component and a seasonal component: the trend component is a linear transformation of $t$, while the seasonal component is a periodic function of $t$. Thus, we model the dynamic temporal embedding at timestamp $t$ as the summation of the trend embedding $w^{e_q,0}t$ and the seasonal embedding $\sin(2\pi w^{e_q,1}t)$. After concatenation with the static embedding, a feed-forward layer is applied. Formally, the input of the encoder $h^{e_q}_{t_q-k}$ is derived by:

$$h^{e_q,S}_{t_q-k} = h^{e_q,s} \quad (1)$$
$$h^{e_q,D}_{t_q-k} = w^{e_q,0}(t_q-k) + \sin\big(2\pi w^{e_q,1}(t_q-k)\big) \quad (2)$$
$$h^{e_q}_{t_q-k} = W_{tmp}\big(h^{e_q,S}_{t_q-k} \oplus h^{e_q,D}_{t_q-k}\big) \quad (3)$$

where $h^{e_q,S}_{t_q-k}$ in Equation 1 and $h^{e_q,D}_{t_q-k}$ in Equation 2 denote the static and dynamic embeddings of the subject entity $e_q$ at timestamp $t_q-k$, $\oplus$ denotes concatenation, and $h^{e_q,s}$, $w^{e_q,0}$, $w^{e_q,1}$ and $W_{tmp}$ are learnable parameters. The major difference between our explicit temporal representation and ATiSE is the learnable feed-forward layer applied to the concatenation of the dynamic and static embeddings, which enables the model to determine the extent to which it should utilise information from each embedding rather than simply using both. The relation embedding $h_r$ is simply extracted from a static embedding lookup table, since we do not expect a relation's nature to evolve through time.
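As one concrete reading of Equations 1-3, the following PyTorch sketch builds the explicit temporal input embedding; the module and parameter names are our own assumptions, not the paper's code:

```python
import math
import torch
import torch.nn as nn

class ExplicitTemporalEmbedding(nn.Module):
    """Sketch of Equations 1-3: a static lookup plus trend/seasonal
    dynamic parts, fused by a learnable feed-forward layer."""
    def __init__(self, num_entities, dim):
        super().__init__()
        self.static = nn.Embedding(num_entities, dim)    # h^{e,s}
        self.trend = nn.Embedding(num_entities, dim)     # w^{e,0}
        self.seasonal = nn.Embedding(num_entities, dim)  # w^{e,1}
        self.fuse = nn.Linear(2 * dim, dim)              # W_tmp

    def forward(self, entities, t):
        # entities: (batch,) entity ids; t: (batch,) timestamps as floats
        t = t.unsqueeze(-1)                               # (batch, 1)
        h_static = self.static(entities)                  # Eq. (1)
        h_dynamic = self.trend(entities) * t + torch.sin(
            2 * math.pi * self.seasonal(entities) * t)    # Eq. (2)
        return self.fuse(torch.cat([h_static, h_dynamic], dim=-1))  # Eq. (3)

# Toy usage
emb = ExplicitTemporalEmbedding(num_entities=100, dim=200)
h = emb(torch.tensor([3, 7]), torch.tensor([4.0, 4.0]))
print(h.shape)  # torch.Size([2, 200])
```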

Relation-Aware Skip Information Flow
To handle implicit temporal information, we use a sequential GNN-based encoder with a new relation-aware skip information flow mechanism. Following recent work (Li et al., 2021, 2022a,b), we adopt a variant of CompGCN (Vashishth et al., 2020) at each timestamp to model the multi-relational snapshot, outputting the entity embedding $h_e$ and the relation embedding $h_r$. The details of CompGCN are given in Appendix A.1.
Not all snapshots in the recent history are useful in predicting query $q$; hence, a relation-aware skip information flow is applied. Two things are considered: (1) a skip connection filters out unnecessary information from each timestamp, and (2) a relation-aware attention mechanism determines which information should be filtered. Thus, the output of CompGCN is weighted-summed with the inputs of previous timestamps to partially skip irrelevant snapshots. The weights of the weighted sum are calculated by considering both the entity and the entity-related relation in the query.
Formally, for an entity $e_q$, the relations associated with $e_q$ should be considered. To capture the entity-related relation information, mean pooling is applied over all relation embeddings associated with $e_q$ at timestamp $t_q$. The representation obtained from mean pooling serves as a reference vector that helps the model determine which information to keep or skip. This average relation embedding is then summed with each of the $m$ previous timestamp inputs one by one, followed by a feed-forward layer; this calculation can be viewed as additive attention. After obtaining the attention weights $\beta^{e_q}_j$, a weighted sum using these attention weights is applied to the current CompGCN output $h^{e_q,L}_{t_i}$ and all $m$ previous timestamp inputs. The detailed calculation is as follows:

$$\bar{h}^{r,e_q}_{t_q} = \frac{1}{|R^{e_q}_{t_q}|} \sum_{r \in R^{e_q}_{t_q}} h_r \quad (4)$$
$$attn^{e_q}_j = W_a\big(\bar{h}^{r,e_q}_{t_q} + h^{e_q}_{t_i-j}\big) \quad (5)$$
$$\beta^{e_q}_j = \frac{\exp(attn^{e_q}_j)}{\sum_{j'=0}^{m} \exp(attn^{e_q}_{j'})} \quad (6)$$
$$h^{e_q}_{t_{i+1}} = \beta^{e_q}_0 h^{e_q,L}_{t_i} + \sum_{j=1}^{m} \beta^{e_q}_j h^{e_q}_{t_i-j} \quad (7)$$

Note that the output of each timestamp is also the input of the next timestamp; in Equation 5, $h^{e_q}_{t_i-0}$ denotes the current CompGCN output $h^{e_q,L}_{t_i}$. Equation 4 gives the entity-associated relation embedding, where $R^{e_q}_{t_q}$ denotes the set of relations connected with entity $e_q$ at timestamp $t_q$. Equations 5 and 6 give the attention score and weight calculation, where $W_a$ is learnable, and Equation 7 gives the weighted sum. By applying the relation-aware skip information flow, our model is capable of skipping irrelevant snapshots by considering the relation in the target query.
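The following PyTorch sketch illustrates our reading of Equations 4-7: additive attention between a mean-pooled query-relation vector and each candidate state, then a weighted sum over the current GNN output and the $m$ previous timestamp inputs. Names and tensor layouts are our assumptions:

```python
import torch
import torch.nn as nn

class RelationAwareSkip(nn.Module):
    """Sketch of Equations 4-7 (not the paper's code): relation-aware
    additive attention over the current CompGCN output and the inputs
    of the m previous timestamps."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(dim, 1)  # W_a

    def forward(self, rel_embs, current_out, prev_inputs):
        # rel_embs: (num_rels_for_entity, dim) relations tied to e_q
        # current_out: (dim,) CompGCN output at t_i
        # prev_inputs: (m, dim) inputs of the m previous timestamps
        rel_ref = rel_embs.mean(dim=0)                    # Eq. (4)
        states = torch.cat([current_out.unsqueeze(0), prev_inputs], dim=0)
        scores = self.attn(rel_ref + states).squeeze(-1)  # Eq. (5)
        beta = torch.softmax(scores, dim=0)               # Eq. (6)
        return (beta.unsqueeze(-1) * states).sum(dim=0)   # Eq. (7)

# Toy usage: 3 query-related relations, m = 2 previous inputs.
skip = RelationAwareSkip(dim=200)
out = skip(torch.randn(3, 200), torch.randn(200), torch.randn(2, 200))
print(out.shape)  # torch.Size([200])
```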

Decoder
ConvTransE (Shang et al., 2019) is widely used as the score function in both static KGC (Malaviya et al., 2020) and TKGC (Li et al., 2022b), and ours is no exception. After obtaining the score of each candidate using ConvTransE, we train the model as a classification problem; the loss for each query is:

$$L = -\sum_{e_c \in E} z_c \log p(e_c \mid e_q, r_q, t_q) \quad (8)$$

where $z_c$ is 1 if candidate $e_c$ is the correct answer and 0 otherwise, and $p(e_c \mid e_q, r_q, t_q)$ is the probability of candidate $e_c$ derived from the ConvTransE scores. The training target is to minimise the total loss over all queries. Appendix A.2 introduces the details of ConvTransE.
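Under this cross-entropy reading, training reduces to standard multi-class classification over all candidate entities; a minimal sketch (our assumption, not the released code):

```python
import torch
import torch.nn.functional as F

# scores: (batch, num_entities) candidate scores from ConvTransE;
# targets: (batch,) index of the ground-truth object entity.
def tkgc_loss(scores, targets):
    """Multi-class cross-entropy over all candidate entities (Eq. 8)."""
    return F.cross_entropy(scores, targets)

loss = tkgc_loss(torch.randn(4, 1000), torch.tensor([3, 17, 998, 0]))
print(loss.item())
```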

Two-Phase Propagation
Applying the relation-aware skip information flow mechanism introduces a potential information leakage problem. Suppose a query in the test set is $(A, r, B, t)$; after adding the inverse quadruplets, $(B, r^{-1}, A, t)$ will also be in the test set. In the encoder, with the relation-aware skip information flow, $A$ and $B$ will contain the information of $r$ and $r^{-1}$ respectively. Therefore, when making predictions on $(A, r, ?, t)$ and calculating the score by the dot product of $A$ with all candidates, the information of $r$ in $A$ can meet the information of $r^{-1}$ in $B$. Since $r$ and $r^{-1}$ are paired, the model might find a shortcut to determine that $B$ is the right answer for $(A, r, ?, t)$. This information leakage results in unreasonably high performance during evaluation.
To avoid such information leakage, we propose a two-phase forward propagation method. We divide the dataset into two subsets: the original set and the inverse set, where the inverse set contains the inverse quadruplets. The snapshot graphs of the history are built on the whole set, while during forward propagation the original set and the inverse set are processed separately. The outputs for the original set and the inverse set are then collected for loss calculation or performance evaluation.
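A minimal sketch of how such a two-phase pass could be organised; the `encode` and `score` callables are hypothetical stand-ins for the encoder and decoder described above, since the paper does not publish this interface:

```python
def two_phase_forward(encode, score, graphs, original_queries, inverse_queries):
    """Sketch of two-phase propagation: the history graphs are built on
    the full (original + inverse) set, but each forward pass only sees
    one query subset, so paired relations r and r^-1 cannot meet."""
    outputs = []
    for queries in (original_queries, inverse_queries):
        entity_states = encode(graphs, queries)  # relation-aware encoder
        outputs.append(score(entity_states, queries))
    return outputs  # collected for loss or evaluation

# Toy usage with stand-in functions (not the real model):
graphs = ["G0", "G1", "G2"]
encode = lambda g, q: {"states_for": tuple(q)}
score = lambda s, q: [0.0 for _ in q]
orig, inv = [("A", "r", "?", 3)], [("B", "r_inv", "?", 3)]
print(two_phase_forward(encode, score, graphs, orig, inv))
```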
Experimental Setup

Baselines Our Re-Temp is compared with TKGC models under the extrapolation setting. Eight models from recent years are selected as baselines: RE-NET (Jin et al., 2020), REGCN (Li et al., 2021), CyGNet (Zhu et al., 2021), xERTE (Han et al., 2020), TITer (Haohai Sun, 2021), TANGO (Han et al., 2021), CEN (Li et al., 2022a), and HiSMatch (Li et al., 2022b). Models designed for static KG completion or TKGC under the interpolation setting are not compared, since they naturally perform badly on TKGC under the extrapolation setting.
Hyperparameters Following previous works (Li et al., 2022a,b), the input dimension is set to 200, which is also the hidden dimension of the graph model and the decoder. The number of graph neural network layers is 2 and the dropout rate is set to 0.2. Adam (Kingma and Ba, 2015) with a learning rate of 1e-3 is used for optimisation. The model is trained for a maximum of 30 epochs, and we stop training when the validation performance does not improve for 5 consecutive epochs. The test set is then evaluated using the trained model.
Evaluation Metrics Following previous works (Han et al., 2020; Zhu et al., 2021; Li et al., 2022b), we employ the widely used evaluation metrics Mean Reciprocal Rank (MRR), Hits@1, Hits@3 and Hits@10, explained in Appendix B.2. Following Haohai Sun (2021) and Han et al. (2021), we filter out the quadruplets occurring at the query time, and we report the average result over five runs.

Performance Comparison
We use a history length of 3 for ICEWS14, ICEWS18, ICEWS05-15, ICEWS14* and GDELT, and 1 for WIKI; the influence of the history length is discussed in Section 4.3. Table 3 presents the performance comparison with all baseline models. Our model, Re-Temp, significantly outperforms almost all the baseline models on all datasets, indicating its superiority. In detail, three points can be observed. Firstly, HiSMatch (Li et al., 2022b) achieves the second-highest performance on most of the datasets by considering both the query subgraph and the entity subgraph. The HiSMatch concept of considering both query and entity is similar to our relation-aware attention mechanism in the skip information flow. However, HiSMatch only builds the query subgraph using the exact relation of the query, which ignores potential similarity between relations. For example, in ICEWS14, when making a prediction on (A, provide_aid, ?, $t_q$), the relations 'provide_aid' and 'provide_military_aid' share similarities, but HiSMatch only considers facts with the exact query relation in the recent history, whereas our method uses the relation embedding to calculate the attention weights, generalising to different types of relations that are close in the embedding space and thereby outperforming HiSMatch. Meanwhile, HiSMatch builds two separate encoders and fuses their outputs for the decoder, while our model applies a single encoder for better information alignment. Secondly, among the four ICEWS datasets, our model achieves the largest improvement on ICEWS05-15. As shown in Table 2, the snapshots in ICEWS05-15 are sparser than the others, demonstrating the ability of our model to learn sequential information from less data.
Thirdly, our model only achieves comparable performance with HiSMatch on WIKI, which might result from the nature of this dataset. Table 4 lists some facts about Lionel Messi in WIKI. Given the quadruplets from 2003 and 2004, it is relatively easy to predict (Lionel Messi, residence, ?, 2005) based on his previous residence; however, it is almost impossible to correctly predict (Lionel Messi, member of sports team, ?, 2005), since the previous snapshots provide no information about the Argentina national football team. This is an issue in WIKI: the predictions are either too easy (using the previous facts) or too difficult (even humans could not make a correct prediction without external knowledge). Thus, a relatively better model is not enough to produce a clearly better performance on WIKI, and our model and some previous baselines (CEN, HiSMatch) achieve similar results on this dataset.

Impact of History Length
To study the impact of history length on different datasets, experiments with different history lengths are conducted. The default history length is 3, and Figure 3 shows the MRR change in percentage for history lengths from 1 to 5. Two major points can be noticed. (1) On most of the datasets (ICEWS14, ICEWS18, ICEWS05-15, ICEWS14* and GDELT), a larger history length results in a higher MRR. When the history length is small, enlarging it can substantially enhance performance.
However, when the history length surpasses three, the improvement becomes marginal. This aligns with the expectation that the most recent snapshots help with inference, while in a longer history the irrelevant information does not contribute to performance. Balancing model performance and computational complexity, a history length of 3 is selected for the final model on these datasets.
(2) An exception occurs on WIKI, where the model achieves the best performance with a history length of 1. To investigate the factors, a detailed statistical analysis of the datasets is conducted. Table 4 in Section 4.2 shows some sample queries in WIKI, where some facts are identical to facts at previous timestamps; the reason is that for a fact $(s, r, o, [t_1, t_n])$, WIKI generates the same quadruplets across the whole time range from $t_1$ to $t_n$. Figure 4 shows, for each dataset, the proportion of quadruplets at $t_q$ that also appear at the previous timestamp $t_q - 1$, over all timestamps in the test set. 85.68% of the samples in WIKI appear one timestamp before, while fewer than 15% of the samples in ICEWS14, ICEWS18, ICEWS05-15, ICEWS14* and GDELT come from the previous timestamp. The same quadruplets appearing across different timestamps in WIKI result in similar snapshots (graphs) at different timestamps. When a larger history length is applied, applying multiple graph neural network models on multiple similar graphs approximates applying a multi-layer GNN model on one graph, which leads to the over-smoothing issue of deep GNNs (Li et al., 2018). Therefore, a large history length may decrease model performance on WIKI.

Ablation Study
Table 5 presents the ablation study of different components of our model.
Impact of explicit temporal embedding To evaluate the effectiveness of the explicit temporal representation, we remove the dynamic embedding from the explicit temporal input, leaving only the static embedding of each entity. On all six benchmark datasets, removing the dynamic embedding leads to worse performance. Compared with the performance drop on ICEWS14, ICEWS18, ICEWS14* and GDELT, the MRR clearly decreases more on WIKI and ICEWS05-15. The reason is that the total time range of these two datasets is large (232 years and 11 years respectively), and entity information can evolve over a long period, which is captured by the explicit temporal embedding.
Impact of relation-aware skip information flow To demonstrate how the relation-aware skip information flow contributes to model performance, two ablation tests are conducted. (1) '-relation_aware' means that when calculating the attention score in the skip information flow, the entity-related relation is omitted; formally, the attention score in Equation 5 is changed to $attn^{e_q}_j = W_a(h^{e_q}_{t_i-j})$. (2) '-skip' means removing the whole skip information flow, making the input of each timestamp simply the output of the previous timestamp: $h^{e_q}_{t_{i+1}} = h^{e_q,L}_{t_i}$. The model performance drops heavily if no relation-aware attention mechanism is applied, showing the vital importance of this mechanism; we can conclude that the entity-related relation information indeed helps the model select the necessary information. In most cases, removing the skip connection worsens performance even compared with only removing the relation-aware attention mechanism: compared with the '-relation_aware' setting, models under the '-skip' setting learn from all recent snapshots for prediction, which involves irrelevant information. However, WIKI shows better performance under the '-skip' setting, even compared with the original Re-Temp model. The reason might be the same as that discussed in Section 4.3: more than 80% of the facts in WIKI appear at the previous timestamp, and a graph model applied to the previous timestamp can easily capture that repetitive information for prediction.

Ensemble Modelling Evaluation
CEN (Li et al., 2022a) builds an ensemble model with different history lengths. Inspired by this, we test our model under an ensemble setting. For a model with history length $k$, let the score vector of all candidates for query $q$ be $s^q_k$; a pooling method is applied on $\{s^q_1, s^q_2, ..., s^q_k\}$ to get the final score. Three different pooling methods are applied (sketched below). Table 6 shows the MRR (%) results of our model under the ensemble setting. We apply history lengths starting from one, with the maximum history length set to three as previously defined. We do not include experiments on WIKI since its optimal history length is one, so no models with smaller history lengths are available. First of all, our model benefits from the ensemble setting on four of the datasets (ICEWS14, ICEWS18, ICEWS05-15, ICEWS14*), but only achieves performance similar to the original Re-Temp model on GDELT (25.05%). Considering the history length influence shown in Figure 3, the model achieves similar results with different history lengths on GDELT; models with different history lengths might therefore be similar, making the ensemble less effective. The ICEWS datasets, however, are sensitive to history length, and ensemble models can benefit from combining models with different history lengths. In addition, max pooling usually achieves the best performance as the ensemble method, while min pooling worsens performance.
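For illustration, a minimal sketch of pooling per-history-length score vectors; the tensor layout, and the assumption that the third method is mean pooling (the paper names only max and min explicitly), are ours:

```python
import torch

def ensemble_scores(score_list, method="max"):
    """Pool candidate score vectors from models trained with
    different history lengths into a single score vector."""
    stacked = torch.stack(score_list, dim=0)  # (k, num_entities)
    if method == "max":
        return stacked.max(dim=0).values
    if method == "min":
        return stacked.min(dim=0).values
    return stacked.mean(dim=0)                # assumed mean pooling

# Toy usage: scores from models with history lengths 1, 2, 3.
scores = [torch.randn(1000) for _ in range(3)]
print(ensemble_scores(scores, "max").shape)  # torch.Size([1000])
```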

Conclusion
We introduced Re-Temp, which integrates both explicit and implicit temporal information and applies a relation-aware skip information flow after each timestamp to remove information unnecessary for prediction, taking the entity-related relation in the query into consideration. The experimental results on six TKGC datasets demonstrate the superiority of our model compared with eight baseline models. We also conduct a statistical analysis of the datasets to show the different nature of WIKI compared with the other datasets. We hope that Re-Temp offers insight into the importance of the relation in the query and of both types of temporal information.

Limitations
Re-Temp still follows the Knowledge Graph Completion encoder-decoder framework (Hamilton et al., 2017), while more frameworks could be explored. The graph model at each timestamp and the decoder score function follow methods widely used by other models.
Since we have shown that the explicit temporal embedding and the skip information flow contribute to model performance, more work could be done by combining these concepts into the graph model and the score function, for example, incorporating the entity-related relation into the graph model at each timestamp for selective propagation between nodes, or incorporating the explicit temporal embedding into the decoder score function. Also, like most TKGC models, Re-Temp cannot handle new entities that do not appear in the training data; methods integrating textual descriptions could be explored (Lv et al., 2022).

A.1 CompGCN
In CompGCN, at each layer, edges (relations) act as transformations on the connected nodes (entities), and then a weighted-sum aggregation from GCN (Kipf and Welling, 2017) is applied to the transformed entities. A self-loop is also computed before the activation function. Formally, for an entity node $e_q$ at timestamp $t_i$ at the $l$-th layer, the propagation is:

$$h^{e_q,l+1}_{t_i} = \sigma\Big(\frac{1}{|N^{e_q}_{t_i}|}\sum_{(e_n, r) \in N^{e_q}_{t_i}} W^l_{g,0} f(h^{e_n,l}_{t_i}, h_r) + W^l_{g,1} h^{e_q,l}_{t_i}\Big) \quad (9)$$

where $N^{e_q}_{t_i}$ is the set of neighbouring (entity, relation) pairs of $e_q$ at timestamp $t_i$, $\sigma$ is the activation function, for which RReLU (Xu et al., 2015) is chosen, $W^l_{g,0}$ and $W^l_{g,1}$ are learnable parameters at layer $l$, and $f$ is the composition function for the neighbour entity embedding $h^{e_n,l}_{t_i}$ and the relation embedding $h_r$, such as summation, subtraction, element-wise product or circular correlation. Summation is selected for better alignment with the relation-aware skip information flow.
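A simplified, loop-based PyTorch sketch of Equation 9 with summation as the composition $f$; this is our reading (including mean normalisation of neighbour messages), not an optimised or official implementation:

```python
import torch
import torch.nn as nn

class CompGCNLayer(nn.Module):
    """Sketch of Equation 9: relation-composed neighbour messages,
    GCN-style aggregation, self-loop, then RReLU activation."""
    def __init__(self, dim):
        super().__init__()
        self.w_neigh = nn.Linear(dim, dim, bias=False)  # W_{g,0}
        self.w_self = nn.Linear(dim, dim, bias=False)   # W_{g,1}
        self.act = nn.RReLU()

    def forward(self, h_ent, h_rel, edges):
        # h_ent: (num_entities, dim); h_rel: (num_rels, dim)
        # edges: list of (subject, relation, object) index triples
        msg = torch.zeros_like(h_ent)
        deg = torch.zeros(h_ent.size(0), 1)
        for s, r, o in edges:  # message from each neighbour (e_n, r)
            msg[o] += self.w_neigh(h_ent[s] + h_rel[r])  # f = summation
            deg[o] += 1
        msg = msg / deg.clamp(min=1)           # assumed mean aggregation
        return self.act(msg + self.w_self(h_ent))  # self-loop + activation

# Toy usage: 4 entities, 2 relations, 3 edges.
layer = CompGCNLayer(dim=8)
out = layer(torch.randn(4, 8), torch.randn(2, 8),
            [(0, 0, 1), (2, 1, 1), (3, 0, 2)])
print(out.shape)  # torch.Size([4, 8])
```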

A.2 ConvTransE
With ConvTransE, the query subject entity embedding $h^{e_q}_{t_q}$ and the query relation embedding $h^{r_q}$ are concatenated first, and then a convolutional layer and a feed-forward layer are applied. The score of each candidate is the dot product of the candidate entity embedding with the representation produced by ConvTransE. The score of a candidate entity $e_c$ is:

$$s(e_q, r_q, e_c, t_q) = h^{e_c}_{t_q} \cdot \mathrm{FC}\big(\mathrm{Conv1d}(h^{e_q}_{t_q} \oplus h^{r_q})\big) \quad (10)$$

where $e_c$ is the candidate entity.
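A sketch of Equation 10 in PyTorch; stacking the entity and relation embeddings as two Conv1d channels follows the usual ConvTransE formulation, while the channel count and kernel size here are our assumptions:

```python
import torch
import torch.nn as nn

class ConvTransEScorer(nn.Module):
    """Sketch of Equation 10: Conv1d over stacked entity/relation
    embeddings, a fully connected layer, then dot-product scoring
    against every candidate entity."""
    def __init__(self, dim, channels=32, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(2, channels, kernel, padding=kernel // 2)
        self.fc = nn.Linear(channels * dim, dim)

    def forward(self, h_subj, h_rel, h_candidates):
        # h_subj, h_rel: (batch, dim); h_candidates: (num_entities, dim)
        x = torch.stack([h_subj, h_rel], dim=1)  # (batch, 2, dim)
        x = torch.relu(self.conv(x)).flatten(1)  # (batch, channels*dim)
        query = self.fc(x)                       # (batch, dim)
        return query @ h_candidates.t()          # (batch, num_entities)

# Toy usage
scorer = ConvTransEScorer(dim=200)
scores = scorer(torch.randn(4, 200), torch.randn(4, 200),
                torch.randn(1000, 200))
print(scores.shape)  # torch.Size([4, 1000])
```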

B Experiment Setup Details

B.1 Running Details
All models are trained using 16 Intel(R) Core(TM) i9-9900X CPU cores @ 3.50GHz and an NVIDIA Tesla P100 PCIe 16 GB GPU.
The running time and number of parameters of Re-Temp on different datasets under the default hyperparameters can be found in Table 7.

B.2 Evaluation Metrics
For each query, the model produces a ranked list of all possible candidates, and the reciprocal rank is the inverse of the rank position of the correct answer. MRR is the average reciprocal rank over all queries: $\mathrm{MRR} = \frac{1}{|Q|}\sum_{q \in Q} \frac{1}{\mathrm{rank}_q}$. Hits@N measures the proportion of queries for which the correct answer is in the top N ranked results. N = 1, 3, 10 are chosen, as in all previous works.
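A direct reading of these definitions as code (our own sketch, given the 1-indexed rank of the correct answer for each query):

```python
def mrr_and_hits(ranks, ns=(1, 3, 10)):
    """Compute MRR and Hits@N from the 1-indexed rank of the
    correct answer for each query."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {n: sum(r <= n for r in ranks) / len(ranks) for n in ns}
    return mrr, hits

# Toy usage: ranks of the ground-truth entity for five queries.
print(mrr_and_hits([1, 4, 2, 10, 25]))
```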

Figure 1: A case study of temporal knowledge graph completion under the extrapolation setting.

Figure 2: Illustration of the encoding and decoding process in Re-Temp with a history length of 3. For a query $q$, the input vector is $h^{e_q}_{t_q-3}$. The encoder with relation-aware skip information flow learns the entity and relation representations $h^{e_q}_{t_q}$ and $h^{r_q}$. The decoder then measures the score of all candidates.

Figure 3: MRR (%) change of Re-Temp with different history lengths. The x-axis is the history length and the y-axis is the MRR (%) change compared with history length 3.

Figure 4: Proportion (%) of quadruplets appearing exactly one timestamp before, for each dataset. The x-axis is the dataset name and the y-axis is the proportion (%).

Table 2: Statistics of the benchmark datasets. All datasets are split into training, validation and test sets in chronological order.

Table 3: Performance (%) comparison with baseline models. The highest value is in bold and the second highest is underlined.

Table 4: Cases from the WIKI dataset about Lionel Messi from 2003 to 2005.

Table 5: MRR (%) results of the ablation tests of Re-Temp. The highest value is in bold.

Table 6: MRR (%) of our model with different ensemble methods. The highest value is in bold.