Time-aware Graph Neural Networks for Entity Alignment between Temporal Knowledge Graphs

Entity alignment aims to identify equivalent entity pairs between different knowledge graphs (KGs). Recently, the availability of temporal KGs (TKGs) that contain time information created the need for reasoning over time in such TKGs. Existing embedding-based entity alignment approaches disregard time information that commonly exists in many large-scale KGs, leaving much room for improvement. In this paper, we focus on the task of aligning entity pairs between TKGs and propose a novel Time-aware Entity Alignment approach based on Graph Neural Networks (TEA-GNN). We embed entities, relations and timestamps of different KGs into a vector space and use GNNs to learn entity representations. To incorporate both relation and time information into the GNN structure of our model, we use a time-aware attention mechanism which assigns different weights to different nodes with orthogonal transformation matrices computed from embeddings of the relevant relations and timestamps in a neighborhood. Experimental results on multiple real-world TKG datasets show that our method significantly outperforms the state-of-the-art methods due to the inclusion of time information.


Introduction
Knowledge Graphs (KGs) provide a means for structured knowledge representation through nodes connected via edges. The nodes represent entities and the edges connecting these nodes denote relations. A KG stores facts as triples of the form (e_s, r, e_o), where e_s is the subject entity, e_o is the object entity, and r is the relation between them. Many large-scale KGs including YAGO (Suchanek et al., 2007) and DBpedia (Lehmann et al., 2015) have been established and are widely used in NLP applications, e.g., question answering and language modeling (Wang et al., 2017).
Since most KGs are developed independently and many of them are complementary in content, one of the core challenges of KGs is to align equivalent entity pairs between different KGs. To address this issue, embedding-based approaches have been leveraged to model entities and relations across multiple KGs and measure the similarities between entities (Sun et al., 2020). It has been shown that multi-relational information is helpful for an effective entity alignment approach (Mao et al., 2020a).
In addition to relation information, many KGs including YAGO3 (Mahdisoltani et al., 2013), Wikidata (Erxleben et al., 2014) and ICEWS (Lautenschlager et al., 2015) also contain time information between entities, i.e., some edges between entities have two properties, relation and time, as shown in Figure 1. Facts in such temporal KGs (TKGs) can be represented as quadruples of the form (e_s, r, e_o, τ), where τ denotes the timestamp. Notably, timestamps in most TKGs are presented in Arabic numerals and have similar formats. Thus, timestamps representing the same dates across multiple TKGs can easily be aligned by manually unifying their formats.
However, the existing embedding-based entity alignment approaches disregard time information in TKGs, leaving much room for improvement. Taking the case in Figure 1 as an example, given two entities, George H. W. Bush and George Walker Bush, existing in two TKGs respectively, time-agnostic embedding-based approaches are likely to ignore time information and wrongly recognize these two entities as the same person in the real world due to the homogeneity of their neighborhood information.
To address this issue, an intuitive solution is to incorporate time information into entity alignment models. Inspired by the recent successful applications of GNN models in entity alignment, in this paper, we propose a novel Time-aware Entity Alignment approach based on Graph Neural Networks (TEA-GNN) for entity alignment between TKGs. Different from some temporal GNN models which discretize temporal graphs into multiple snapshots, we treat timestamps as properties of links between entities. We first map all entities, relations and timestamps in TKGs into an embedding space. To incorporate relation and time information into the GNN structure, we utilize a time-aware attention mechanism which assigns different importance weights to different nodes within a neighborhood according to orthogonal transformation matrices computed with the embeddings of the corresponding relations and timestamps. To further integrate time information into the final entity representations, we concatenate output features of entities with the summation of their neighboring time embeddings to get multi-view entity representations.
Specifically, we create a reverse relation r^{-1} for each relation r to integrate direction information, and a time-aware fact involving a time interval (e_s, r, e_o, [τ_b, τ_e]), where τ_b and τ_e denote the begin and end times, is separated into two quadruples (e_s, r, e_o, τ_b) and (e_o, r^{-1}, e_s, τ_e), which represent the begin and the end of the relation, respectively. In this way, TEA-GNN adapts well to datasets where timestamps are represented in various forms: time points, begin or end times, and time intervals.
To verify our proposed approach, we evaluate TEA-GNN and its time-agnostic variant as well as several state-of-the-art entity alignment approaches on real-world datasets extracted from ICEWS, YAGO3 and Wikidata. Experimental results show that TEA-GNN significantly outperforms all baseline models with the inclusion of time information. To the best of our knowledge, this work is the first attempt to perform entity alignment between TKGs using a time-aware embedding-based approach.
Related Work

Knowledge Graph Embedding

KG embedding (KGE) aims to embed entities and relations into a low-dimensional vector space and measure the plausibility of each triple (e_s, r, e_o) by defining a score function. A typical KGE model is TransE (Bordes et al., 2013), which is based on the assumption e_s + r ≈ e_o. In addition to translational KGE models including TransE and its variants (Wang et al., 2014; Nayyeri et al., 2020, 2021), other KGE models can be classified into semantic matching models (Yang et al., 2015) and neural network-based models (Dettmers et al., 2018; Schlichtkrull et al., 2018).
With the development of TKGs, TKG embedding (TKGE) draws increasing attention (Leblay and Chekol, 2018; Xu et al., 2020a,c, 2021; Lacroix et al., 2020). An example of a typical TKGE model is TTransE (Leblay and Chekol, 2018), which represents timestamps as latent vectors alongside entities and relations and incorporates time embeddings into its score function ||e_s + r + τ − e_o||. The success of TKGE models shows that the inclusion of time information is helpful for reasoning over TKGs.
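As a toy illustration, the TTransE score above can be computed directly: the smaller ||e_s + r + τ − e_o|| is, the more plausible the quadruple is considered. The vectors below are hypothetical, purely to show the arithmetic.

```python
import numpy as np

def ttranse_score(e_s, r, tau, e_o):
    """TTransE plausibility score ||e_s + r + tau - e_o|| (lower = more plausible)."""
    return np.linalg.norm(e_s + r + tau - e_o)

e_s = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
tau = np.array([0.5, 0.5])
e_o_good = e_s + r + tau           # an object entity that fits perfectly
e_o_bad = np.array([5.0, -3.0])    # an implausible object entity

assert ttranse_score(e_s, r, tau, e_o_good) == 0.0
assert ttranse_score(e_s, r, tau, e_o_bad) > 1.0
```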

Graph Neural Network
Benefitting from the ability to model non-Euclidean space, GNNs have become increasingly popular in many areas, including social networks and KGs (Schlichtkrull et al., 2018). The Graph Convolutional Network (GCN) (Kipf and Welling, 2017) is an extension of GNNs which generates node-level embeddings by aggregating information from the nodes' neighborhoods. Furthermore, the Graph Attention Network (GAT) (Veličković et al., 2018) employs a self-attention mechanism to calculate the hidden representation of each entity by attending over its neighbors.
With the success of these GNN models in the static setting, we turn to the more practical scenario where the graph evolves over time. Existing approaches (Chen et al., 2018; Manessi et al., 2020; Pareja et al., 2020; Wu et al., 2020) generally discretize a temporal graph into multiple static snapshots along a timeline and utilize a combination of GNNs and recurrent architectures (e.g., LSTMs), whereby the former digest graph information and the latter handle dynamism.

Knowledge Graph Alignment
Many KG alignment approaches are proposed to find equivalent entities across multiple KGs by measuring the similarities between entity embeddings. Most embedding-based entity alignment approaches can be classified into two categories, i.e., translational models and GNN-based models.
Typical translational entity alignment models are based on embeddings learned from TransE and its variants. MTransE (Chen et al., 2017) learns a mapping between two separate KGE spaces. JAPE (Sun et al., 2017) jointly learns structure embeddings and attribute embeddings in a uniform optimization objective. IPTransE (Zhu et al., 2017) and BootEA (Sun et al., 2018) employ a semi-supervised learning strategy which iteratively labels new entity alignments as supervision. The main limitation of translational models is their inability to model 1-n, n-1 and n-n relations. Besides, they may fail to exploit the global view of entities since TransE is trained on individual triples.
Many recent studies introduce GNNs into the entity alignment task, motivated by their ability to model the global information of graphs. GCN-Align (Wang et al., 2018) utilizes GCNs to embed the entities of each KG into a unified vector space without prior knowledge of relations. After that, a number of GCN-based approaches were proposed to incorporate relation information into GCNs. HGCN (Wu et al., 2019b) jointly learns entity and relation representations via a GCN-based framework, and RDGCN (Wu et al., 2019a) constructs a dual relation graph for embedding learning. MuGNN (Cao et al., 2019), MRAEA and RREA (Mao et al., 2020a,b) assign different weight coefficients to entities according to the relation types between them, which empowers the models to distinguish the importance of different entities. Our framework TEA-GNN adopts a similar idea with additional time information.

Problem Formulation
Formally, a TKG is represented as G = (E, R, T, Q), where E, R and T are the sets of entities, relations and timestamps, respectively, and Q ⊂ E × R × E × T is the set of factual quadruples. Let G_1 = (E_1, R_1, T_1, Q_1) and G_2 = (E_2, R_2, T_2, Q_2) be two TKGs, and S = {(e_{i1}, e_{i2}) | e_{i1} ∈ E_1, e_{i2} ∈ E_2} be the set of pre-aligned entity pairs between G_1 and G_2. As mentioned in Section 1, timestamps in different TKGs can easily be aligned by manually unifying their formats, so a uniform time set T* = T_1 ∪ T_2 can be constructed for both TKGs. The two TKGs can then be rewritten as G_1 = (E_1, R_1, T*, Q_1) and G_2 = (E_2, R_2, T*, Q_2), sharing the same set of timestamps. The task of time-aware entity alignment aims to find new aligned entity pairs between G_1 and G_2 based on the prior knowledge of S and T*.
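The formulation above can be mirrored in a few lines of code. The data layout and helper name below are hypothetical, purely to make the shared time set T* concrete; quadruples are tuples (e_s, r, e_o, tau) whose timestamps are assumed to be already format-unified.

```python
# Hypothetical sketch: building the shared time set T* = T_1 | T_2 (set union)
# from the quadruple sets of two TKGs.
def build_shared_time_set(quads1, quads2):
    t1 = {q[3] for q in quads1}
    t2 = {q[3] for q in quads2}
    return t1 | t2

# Toy quadruples (subject, relation, object, timestamp).
Q1 = [("BushSr", "presidentOf", "USA", "1989"),
      ("BushSr", "presidentOf", "USA", "1993")]
Q2 = [("BushJr", "presidentOf", "USA", "2001"),
      ("BushJr", "presidentOf", "USA", "2009")]

T_star = build_shared_time_set(Q1, Q2)   # both TKGs then share T_star
```

Both TKGs are subsequently treated as graphs over this single timestamp vocabulary, which is what allows one embedding per timestamp to be shared across KGs.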

The Proposed Approach
To exploit both relation and time information for entity alignment, we first create a reverse link for each link so that each pair of reverse links between entities can represent relation directions and handle the begin and end of the relation. An orthogonal transformation-based time-aware attention mechanism is employed in each GCN layer to assign different weights to entities according to relation and time information between them. Finally, entity alignments are predicted by applying a distance function to multi-view representations of entities.

Reverse Link Generation
Time information τ in a temporal fact (e_s, r, e_o, τ) can be represented in various forms, e.g., time points, begin or end times, and time intervals. A time interval is shaped like [τ_b, τ_e], where τ_b and τ_e denote the actual begin and end times of the fact, respectively. A time point can be represented as [τ_b, τ_e] where τ_b = τ_e. Notably, we represent a known begin time with an unknown end time as [τ_b, τ_0], and a known end time with an unknown begin time as [τ_0, τ_e], where τ_0 ∈ T* is the first time step in the time set, denoting unknown time information. A fact without any known time information can be denoted as (e_s, r, e_o, [τ_0, τ_0]) to deal with heterogeneous temporal knowledge bases where a significant number of relations might be non-temporal.
In order to integrate relation direction, we create a reverse relation r^{-1} for each relation r and extend the relation set accordingly. Each quadruple (e_s, r, e_o, [τ_b, τ_e]) is then decomposed into two quadruples (e_s, r, e_o, τ_b) and (e_o, r^{-1}, e_s, τ_e), which handle the begin and the end of the relation, respectively.
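The decomposition step can be sketched as follows. The helper names (`inverse`, `decompose`) and the toy facts are hypothetical; the point is that every interval fact yields one forward quadruple carrying the begin time and one reverse quadruple carrying the end time, with `tau_0` standing in for unknown time information.

```python
# Sketch of reverse-link generation with hypothetical helper names.
TAU_0 = "tau_0"

def inverse(r):
    return r + "^-1"

def decompose(fact):
    """(e_s, r, e_o, (tau_b, tau_e)) -> two directed, time-stamped quadruples."""
    e_s, r, e_o, (tau_b, tau_e) = fact
    return [(e_s, r, e_o, tau_b), (e_o, inverse(r), e_s, tau_e)]

facts = [
    ("BushSr", "presidentOf", "USA", ("1989", "1993")),           # time interval
    ("Obama", "visit", "Ukraine", ("2014-07-08", "2014-07-08")),  # time point
    ("Alice", "citizenOf", "USA", (TAU_0, TAU_0)),                # non-temporal
]
quads = [q for f in facts for q in decompose(f)]
```

Note how the same scheme covers all three timestamp forms: a time point simply produces two quadruples with the same timestamp, and a non-temporal fact produces two quadruples stamped with `tau_0`.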

Time-Aware Attention Network
We map all entities, relations (including reverse relations) and time steps in both TKGs into the same vector space R^k, where k denotes the embedding dimension. The embeddings of entity e_i, relation r_j and time step τ_m are denoted as h_{e_i}, h_{r_j}, h_{τ_m} ∈ R^k. Recent studies (Smith et al., 2017; Pei et al., 2019) show that orthogonal transformation matrices are desirable and robust when transforming one isomorphic embedding into another. Thus, for each relation embedding h_{r_j} and time embedding h_{τ_m}, we define the corresponding orthogonal transformation matrices M_{r_j}, M_{τ_m} ∈ R^{k×k} as

M_{r_j} = I − 2 h_{r_j} h_{r_j}^T,    M_{τ_m} = I − 2 h_{τ_m} h_{τ_m}^T,

where the embeddings h_{r_j} and h_{τ_m} are normalized to ensure h_{r_j}^T h_{r_j} = h_{τ_m}^T h_{τ_m} = 1. By doing this, we can easily prove that the transformation matrices M_{r_j} and M_{τ_m} are orthogonal. Taking M_{τ_m} as an example, we can obtain

M_{τ_m}^T M_{τ_m} = (I − 2 h_{τ_m} h_{τ_m}^T)(I − 2 h_{τ_m} h_{τ_m}^T) = I − 4 h_{τ_m} h_{τ_m}^T + 4 h_{τ_m} (h_{τ_m}^T h_{τ_m}) h_{τ_m}^T = I.

By using such orthogonal transformation matrices, the norms and the relative distances of entity embeddings remain unchanged after transformation, i.e., ||M_{τ_m} h_{e_i}|| = ||h_{e_i}|| and ||M_{τ_m} h_{e_i} − M_{τ_m} h_{e_j}|| = ||h_{e_i} − h_{e_j}||.

A time-aware attention mechanism is used to integrate both time and relation information into entity representations by assigning different weights to different neighboring nodes according to the orthogonal transformation matrices of the relations and timestamps of the corresponding inward links. In the case of Figure 2, the inward links in the neighborhood of entity e_1 include (e_2, r_1, e_1, τ_{1b}) and (e_3, r_2^{-1}, e_1, τ_{2e}), in which e_1 acts as the object entity. We define the time-specific weighted importance α_{i,j,m} and the relation-specific weighted importance β_{i,j,m} of the m-th inward link from the neighboring entity e_j to e_i as

α_{i,j,m} = ν_τ^T [h_{e_i}^{in} || h_{e_j}^{in} || h_{τ_m}],    β_{i,j,m} = ν_r^T [h_{e_i}^{in} || h_{e_j}^{in} || h_{r_m}],

where || denotes the concatenation operator, ν_τ, ν_r ∈ R^{3k} are shared temporal and relational attention weight vectors, and h_{e_i}^{in}, h_{e_j}^{in} ∈ R^k are the input features of entities e_i and e_j. The entities' input features in the first network layer are their original embeddings.
h_{τ_m} and h_{r_m} are the embeddings of the timestamp and relation in the m-th inward link. Following GAT (Veličković et al., 2018), we define the normalized coefficients ω_{i,j,m} and υ_{i,j,m}, representing the temporal and relational connectivity of the link from entity e_j to e_i, using softmax functions:

ω_{i,j,m} = exp(α_{i,j,m}) / Σ_{e_{j'} ∈ N_{e_i}} Σ_{m' ∈ L^τ_{ij'}} exp(α_{i,j',m'}),
υ_{i,j,m} = exp(β_{i,j,m}) / Σ_{e_{j'} ∈ N_{e_i}} Σ_{m' ∈ L^r_{ij'}} exp(β_{i,j',m'}),

where N_{e_i} is the set of neighboring entities of e_i, and L^r_{ij} and L^τ_{ij} denote the sets of relations and time steps in the links from e_j to e_i.
The output features h_{e_i}^{out} are obtained with an aggregation that linearly combines the temporal and relational orthogonal transformations of the input features of neighboring entities, followed by a nonlinear ReLU activation function σ(·), i.e.,

h_{e_i}^{out} = σ( Σ_{e_j ∈ N_{e_i}} Σ_{m ∈ L_{ij}} (ω_{i,j,m} M_{τ_m} + υ_{i,j,m} M_{r_m}) h_{e_j}^{in} ),

where L_{ij} denotes the set of links from e_j to e_i.
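A minimal numpy sketch of one such layer for a single entity is given below. All names and the toy neighborhood are hypothetical, and we assume plain dot-product attention scores and a combined transform, which may differ from the authors' exact implementation. It also verifies numerically that M = I − 2hh^T (a Householder reflection) is orthogonal and norm-preserving.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 8

def householder(h):
    """Orthogonal transformation M = I - 2 h h^T with h normalized to unit length."""
    h = h / np.linalg.norm(h)
    return np.eye(len(h)) - 2 * np.outer(h, h)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Sanity check: M is orthogonal and preserves norms of entity embeddings.
M = householder(rng.normal(size=k))
e = rng.normal(size=k)
assert np.allclose(M.T @ M, np.eye(k))
assert np.isclose(np.linalg.norm(M @ e), np.linalg.norm(e))

# Inward links of entity e_i: (neighbor input feature, relation emb, time emb).
h_i = rng.normal(size=k)
links = [(rng.normal(size=k), rng.normal(size=k), rng.normal(size=k))
         for _ in range(3)]

# Shared temporal and relational attention weight vectors in R^{3k}.
nu_tau, nu_r = rng.normal(size=3 * k), rng.normal(size=3 * k)

alpha = np.array([nu_tau @ np.concatenate([h_i, h_j, t]) for h_j, _, t in links])
beta = np.array([nu_r @ np.concatenate([h_i, h_j, r]) for h_j, r, _ in links])
omega, upsilon = softmax(alpha), softmax(beta)   # normalized over inward links

# Aggregate (omega * M_tau + upsilon * M_r) h_j over links, then ReLU.
agg = sum(om * householder(t) @ h_j + up * householder(r) @ h_j
          for (h_j, r, t), om, up in zip(links, omega, upsilon))
h_i_out = np.maximum(agg, 0.0)
```

In the full model this computation runs for every entity in parallel using sparse adjacency structures; the per-entity loop here is only for clarity.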

Entity Alignment Model
The entity alignment model aims to embed two KGs into a unified vector space by pulling the seed alignments of entities together. In this work, the entity alignment model consists of multiple TEA-GNN layers and a distance function which measures the similarities between the final representations of entities. Let h_{e_i}^{out(l)} denote the l-th layer's output features of entity e_i. A cross-layer representation is employed to capture multi-hop neighboring information, as in previous work (Mao et al., 2020a), by concatenating the output features of different layers. In the same way, we define the global output features ĥ_{e_i}^{out} of e_i as

ĥ_{e_i}^{out} = [h_{e_i}^{out(0)} || h_{e_i}^{out(1)} || … || h_{e_i}^{out(L)}],

where L is the number of layers and h_{e_i}^{out(0)} are the input features. We further concatenate the average embedding of the connected timestamps with the output features of entities to get multi-view embeddings as final entity representations, i.e.,

h̄_{e_i} = [ĥ_{e_i}^{out} || (1/|N^τ_{e_i}|) Σ_{τ_m ∈ N^τ_{e_i}} h_{τ_m}],

where N^τ_{e_i} represents the set of timestamps around entity e_i.
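The cross-layer and multi-view construction can be sketched with toy shapes (the dimensions and variable names below are hypothetical): concatenate the per-layer outputs, then append the mean of the neighboring time embeddings.

```python
import numpy as np

rng = np.random.default_rng(2)
k, L = 4, 2

# Output features of entity e_i at layers 0..L (layer 0 = input features).
layer_outputs = [rng.normal(size=k) for _ in range(L + 1)]
h_global = np.concatenate(layer_outputs)          # shape ((L + 1) * k,)

# Average embedding of the timestamps around e_i (here 3 neighboring timestamps).
time_embs = rng.normal(size=(3, k))
h_final = np.concatenate([h_global, time_embs.mean(axis=0)])  # multi-view vector
```

The final representation thus has dimension (L + 1) * k + k, combining structural information from every layer with an explicit temporal view.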
Entity alignments are predicted based on the distances between the final output features of entities from the two KGs. For two entities e_i ∈ E_1 and e_j ∈ E_2 from different sources, we use the L1 distance to measure the distance between them as follows,

d(e_i, e_j) = ||h̄_{e_i} − h̄_{e_j}||_1.

A margin-based ranking loss is used as the optimization objective of the entity alignment model, i.e.,

L = Σ_{(e_i, e_j) ∈ S} Σ_{(e'_i, e'_j) ∈ S'} max(0, d(e_i, e_j) − d(e'_i, e'_j) + λ),

where λ denotes the margin, S' is the set of generated negative entity pairs, and e'_i ∈ E_1 and e'_j ∈ E_2 are negative samples for e_i and e_j. Negative entities are sampled randomly, and an RMSprop optimizer is used to minimize the loss function.
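The training objective can be illustrated with toy embeddings (all vectors below are hypothetical): an aligned seed pair contributes zero loss when it is already closer than the sampled negatives by at least the margin, and a positive loss otherwise.

```python
import numpy as np

def l1_distance(x, y):
    return float(np.abs(x - y).sum())

def margin_rank_loss(pos_pairs, neg_pairs, margin=1.0):
    """Sum of max(0, d(pos) - d(neg) + margin) over pos/neg pair combinations."""
    loss = 0.0
    for x, y in pos_pairs:
        for x_neg, y_neg in neg_pairs:
            loss += max(0.0, l1_distance(x, y) - l1_distance(x_neg, y_neg) + margin)
    return loss

a = np.array([0.0, 0.0])
a_aligned = np.array([0.0, 0.0])      # counterpart of a in the other KG
far_negative = np.array([3.0, 4.0])   # already well separated
near_negative = np.array([0.2, 0.2])  # too close: incurs loss

loss_easy = margin_rank_loss([(a, a_aligned)], [(a, far_negative)])  # 0.0
loss_hard = margin_rank_loss([(a, a_aligned)], [(a, near_negative)])
```

Minimizing this loss pulls seed pairs together while pushing randomly sampled negatives at least `margin` further apart in L1 distance.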
During testing, we adopt CSLS (Conneau et al., 2018) as the metric to measure similarities between entity embeddings.
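For reference, a compact numpy sketch of CSLS is given below. The function name and the neighborhood size are our own choices; the authors' test-time implementation may differ in such details.

```python
import numpy as np

def csls_matrix(X, Y, k=10):
    """Cross-domain Similarity Local Scaling between rows of X and rows of Y.

    CSLS(x, y) = 2*cos(x, y) - r_Y(x) - r_X(y), where r is the mean cosine
    similarity to the k nearest neighbors in the other embedding set; this
    penalizes 'hub' embeddings that are close to everything."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    S = Xn @ Yn.T                                  # pairwise cosine similarity
    r_x = np.sort(S, axis=1)[:, -k:].mean(axis=1)  # hubness of each row of X
    r_y = np.sort(S, axis=0)[-k:, :].mean(axis=0)  # hubness of each row of Y
    return 2 * S - r_x[:, None] - r_y[None, :]

# Toy check: for two identical orthonormal sets, matching rows score highest.
scores = csls_matrix(np.eye(2), np.eye(2), k=1)
best = scores.argmax(axis=1)
```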
Datasets

ICEWS05-15 is originally extracted from ICEWS (Lautenschlager et al., 2015), a repository that contains political events with specific time annotations, e.g., (Barack Obama, Make a visit, Ukraine, 2014-07-08). It is noteworthy that time annotations in ICEWS are all time points. ICEWS05-15 contains events from 2005 to 2015. We build two datasets, DICEWS-1K and DICEWS-200, in a similar way to the construction of the DFB datasets (Zhu et al., 2017). We first randomly divide the ICEWS05-15 quadruples into two subsets Q_1 and Q_2 of similar size, such that the ratio of quadruples shared between Q_1 and Q_2 to all quadruples equals 50%. The only difference between DICEWS-1K and DICEWS-200 is the number of alignment seeds S: 1,000 and 200 entity pairs between the TKGs are pre-aligned, respectively. The time unit of the ICEWS datasets is one day, i.e., each day is an individual time step.
YAGO3 and Wikidata are two common large-scale knowledge bases containing time information of various forms, including time points, begin or end times, and time intervals. Lacroix et al. (2020) attach complementary time information to YAGO facts. We build two time-aware datasets, YAGO-WIKI50K-5K and YAGO-WIKI50K-1K, by removing non-temporal facts from the generated TKGs and using different numbers of alignment seeds S. In addition, we build a hybrid dataset YAGO-WIKI20K containing both temporal and non-temporal facts with 400 pairs of alignment seeds by reducing the sizes of the entity sets of the two TKGs to around 20,000. To generate the shared time set T* for a YAGO-WIKI dataset, we drop month and date information and use the first time step τ_0 to represent unobtainable time information. Statistics of all datasets are listed in Table 1, where P denotes the set of reference entity pairs. The reference entity pairs other than pre-aligned entity pairs, i.e., P − S, are used for testing.

Experimental Setup
Following previous work, we perform entity alignment as a ranking task based on similarities between entity embeddings, and use Mean Reciprocal Rank (MRR) and Hits@N (N = 1, 10) as evaluation metrics. The default configuration of our model is as follows: embedding dimension k = 100, learning rate lr = 0.005, number of TEA-GNN layers L = 2, margin λ = 1 and dropout rate 0.3. Below we only list the non-default hyperparameters: λ = 3 for DICEWS-200 and YAGO-WIKI20K; k = 25 for YAGO-WIKI50K-5K and YAGO-WIKI50K-1K. To verify the effectiveness of the integration of time information, we implement a time-unaware variant of TEA-GNN, denoted as TU-GNN, which treats all time steps τ_i ∈ T* as the unknown time step τ_0. The non-default hyperparameters of TU-GNN are as follows: λ = 3 for DICEWS-1K and YAGO-WIKI20K; λ = 5 for DICEWS-200; k = 25 for the YAGO-WIKI50K datasets. The reported performance is the average of five independent training runs.
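The ranking metrics above are standard; a small sketch (with hypothetical variable names) makes their computation explicit.

```python
import numpy as np

def mrr_and_hits(sim, gold, ns=(1, 10)):
    """sim[i, j]: similarity of source entity i to candidate target j;
    gold[i]: index of the true counterpart of source entity i."""
    ranks = []
    for i, g in enumerate(gold):
        order = np.argsort(-sim[i])                        # descending similarity
        ranks.append(int(np.where(order == g)[0][0]) + 1)  # 1-based rank of gold
    ranks = np.array(ranks)
    mrr = float((1.0 / ranks).mean())
    hits = {n: float((ranks <= n).mean()) for n in ns}
    return mrr, hits

# Toy similarity matrix: 2 source entities, 3 candidate targets each.
sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.7, 0.1]])
mrr, hits = mrr_and_hits(sim, gold=[0, 1])   # both gold targets ranked first
```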
In this work, we compare our proposed models with three strong translational baseline models and three state-of-the-art GNN-based models: MTransE (Chen et al., 2017), JAPE, AlignE (Sun et al., 2018), GCN-Align (Wang et al., 2018), MRAEA (Mao et al., 2020a) and RREA (Mao et al., 2020b). We choose AlignE instead of BootEA since we do not use iterative learning for any model, including our proposed models. Due to the lack of attribute information, we use the SE (Structural Embedding) variants of JAPE and GCN-Align as baselines. Except for MTransE, whose experiments are implemented with the OpenEA framework (Sun et al., 2020), all experiments of baseline models are implemented based on their source code. All target models, including our proposed models, are trained on a GeForce GTX 1080Ti GPU. For a fair comparison, we set the maximum embedding dimension to 100 for all target models. Details of implementation and grid search for hyperparameters can be found in Appendix A.

Results and Analysis
Main Results Table 2 shows the entity alignment results of our proposed models and all baselines on the ICEWS and YAGO-WIKI50K datasets. TEA-GNN remarkably outperforms all baseline models on the four TKG datasets across all metrics. Compared to RREA, which achieves the best results among all baseline models, TEA-GNN obtains improvements of 22.9%, 32.9%, 6.2% and 3.9% regarding Hits@1 on the four TKG datasets, respectively.

Qualitative Study
To study the effect of the integration of time information on the entity alignment performance of TEA-GNN, we conduct a qualitative study of TEA-GNN and its time-unaware variant TU-GNN. Table 3 lists several examples where TEA-GNN gives different predictions from TU-GNN in consideration of additional time information. In the first case, TU-GNN wrongly aligns two entities from G_1 and G_2 of DICEWS-200, i.e., Daniel Scioli and Agustín Rossi, because these two entities have very similar connected links in G_1 and G_2 regardless of time information. As shown in Table 3, some links around these two entities in G_1 and G_2 have the same linked entities and relation types, leading TU-GNN to identify them as an equivalent entity pair. On the other hand, TEA-GNN can correctly distinguish these two entities since the relevant links have different timestamps. Similarly, TU-GNN recognizes a Wikidata entity Leon Benko (Q1389599) and a YAGO entity <Olivier_Fontenette> as the same person since these two persons played for the same football club, while TEA-GNN can learn that they played there during different periods and thus are not the same person in the real world. These cases demonstrate the effect of time information on the performance of our proposed entity alignment models.

Compared to TU-GNN, TEA-GNN obtains improvements of 10.6% and 11.6% on YAGO-WIKI50K-5K and YAGO-WIKI50K-1K, indicating that the improvements on datasets with fewer alignment seeds are more significant. To further verify this observation, we evaluate the performance of these two models and RREA on the ICEWS and YAGO-WIKI50K datasets with different numbers of alignment seeds. As shown in Figure 3, the performance gap between TEA-GNN and the two time-unaware models becomes greater as the number |S| of alignment seeds decreases from 1,000 to 200. In practical applications, alignment seeds are difficult to obtain.
Since our method performs well with a small number of pre-aligned entity pairs, it can more easily be applied to large-scale KGs than time-unaware EA methods.
We also conduct a study on the prediction accuracy of aligned entities which have different time sensitivity. As mentioned in Section 5.1, we generate a hybrid dataset YAGO-WIKI20K where 17.5% of YAGO facts and 36.6% of Wikidata facts are non-temporal. We divide all testing entity pairs in this dataset into two categories based on their sensitivity to time information, i.e., highly time-sensitive entity pairs and lowly time-sensitive entity pairs. The time sensitivity s_i of a single entity e_i is defined as the ratio of the number of its time-aware connected links, i.e., links in which τ ≠ τ_0, to the total number of links L_i within its neighborhood:

s_i = (|L_i| − |L^{τ_0}_i|) / |L_i|,

where L^{τ_0}_i denotes the set of time-unaware links connecting e_i. Given an entity pair (e_{i1}, e_{i2}) between G_1 and G_2, we call it a highly time-sensitive entity pair if s_{i1} ≥ 0.5 and s_{i2} ≥ 0.5. Otherwise, it is lowly time-sensitive.
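The time-sensitivity measure can be sketched as follows; the helper names are hypothetical, and a link is represented here simply as a (relation, timestamp) pair in an entity's neighborhood.

```python
# Sketch of the time-sensitivity measure: s_i is the fraction of links around
# entity e_i whose timestamp is known (i.e., not equal to tau_0).
TAU_0 = "tau_0"

def time_sensitivity(links):
    """links: list of (relation, timestamp) pairs in the neighborhood of e_i."""
    return sum(1 for _, t in links if t != TAU_0) / len(links)

def is_highly_time_sensitive(links1, links2, threshold=0.5):
    """An aligned pair is highly time-sensitive iff both entities meet the threshold."""
    return (time_sensitivity(links1) >= threshold
            and time_sensitivity(links2) >= threshold)

# Example: one fully temporal neighborhood, one half temporal.
g1_links = [("playsFor", "2006"), ("playsFor", "2008")]
g2_links = [("playsFor", "2006"), ("bornIn", TAU_0)]
```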
Among the 19,062 testing entity pairs of YAGO-WIKI20K, 6,898 are highly time-sensitive and the others are lowly time-sensitive according to the above definitions. The entity alignment results of TEA-GNN and TU-GNN on the highly time-sensitive and lowly time-sensitive test sets are reported in Table 4. TEA-GNN and TU-GNN have close performance on lowly time-sensitive entity pairs, while TEA-GNN remarkably outperforms TU-GNN on the highly time-sensitive test set. In other words, the effect of incorporating time information is more significant when testing entity pairs are more time-sensitive.
Complexity Study Given two TKGs to be aligned, i.e., G_1 = (E_1, R_1, T*, Q_1) and G_2 = (E_2, R_2, T*, Q_2), the total number of trainable parameters |p| of TEA-GNN is equal to

|p| = k × (|E_1| + |E_2|) + 2k × (|R_1| + |R_2|) + k × |T*| + 3kL + 3kL,

where L denotes the number of TEA-GNN layers and the last two terms represent the numbers of parameters of the shared temporal and relational attention weight vectors used in the attention mechanism. Compared to parameter-efficient translational entity alignment models like MTransE and JAPE, in which the number of parameters is k × (|E_1| + |E_2| + |R_1| + |R_2|), TEA-GNN uses additional parameters only for reverse relation embeddings, time embeddings and attention weight vectors, which are much fewer than the parameters of entity embeddings in most cases. As shown in Figure 4, the processing of the additional time information does not excessively increase the training time of TEA-GNN compared to RREA and TU-GNN. Since we set the maximum number of epochs to 6,000, the training process of our proposed models on each dataset can be completed within a couple of hours on a single GeForce GTX 1080Ti GPU.
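Under our reading of this accounting (entity, doubled relation, and time embeddings of dimension k, plus two attention vectors in R^{3k} per layer), the parameter count can be computed directly. Treat this as an assumption-laden sketch rather than the authors' exact bookkeeping.

```python
# Back-of-the-envelope trainable-parameter count for a TEA-GNN-style model.
def num_parameters(n_ent1, n_ent2, n_rel1, n_rel2, n_time, k, L):
    entity = k * (n_ent1 + n_ent2)
    relation = 2 * k * (n_rel1 + n_rel2)  # relations plus their reverses
    time = k * n_time
    attention = 2 * 3 * k * L             # nu_tau and nu_r in R^{3k} per layer
    return entity + relation + time + attention
```

Because real KGs have vastly more entities than relations, timestamps, or layers, the entity-embedding term dominates, which is why the overhead relative to MTransE-style models stays small.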

Conclusion
The main contributions of this paper are threefold:
• We propose a novel GNN-based approach, TEA-GNN, which can model temporal relational graphs with an orthogonal transformation-based time-aware attention mechanism and perform entity alignment between TKGs. To the best of our knowledge, this work is the first attempt to integrate time information into an embedding-based entity alignment approach.
• Existing temporal GNN models typically discretize a temporal graph into multiple static snapshots and utilize a combination of GNNs and recurrent architectures. By contrast, we treat timestamps as attentive properties of links between nodes. This method has proven to be time-efficient in our case and could potentially be used for non-relational temporal graph representation learning.
• Multiple new datasets are created in this work for evaluating the performance of entity alignment models on TKGs. Experiments show that TEA-GNN remarkably outperforms the state-of-the-art entity alignment models on various well-built TKG datasets.
For future work, we will try to integrate other types of information, e.g., attribute information, into our model and extend our model to other learning tasks on temporal graphs.