Inductively Representing Out-of-Knowledge-Graph Entities by Optimal Estimation Under Translational Assumptions

Conventional Knowledge Graph Completion (KGC) assumes that all test entities appear during training. However, in real-world scenarios, Knowledge Graphs (KG) evolve fast with out-of-knowledge-graph (OOKG) entities added frequently, and we need to represent these entities efficiently. Most existing Knowledge Graph Embedding (KGE) methods cannot represent OOKG entities without costly retraining on the whole KG. To enhance efficiency, we propose a simple and effective method that inductively represents OOKG entities by their optimal estimation under translational assumptions. Given pretrained embeddings of the in-knowledge-graph (IKG) entities, our method needs no additional learning. Experimental results show that our method outperforms the state-of-the-art methods with higher efficiency on two KGC tasks with OOKG entities.


Introduction
Knowledge Graphs (KGs) play a pivotal role in various NLP tasks but generally suffer from incompleteness. To address this problem, Knowledge Graph Completion (KGC) aims to predict missing relations in a KG based on Knowledge Graph Embeddings (KGE) of entities. Conventional KGE methods such as TransE (Bordes et al., 2013) and RotatE (Sun et al., 2019) achieve success in conventional KGC, which assumes that all test entities appear during training. However, in real-world scenarios, KGs evolve fast, with out-of-knowledge-graph (OOKG) entities added frequently. To represent OOKG entities, most conventional KGE methods need to retrain on the whole KG, which is extremely time-consuming. Faced with this problem, we urgently need an efficient method to tackle KGC with OOKG entities.

Figure 1 shows an example of KGC with OOKG entities. Based on an existing KG, a new movie "TENET" is added as an OOKG entity with some auxiliary relations that connect it with some in-knowledge-graph (IKG) entities. To predict the missing relations between "TENET" and other entities, we need to obtain its embedding first. Knowing that "TENET" is directed by "Christopher Nolan", is an "action" movie, and is starred by "John David Washington", we can combine these clues to profile "TENET" and estimate its embedding. This embedding can then be used to predict whether its relation with "English" is "language".

To represent OOKG entities via IKG neighbor information instead of retraining, Hamaguchi et al. (2017), Wang et al. (2019), Bi et al. (2020), and Zhao et al. (2020) adopt Graph Neural Networks (GNNs) to aggregate IKG neighbors into the OOKG entity embedding. Other methods (Xie et al., 2016, 2017; Shi and Weninger, 2018) utilize external resources such as entity descriptions or images instead of IKG neighbor information to avoid retraining. However, GNN models require relatively complex calculations, and high-quality external resources are hard and expensive to acquire.
In this paper, we propose an inductive method that derives formulas to estimate OOKG entity embeddings from translational assumptions. Compared to existing methods, our method has simpler calculations and does not need external resources.
For a triplet $(h, r, t)$, the translational assumption of a KGE model supposes that the embedding $\mathbf{h}$ can establish a connection with $\mathbf{t}$ via an $r$-specific operation. Assuming that $h$ is OOKG and $t$ is IKG, we show that if a translational assumption can derive a specific formula to compute $\mathbf{h}$ from the pretrained $\mathbf{t}$ and $\mathbf{r}$, then no other candidate for $\mathbf{h}$ better fits this translational assumption. Therefore, the computed $\mathbf{h}$ is the optimal estimation of the OOKG entity under this translational assumption. Among existing typical KGE models, we find that the translational assumptions of TransE and RotatE can derive specific estimation formulas. Based on them, we design two instances of our method, called InvTransE and InvRotatE, respectively. Note that our estimation formulas are fixed in closed form, so our method needs no additional learning when given pretrained IKG embeddings.
Our contributions are summarized as follows: (1) We propose a simple and effective method to inductively represent OOKG entities by their optimal estimation under translational assumptions. (2) Our method needs no external resources. Given pretrained IKG embeddings, our method even needs no additional learning. (3) We evaluate our method on two KGC tasks with OOKG entities. Experimental results show that our method outperforms the state-of-the-art methods by a large margin with higher efficiency, and maintains a robust performance even under increasing OOKG entity ratios.

Notations and Problem Formulation
Let $\mathcal{E}$ denote the IKG entity set and $\mathcal{R}$ denote the relation set. $\mathcal{K}_{\text{train}}$ is the training set, in which all entities are IKG. $\mathcal{K}_{\text{aux}}$ is the auxiliary set connecting OOKG and IKG entities at inference time, where each triplet contains one OOKG and one IKG entity. We define the $\mathcal{K}$-neighbor set of an entity $e$ as all its neighbor entities and relations in $\mathcal{K}$: $N_{\mathcal{K}}(e) = \{(r, t) \mid (e, r, t) \in \mathcal{K}\} \cup \{(h, r) \mid (h, r, e) \in \mathcal{K}\}$.
Using the notations above, we formulate our problem as follows: given $\mathcal{K}_{\text{aux}}$ and IKG embeddings pretrained on $\mathcal{K}_{\text{train}}$, we need to represent an OOKG entity $e \notin \mathcal{E}$ as an embedding. This embedding can then be used to tackle KGC with OOKG entities.
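To make the notation concrete, here is a minimal Python sketch of how the $\mathcal{K}$-neighbor sets could be materialized from triplet lists; the function and variable names are ours, not from the paper.

```python
# A minimal sketch (our own illustration, not the paper's code) of the
# K-neighbor set N_K(e) = {(r, t) | (e, r, t) in K} u {(h, r) | (h, r, e) in K}.
from collections import defaultdict

def build_neighbor_sets(triplets):
    """Map each entity e to its K-neighbor set, tagging the direction."""
    neighbors = defaultdict(set)
    for h, r, t in triplets:
        neighbors[h].add(("head", r, t))  # e = h: neighbor pair (r, t)
        neighbors[t].add(("tail", h, r))  # e = t: neighbor pair (h, r)
    return neighbors

# Example with a hypothetical auxiliary set connecting the OOKG
# entity "TENET" to IKG entities, as in Figure 1.
K_aux = [
    ("TENET", "directed_by", "Christopher_Nolan"),
    ("TENET", "genre", "action"),
    ("TENET", "starred_by", "John_David_Washington"),
]
print(build_neighbor_sets(K_aux)["TENET"])
```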

Proposed Method
As shown in Figure 2, our method consists of two modules: the estimator and the reducer. The estimator computes a set of candidate embeddings for an OOKG entity via its IKG neighbor information; the reducer reduces these candidates to the final embedding of the OOKG entity.

Estimator
For an OOKG entity $e$, given its IKG neighbors $N_{\mathcal{K}_{\text{aux}}}(e)$ with pretrained embeddings, the estimator computes a set of candidate embeddings. Apart from TransE and RotatE, other typical KGE models involve relatively complex calculations in their translational assumptions, which prevent those assumptions from deriving specific estimation formulas for OOKG entities.¹ Therefore, we design two sets of estimation formulas based on TransE and RotatE, respectively. Specifically, if $e$ is the head entity of a triplet $(e, r, t)$, we obtain its optimal estimation $\hat{\mathbf{e}}$ by

$$\hat{\mathbf{e}} = \mathbf{t} - \mathbf{r} \ \ \text{(InvTransE)}, \qquad \hat{\mathbf{e}} = \mathbf{t} \bullet \mathbf{r}^{-1} \ \ \text{(InvRotatE)},$$

where $\bullet$ denotes the element-wise product and $\mathbf{r}^{-1}$ denotes the element-wise inversion. Otherwise, if $e$ is the tail entity of a triplet $(h, r, e)$, we obtain its optimal estimation $\hat{\mathbf{e}}$ by

$$\hat{\mathbf{e}} = \mathbf{h} + \mathbf{r} \ \ \text{(InvTransE)}, \qquad \hat{\mathbf{e}} = \mathbf{h} \bullet \mathbf{r} \ \ \text{(InvRotatE)}.$$
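The estimation formulas are simple enough to state directly in code. Below is a minimal sketch, assuming real-valued NumPy embeddings for TransE and complex-valued embeddings with unit-modulus relations for RotatE (as in the original RotatE model); the names are illustrative.

```python
import numpy as np

def invtranse_candidate(r, other, ookg_is_head):
    # TransE assumes h + r = t, so e = t - r (head) or e = h + r (tail).
    return other - r if ookg_is_head else other + r

def invrotate_candidate(r, other, ookg_is_head):
    # RotatE assumes h . r = t with complex embeddings; relations have unit
    # modulus, so the element-wise inversion r^{-1} equals the conjugate of r.
    return other * np.conj(r) if ookg_is_head else other * r

# Usage: one candidate per auxiliary neighbor of the OOKG entity.
r = np.exp(1j * np.random.uniform(0, 2 * np.pi, 4))  # unit-modulus relation
t = np.random.randn(4) + 1j * np.random.randn(4)     # pretrained tail embedding
cand = invrotate_candidate(r, t, ookg_is_head=True)
assert np.allclose(cand * r, t)  # the estimate satisfies h . r = t exactly
```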

Reducer
After the estimator computes $|N_{\mathcal{K}_{\text{aux}}}(e)|$ candidate embeddings, the reducer reduces them to the final embedding of the OOKG entity by a weighted average. We design two weighting functions.
Correlation-based weights are query-aware. Inspired by Wang et al. (2019), we first use the conditional probability $P(r_j \mid r_i)$, estimated from relation co-occurrence frequencies in the training set, to model the correlation $\mathrm{Corr}(r_i, r_j)$ between two relations.
¹ Detailed proof is included in the Appendix.
When the query relation $r_q$ is specified, we assign more weight to a candidate computed via a neighbor whose relation is more relevant to $r_q$:

$$w_{\mathrm{corr}}(\hat{\mathbf{e}}) = \frac{\mathrm{Corr}(r_e, r_q)}{Z_{\mathrm{corr}}},$$

where $Z_{\mathrm{corr}}$ is the normalization factor and $r_e$ is the neighbor relation via which $\hat{\mathbf{e}}$ is computed.
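A sketch of one way to compute these correlations and weights follows; the co-occurrence counting below is our reading of the conditional-probability formulation and may differ in detail from the paper's exact estimator.

```python
from collections import defaultdict

def relation_correlations(train_triplets):
    """Estimate Corr(r_i, r_j) = P(r_j | r_i) as the fraction of entities
    connected with r_i that are also connected with r_j (an assumed
    estimator, consistent with the conditional-probability description)."""
    rels_of = defaultdict(set)
    for h, r, t in train_triplets:
        rels_of[h].add(r)
        rels_of[t].add(r)
    cooc = defaultdict(float)
    freq = defaultdict(float)
    for rels in rels_of.values():
        for r_i in rels:
            freq[r_i] += 1.0
            for r_j in rels:
                cooc[(r_i, r_j)] += 1.0
    return {(ri, rj): c / freq[ri] for (ri, rj), c in cooc.items()}

def correlation_weights(neighbor_rels, r_q, corr):
    """Query-aware weights: proportional to Corr(r_e, r_q), normalized by Z_corr."""
    raw = [corr.get((r_e, r_q), 0.0) for r_e in neighbor_rels]
    z_corr = sum(raw) or 1.0  # normalization factor; guard against all-zero
    return [w / z_corr for w in raw]
```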
Degree-based weights focus more on neighbor entities with higher degrees in the training set:

$$w_{\mathrm{deg}}(\hat{\mathbf{e}}) = \frac{d_e + \delta}{Z_{\mathrm{deg}}},$$

where $Z_{\mathrm{deg}}$ is the normalization factor, $d_e$ is the degree of the neighbor entity via which $\hat{\mathbf{e}}$ is computed, and $\delta$ is a smoothing factor. Based on these weighting functions, the final embedding of the OOKG entity $e$ is computed as the weighted average

$$\mathbf{e} = \sum_{\hat{\mathbf{e}} \in C} w(\hat{\mathbf{e}})\, \hat{\mathbf{e}},$$

where $C$ denotes the candidate embedding set.
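And a matching sketch of the degree-based weights and the reduction step; the weight proportional to $d_e + \delta$ is our assumption, consistent with the stated normalization factor and smoothing factor.

```python
import numpy as np

def degree_weights(neighbor_degrees, delta=0.1):
    """Degree-based weights: assumed proportional to d_e + delta (smoothing),
    normalized by Z_deg. The exact functional form is our assumption."""
    raw = [d + delta for d in neighbor_degrees]
    z_deg = sum(raw)  # Z_deg: normalization factor
    return [w / z_deg for w in raw]

def reduce_candidates(candidates, weights):
    """Final OOKG embedding: the weighted average of candidate embeddings."""
    return sum(w * c for w, c in zip(weights, candidates))

# Usage: three candidates from three auxiliary neighbors.
cands = [np.random.randn(4) for _ in range(3)]
e_final = reduce_candidates(cands, degree_weights([120, 7, 33]))
```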

Tasks and Datasets
We conduct experiments on two KGC tasks with OOKG entities: link prediction and triplet classification. For link prediction, we use two datasets released by Wang et al. (2019), built from FB15k (Bordes et al., 2013). For triplet classification, we use nine datasets released by Hamaguchi et al. (2017), built from WN11 (Socher et al., 2013). All datasets are built for KGC with OOKG entities and are composed of a training set, an auxiliary set, a validation set, and a test set. More details of these datasets are included in the Appendix.

Experimental Settings
We tune hyper-parameters for pretraining on the validation set. Generally, we use Adam (Kingma and Ba, 2015) with an initial learning rate of $10^{-3}$ as the optimizer and a batch size of 1,024. For link prediction, we use an embedding dimension of 1,000 and the correlation-based weights. For triplet classification, we use an embedding dimension of 300 and the degree-based weights. Details of the experimental settings are included in the Appendix.

Evaluation Metrics
For link prediction, we use Mean Reciprocal Rank (MRR) and the proportion of ground-truth entities ranked in the top $k$ (Hits@$k$, $k \in \{1, 10\}$). All metrics are filtered versions that exclude false-negative candidates. For triplet classification, we use Accuracy. We determine relation-specific thresholds $\delta_r$ by maximizing the accuracy on the validation set.
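For concreteness, here is a small sketch of the filtered ranking protocol (our own illustration of the standard procedure, not code from the paper):

```python
import numpy as np

def filtered_rank(scores, target_idx, false_negatives):
    """Filtered rank of the ground-truth entity: candidates that form other
    true triplets (false negatives) are removed before ranking. The target
    itself must not be in `false_negatives`. Higher score = more plausible."""
    s = np.asarray(scores, dtype=float).copy()
    s[list(false_negatives)] = -np.inf  # exclude known true candidates
    return 1 + int((s > s[target_idx]).sum())

def mrr_hits(ranks, ks=(1, 10)):
    """MRR and Hits@k over a list of filtered ranks."""
    ranks = np.asarray(ranks, dtype=float)
    out = {"MRR": float((1.0 / ranks).mean())}
    for k in ks:
        out[f"Hits@{k}"] = float((ranks <= k).mean())
    return out
```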

Main Results
Evaluation results of link prediction are shown in Table 1, from which we observe: (1) Our method outperforms all baselines, owing to its optimal estimation under translational assumptions. (2) GNN-LSTM performs the worst, since neighbors are unordered but the LSTM captures ordered information. (3) LAN is the best baseline, since it adopts a complex attention mechanism to aggregate neighbors more comprehensively.

For triplet classification, due to space limitations, we show the main results in Table 2 and the complete results in the Appendix. From Table 2, we find that our method outperforms all baselines on all datasets, again owing to its optimal estimation.

Analysis
How does the number of neighbors impact performance? We randomly select up to $k \in \{1, 8, 32\}$ IKG neighbors of each OOKG entity to use. As shown in Table 3, the performance drops as the number of used neighbors decreases. This suggests that using more neighbors enhances robustness and thus leads to better performance.

Do our weighting functions matter? We try reducing candidates with uniform weights. As shown in Table 3, the performance without our weighting functions drops dramatically, which verifies their effectiveness.

How does our method perform under increasing OOKG entity ratios? We compare the triplet classification results of InvTransE, LAN, and GNN-MEAN under increasing OOKG entity ratios in Figure 3. As the OOKG entity ratio increases, the performance of our method drops the slowest. This suggests that our method is the most robust to increasing OOKG entity ratios.

Is our method more efficient? Given pretrained IKG embeddings, our method needs no additional learning, whereas GNN-based baselines have to train a model for triplet classification. This verifies that our simple method is much more efficient.

Related Work
Conventional transductive KGE methods map entities and relations to embeddings, and then use score functions to measure the plausibility of triplets. TransE (Bordes et al., 2013) pioneers translational distance methods and is the most widely used one.
To represent OOKG entities more efficiently, some inductive methods adopt GNNs to aggregate IKG neighbors and inductively produce embeddings for OOKG entities (Hamaguchi et al., 2017; Wang et al., 2019; Bi et al., 2020; Zhao et al., 2020). These methods are effective, but need relatively complex calculations. Other inductive methods incorporate external resources to enrich embeddings and represent OOKG entities via only external resources (Xie et al., 2016, 2017; Shi and Weninger, 2018). However, high-quality external resources are hard and expensive to acquire.

Conclusion
This paper aims to address the problem of efficiently representing OOKG entities. We propose a simple and effective method that inductively represents OOKG entities by their optimal estimation under translational assumptions. Given pretrained IKG embeddings, our method needs no additional learning. Experimental results on two KGC tasks with OOKG entities show that our method outperforms the state-of-the-art methods by a large margin with higher efficiency, and maintains a robust performance under increasing OOKG entity ratios.

Appendices

A Which Translational Assumptions Can Derive Specific Estimation Formulas for OOKG Entities?
For a triplet $(h, r, t)$, the translational assumption of a KGE model supposes that $h$ can establish a connection with $t$ via an $r$-specific operation, which can be formulated by the following equation:

$$F_r(\mathbf{h}) = \mathbf{t}, \tag{1}$$

where $F_r(\cdot)$ is an $r$-specific function determined by the specific KGE model. Without loss of generality, we assume that $h$ is an OOKG entity and $t$ is an IKG entity. Under a translational assumption, we can obtain a specific estimation formula for $\mathbf{h}$ if and only if (1) regarding $\mathbf{h}$ as unknown, a solution of Equation 1 exists, and (2) the solution is unique. If these two conditions hold, the unique solution of $\mathbf{h}$ is the optimal estimation under the translational assumption, since no other candidate for $\mathbf{h}$ can better fit Equation 1. In the following parts, we analyze the translational assumptions of four KGE models (TransE, RotatE, TransH, TransR) as examples.

A.1 TransE
For TransE, the translational assumption is formulated by

$$\mathbf{h} + \mathbf{r} = \mathbf{t}.$$

In this case, we can obtain a unique solution of $\mathbf{h}$ by the following steps:

$$\mathbf{h} + \mathbf{r} - \mathbf{t} = \mathbf{0} \implies \mathbf{h} = \mathbf{t} - \mathbf{r}.$$

This computed $\mathbf{h}$ is the optimal estimation under the translational assumption.

A.2 RotatE
For RotatE, the translational assumption is formulated by

$$\mathbf{h} \bullet \mathbf{r} = \mathbf{t},$$

where the embeddings are complex vectors and each element of $\mathbf{r}$ has unit modulus, so the element-wise inversion $\mathbf{r}^{-1}$ always exists. In this case, we can obtain a unique solution of $\mathbf{h}$ by the following steps:

$$\mathbf{h} \bullet \mathbf{r} - \mathbf{t} = \mathbf{0} \implies \mathbf{h} = \mathbf{t} \bullet \mathbf{r}^{-1}.$$

This computed $\mathbf{h}$ is the optimal estimation under the translational assumption.

A.3 TransH
For TransH, the translational assumption is formulated by

$$(\mathbf{h} - \mathbf{w}_r^{\top}\mathbf{h}\,\mathbf{w}_r) + \mathbf{r} = (\mathbf{t} - \mathbf{w}_r^{\top}\mathbf{t}\,\mathbf{w}_r),$$

where $\mathbf{w}_r$ is the unit normal vector of the plane $P$ that $\mathbf{r}$ lies on. From the translational assumption, we can derive

$$\mathbf{h} - \mathbf{w}_r^{\top}\mathbf{h}\,\mathbf{w}_r = (\mathbf{t} - \mathbf{w}_r^{\top}\mathbf{t}\,\mathbf{w}_r) - \mathbf{r} =: \mathbf{v}.$$

Here $\mathbf{h} - \mathbf{w}_r^{\top}\mathbf{h}\,\mathbf{w}_r$ is the projection of $\mathbf{h}$ onto the plane $P$. From the translational assumption, we can only deduce that the projection of $\mathbf{h}$ equals $\mathbf{v}$. However, infinitely many $\mathbf{h}$ satisfy this condition. Therefore, the solution of $\mathbf{h}$ is not unique, and we cannot obtain a specific estimation formula from the translational assumption of TransH.
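To make the non-uniqueness explicit, the following worked equation (our addition) exhibits a whole family of solutions:

```latex
% Note that v lies on P, since both the projection of t and r lie on P,
% so w_r^T v = 0. For any scalar c, the candidate h_c = v + c w_r has
% the same projection onto P:
\[
  \mathbf{h}_c - \mathbf{w}_r^{\top}\mathbf{h}_c\,\mathbf{w}_r
  = \mathbf{v} + c\,\mathbf{w}_r
    - \big(\mathbf{w}_r^{\top}\mathbf{v} + c\,\mathbf{w}_r^{\top}\mathbf{w}_r\big)\mathbf{w}_r
  = \mathbf{v} + c\,\mathbf{w}_r - c\,\mathbf{w}_r
  = \mathbf{v},
\]
% using w_r^T v = 0 and w_r^T w_r = 1. Hence every h_c fits the
% translational assumption of TransH equally well.
```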

A.4 TransR
For TransR, the translational assumption is formulated by

$$M_r \mathbf{h} + \mathbf{r} = M_r \mathbf{t},$$

where $M_r$ is an $r$-specific matrix. From the translational assumption, we can derive

$$M_r \mathbf{h} = M_r \mathbf{t} - \mathbf{r}.$$

In this case, we derive a system of linear equations in $\mathbf{h}$ from the translational assumption. A unique solution exists only if $M_r$ is invertible, which is not guaranteed. Therefore, we cannot obtain a specific estimation formula from the translational assumption of TransR in general.

B Details of Experimental Settings

To pretrain the TransE and RotatE models, we adopt the self-adversarial negative sampling loss proposed by Sun et al. (2019), in consideration of its good performance on training TransE and RotatE. The self-adversarial negative sampling loss $L$ is formulated as

$$L = -\log \sigma\big(\gamma - D(\mathbf{h}, \mathbf{r}, \mathbf{t})\big) - \sum_{i=1}^{n} p(h'_i, r, t'_i)\, \log \sigma\big(D(\mathbf{h}'_i, \mathbf{r}, \mathbf{t}'_i) - \gamma\big),$$

where $\sigma$ is the sigmoid function, $\gamma$ is the margin, $n$ is the negative sampling size, and $(h'_i, r, t'_i)$ is the $i$-th negative sample triplet. $D(\cdot)$ is the distance function: $D(h, r, t)$ equals $\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_{1/2}$ for TransE and $\|\mathbf{h} \bullet \mathbf{r} - \mathbf{t}\|_{1/2}$ for RotatE. $p$ is the self-adversarial weight function, which gives more weight to high-scored negative samples:

$$p(h'_j, r, t'_j) = \frac{\exp\big(\alpha F(\mathbf{h}'_j, \mathbf{r}, \mathbf{t}'_j)\big)}{\sum_{i} \exp\big(\alpha F(\mathbf{h}'_i, \mathbf{r}, \mathbf{t}'_i)\big)},$$

where $\alpha$ is a hyper-parameter called the sampling temperature, to be tuned, and $F(\cdot)$ is the score function, equal to $-D(\cdot)$.
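As a reference, here is a minimal PyTorch sketch of this loss (our own implementation of the published formulation; the default $\gamma$ and $\alpha$ values are placeholders, not the paper's tuned hyper-parameters):

```python
import torch
import torch.nn.functional as F

def self_adversarial_loss(pos_dist, neg_dist, gamma=12.0, alpha=1.0):
    """Self-adversarial negative sampling loss of Sun et al. (2019).
    pos_dist: (B,) distances D(h, r, t) of positive triplets.
    neg_dist: (B, n) distances of the n negative samples per positive.
    The weights p = softmax(alpha * F) with F = -D are detached, so they
    act as constants rather than receiving gradients."""
    p = torch.softmax(alpha * (-neg_dist), dim=1).detach()
    pos_term = -F.logsigmoid(gamma - pos_dist)
    neg_term = -(p * F.logsigmoid(neg_dist - gamma)).sum(dim=1)
    return (pos_term + neg_term).mean()

# Usage with dummy distances (batch of 2, 4 negatives each).
loss = self_adversarial_loss(torch.rand(2), torch.rand(2, 4))
```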
We conduct each experiment on a single NVIDIA GeForce GTX-1080Ti GPU and tune hyper-parameters on the validation set. Generally, we set the batch size to 1,024 and use Adam (Kingma and Ba, 2015) with an initial learning rate of $10^{-3}$ as the optimizer. We choose the correlation-based weights for link prediction and the degree-based weights with a smoothing factor of 0.1 for triplet classification. Other hyper-parameters are shown in Table 5.

Table 6: Complete evaluation results (Accuracy) of triplet classification. Bold is the best; underline is the second best. The results of all five baselines are taken from their original papers.