A survey of embedding models of entities and relationships for knowledge graph completion

Knowledge graphs (KGs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge graphs are typically incomplete, it is useful to perform knowledge graph completion or link prediction, i.e. predict whether a relationship not in the knowledge graph is likely to be true. This paper serves as a comprehensive survey of embedding models of entities and relationships for knowledge graph completion, summarizing up-to-date experimental results on standard benchmark datasets and pointing out potential future research directions.


Introduction
Let us revisit the classic Word2Vec example of a "royal" relationship between "king" and "man", and between "queen" and "woman". As illustrated by this example, v_king − v_man ≈ v_queen − v_woman: word vectors learned from a large corpus can model relational similarities or linguistic regularities between pairs of words as translations in the projected vector space (Mikolov et al., 2013; Pennington et al., 2014). Figure 1 shows another example of a relational similarity between word pairs of countries and capital cities. Assume that we consider the country and capital pairs in Figure 1 to be pairs of entities rather than word types. That is, we now represent country and capital entities by low-dimensional and dense vectors. The relational similarity between these word pairs presumably captures an "is capital of" relationship between country and capital entities. If we also represent this relationship by a translation vector v_is capital of in the entity vector space, we thus expect v_capital + v_is capital of ≈ v_country, e.g. v_Tokyo + v_is capital of ≈ v_Japan. This intuition inspired the TransE model, a well-known embedding model for KG completion or link prediction in KGs (Bordes et al., 2013).
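The shared-offset intuition above can be checked numerically. Below is a minimal sketch with hand-picked 2-D toy vectors (purely illustrative, not vectors learned from a corpus), just to make the "relational similarity as translation" idea concrete:

```python
import numpy as np

# Hand-picked 2-D toy vectors (hypothetical, not learned from text) in
# which the "royal" offset is approximately shared between the two pairs.
vectors = {
    "king":  np.array([1.0, 3.0]),
    "man":   np.array([1.0, 1.0]),
    "queen": np.array([4.0, 3.1]),
    "woman": np.array([4.0, 1.0]),
}

offset_king_man    = vectors["king"] - vectors["man"]      # [0.0, 2.0]
offset_queen_woman = vectors["queen"] - vectors["woman"]   # [0.0, 2.1]

# v_king - v_man ≈ v_queen - v_woman: the two offsets nearly coincide.
print(np.linalg.norm(offset_king_man - offset_queen_woman))  # ~0.1
```

With real word vectors the offsets only match approximately, which is why the relation is written with ≈ rather than equality.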
Knowledge graphs are collections of real-world triples, where each triple or fact (h, r, t) in KGs represents some relation r between a head entity h and a tail entity t. KGs can thus be formalized as directed multi-relational graphs, where nodes correspond to entities and edges linking the nodes encode various kinds of relationships (García-Durán et al., 2016; Nickel et al., 2016a). Here entities are real-world things or objects such as persons, places, organizations, music tracks or movies. Each relation type defines a certain relationship between entities. For example, as illustrated in Figure 2, the relation type "child of" relates person entities with each other, while the relation type "born in" relates person entities with place entities.

Figure 2: An illustration of an (incomplete) knowledge graph, with 4 person entities, 2 place entities, 2 relation types and 6 triple facts in total. This figure is drawn based on Weston and Bordes (2014).
A main issue is that even very large KGs, such as Freebase and DBpedia, which contain billions of fact triples about the world, are still far from complete. In particular, in English DBpedia 2014, 60% of person entities are missing a place of birth and 58% of scientists do not have a fact about what they are known for (Krompaß et al., 2015). In Freebase, 71% of 3 million person entities are missing a place of birth, 75% do not have a nationality, while 94% have no facts about their parents (West et al., 2014). In terms of a specific application, question answering systems based on incomplete KGs would thus not provide a correct answer even given a correctly interpreted question. For example, given the incomplete KG in Figure 2, it would be impossible to answer the question "Where was Jane born?", although the question completely matches existing entity and relation type information (i.e. "Jane" and "born in") in the KG. Consequently, much work has been devoted to knowledge graph completion, i.e. link prediction in KGs, which attempts to predict whether a relationship/triple not in the KG is likely to be true, i.e. to add new triples by leveraging existing triples in the KG (Lao and Cohen, 2010; Gardner et al., 2014; García-Durán et al., 2016). For example, we would like to predict the missing tail entity in the incomplete triple (Jane, born in, ?) or predict whether the triple (Jane, born in, Miami) is correct or not.
Embedding models for KG completion have been proven to give state-of-the-art link prediction performance, in which entities are represented by latent feature vectors while relation types are represented by latent feature vectors and/or matrices and/or third-order tensors (Bordes et al., 2013; Socher et al., 2013). This paper: (1) surveys the embedding models for KG completion, (2) summarizes up-to-date experimental results on the standard evaluation task of entity prediction, which is also referred to as the link prediction task (Bordes et al., 2013), and (3) points out potential future research directions.

A General Approach of Embedding Models for KG Completion
Let E denote the set of entities and R the set of relation types. Denote by G the knowledge graph consisting of a set of correct triples (h, r, t), such that h, t ∈ E and r ∈ R. For each triple (h, r, t), the embedding models define a score function f(h, r, t) measuring its plausibility. The goal is to choose f such that the score f(h, r, t) of a correct triple (h, r, t) is higher than the score f(h', r', t') of an incorrect triple (h', r', t').
For example, TransE defines the score function f_TransE(h, r, t) = −‖v_h + v_r − v_t‖_{1/2}, where h, r and t are represented by low-dimensional vectors v_h, v_r and v_t, respectively, and ‖·‖_{1/2} denotes either the L1- or the L2-norm. As (Tokyo, is capital of, Japan) is a correct triple, while (Tokyo, is capital of, Portugal) and (Lisbon, is capital of, Japan) are incorrect ones, we would have f_TransE(Tokyo, is capital of, Japan) > f_TransE(Tokyo, is capital of, Portugal) and f_TransE(Tokyo, is capital of, Japan) > f_TransE(Lisbon, is capital of, Japan). Table 1 in Section 3 summarizes different prominent score functions f(h, r, t).
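To make the TransE score concrete, here is a minimal NumPy sketch with hand-picked 2-D embeddings (hypothetical, not learned from a KG); it checks that the correct triple scores higher than the two incorrect ones:

```python
import numpy as np

def f_transe(v_h, v_r, v_t, norm=1):
    """TransE plausibility score: the negative (L1 or L2) distance
    between v_h + v_r and v_t; a higher score means more plausible."""
    return -np.linalg.norm(v_h + v_r - v_t, ord=norm)

# Hypothetical 2-D embeddings for illustration only.
v = {
    "Tokyo":         np.array([0.0, 0.0]),
    "Lisbon":        np.array([3.0, 0.0]),
    "Japan":         np.array([1.0, 2.0]),
    "Portugal":      np.array([4.0, 2.0]),
    "is_capital_of": np.array([1.0, 2.0]),
}

correct = f_transe(v["Tokyo"],  v["is_capital_of"], v["Japan"])
wrong_t = f_transe(v["Tokyo"],  v["is_capital_of"], v["Portugal"])
wrong_h = f_transe(v["Lisbon"], v["is_capital_of"], v["Japan"])

# The correct triple scores highest.
assert correct > wrong_t and correct > wrong_h
```

Note that with these toy vectors (Lisbon, is capital of, Portugal) also scores highly, since the same translation vector works for both country-capital pairs.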
To learn model parameters (i.e. entity vectors, relation vectors or matrices), the embedding models minimize an objective loss L. A conventional objective loss is the margin-based pairwise ranking loss (Bordes et al., 2013):

L = Σ_{(h,r,t) ∈ G} Σ_{(h',r',t') ∈ G'_{(h,r,t)}} [γ + f(h', r', t') − f(h, r, t)]_+

where [x]_+ = max(0, x); γ is the margin hyper-parameter; and G'_{(h,r,t)} is the set of incorrect triples generated by corrupting the correct triple (h, r, t) ∈ G.
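The margin-based ranking loss can be sketched as follows for a single correct triple; the scores passed in are hypothetical values standing in for any model's f(h, r, t):

```python
import numpy as np

def margin_ranking_loss(score_correct, scores_corrupted, gamma=1.0):
    """Margin-based pairwise ranking loss for one correct triple:
    sum over its corrupted counterparts of [gamma + f(h',r',t') - f(h,r,t)]_+."""
    losses = gamma + np.asarray(scores_corrupted) - score_correct
    return np.maximum(0.0, losses).sum()

# Hypothetical scores: the correct triple scores 0, corrupted ones lower.
loss = margin_ranking_loss(0.0, [-3.0, -0.5], gamma=1.0)
# [1 + (-3) - 0]_+ + [1 + (-0.5) - 0]_+ = 0 + 0.5
print(loss)  # 0.5
```

Only corrupted triples whose score comes within the margin γ of the correct triple contribute to the loss.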
Also, the negative log-likelihood (NLL) of softmax regression (Toutanova and Chen, 2015) and the NLL of logistic regression (Trouillon et al., 2016) are commonly used in recent KG completion research:

L = Σ_{(h,r,t) ∈ G ∪ G'} log(1 + exp(−I_{(h,r,t)} · f(h, r, t)))

with I_{(h,r,t)} = 1 for (h, r, t) ∈ G and I_{(h,r,t)} = −1 for (h, r, t) ∈ G'.

To corrupt the head or tail entities, a common strategy is to uniformly replace the entities when sampling incorrect triples (Bordes et al., 2013); however, this results in many false negative labels (Wang et al., 2014). Domain sampling (Krompaß et al., 2015; Xie et al., 2017) generates corrupted triples by sampling entities from the same domain or from the set of relation-dependent entities. The "Bernoulli" trick (Wang et al., 2014) is widely used to set different probabilities for corrupting head or tail entities: for each relation type r, we calculate the average number a_{r,1} of heads h per pair (r, t) and the average number a_{r,2} of tails t per pair (h, r). We then define a Bernoulli distribution with success probability λ_r = a_{r,1} / (a_{r,1} + a_{r,2}) for sampling: given a correct triple (h, r, t), we corrupt this triple by replacing the head entity with probability λ_r and the tail entity with probability (1 − λ_r).
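A minimal sketch of the "Bernoulli" trick as described above, computing λ_r from a toy triple list (the triples are hypothetical, loosely following Figure 2) and using it to corrupt either the head or the tail:

```python
import random
from collections import defaultdict

def bernoulli_probs(triples):
    """Per-relation head-replacement probability, following the text:
    lambda_r = a_{r,1} / (a_{r,1} + a_{r,2})."""
    heads_per_rt = defaultdict(set)  # (r, t) -> set of heads
    tails_per_hr = defaultdict(set)  # (h, r) -> set of tails
    for h, r, t in triples:
        heads_per_rt[(r, t)].add(h)
        tails_per_hr[(h, r)].add(t)
    probs = {}
    for r in {r for _, r, _ in triples}:
        a1 = [len(hs) for (rr, _), hs in heads_per_rt.items() if rr == r]
        a2 = [len(ts) for (_, rr), ts in tails_per_hr.items() if rr == r]
        a_r1 = sum(a1) / len(a1)  # average number of heads per (r, t)
        a_r2 = sum(a2) / len(a2)  # average number of tails per (h, r)
        probs[r] = a_r1 / (a_r1 + a_r2)
    return probs

def corrupt(triple, entities, lam, rng=random):
    """Replace the head with probability lam, the tail otherwise."""
    h, r, t = triple
    if rng.random() < lam:
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))

# Hypothetical toy triples loosely based on Figure 2.
triples = [("Jane", "born in", "Miami"),
           ("Patti", "born in", "Miami"),
           ("Mom", "born in", "Austin")]
print(bernoulli_probs(triples))  # {'born in': 0.6}
```

Here "born in" is Many-to-1-leaning (several heads per tail), so the head side is corrupted more often, which reduces the chance of accidentally generating a true triple as a negative.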
Recently, Cai and Wang (2018) and Sun et al. (2019) proposed adversarial learning-based strategies for sampling incorrect triples. However, they did not provide a comparison between the adversarial learning-based strategies and the "Bernoulli" trick.

Triple-based Embedding Models
Translation-based models: The Unstructured model assumes that the head and tail entity vectors are similar. As the Unstructured model does not take the relationship into account, it cannot distinguish different relation types. The Structured Embedding (SE) model (Bordes et al., 2011) assumes that the head and tail entities are similar only in a relation-dependent subspace, where each relation is represented by two different matrices.

Table 1: The score functions f(h, r, t) of several prominent embedding models for KG completion. In these models, the entities h and t are represented by vectors v_h and v_t ∈ R^k, respectively. ‖·‖_{1/2} denotes either the L1-norm or the squared L2-norm. In ConvE, v̄_h and v̄_r denote a 2D reshaping of v_h and v_r, respectively. In both the ConvE and ConvKB models, * and Ω denote a convolution operator and a set of filters, respectively. In QuatE, ⊗ and • denote the Hamilton product and the quaternion inner product, respectively.

TransE (Bordes et al., 2013) is inspired by models such as the Word2Vec Skip-gram model (Mikolov et al., 2013), where relationships between words often correspond to translations in the latent feature space. In particular, TransE learns low-dimensional and dense vectors for every entity and relation type, so that each relation type corresponds to a translation vector operating on the vectors representing the entities, i.e. v_h + v_r ≈ v_t for each fact triple (h, r, t). TransE is thus suitable for 1-to-1 relationships, such as "is capital of", where a head entity is linked to at most one tail entity given a relation type. Because it uses only one translation vector to represent each relation type, TransE is not well-suited for Many-to-1, 1-to-Many and Many-to-Many relationships, such as the relation types "born in", "place of birth" and "research fields." For example, in Figure 2, one vector representing the relation type "born in" cannot capture both the translating direction from "Patti" to "Miami" and its inverse direction from "Mom" to "Austin." To overcome these issues of TransE, TransH (Wang et al., 2014) associates each relation with a relation-specific hyperplane and uses a projection vector to project entity vectors onto that hyperplane. TransD and TransR/CTransR (Lin et al., 2015b) extend TransH by using two projection vectors and a matrix, respectively, to project entity vectors into a relation-specific space.
Similar to TransR, TransR-FT (Feng et al., 2016a) also uses a matrix to project head and tail entity vectors. TEKE_H extends TransH to incorporate rich context information from an external text corpus. lppTransD (Yoon et al., 2016) extends TransD to additionally use two projection vectors for representing each relation. STransE (Nguyen et al., 2016b) and TranSparse (Ji et al., 2016) can be viewed as direct extensions of TransR, where head and tail entities are associated with their own projection matrices. Unlike STransE, TranSparse uses adaptive sparse matrices, whose sparseness degrees are defined based on the number of entities linked by relations. TranSparse-DT is an extension of TranSparse with a dynamic translation. ITransF (Xie et al., 2017) can be considered a generalization of STransE, which allows the sharing of statistical regularities between relation projection matrices and alleviates the data sparsity issue. Furthermore, TorusE (Ebisu and Ichise, 2018) embeds entities and relations on a torus to handle TransE's regularization problem, which forces entity embeddings to lie on a sphere in the embedding vector space.
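As a concrete illustration of the relation-specific projection idea, here is a minimal sketch of the TransH score, under the stated assumption that w_r is the normal vector of the relation's hyperplane (toy values, purely illustrative):

```python
import numpy as np

def f_transh(v_h, v_r, v_t, w_r):
    """TransH score: project the head and tail vectors onto the
    relation-specific hyperplane with normal vector w_r, then apply a
    TransE-style translation v_r within that hyperplane."""
    w_r = w_r / np.linalg.norm(w_r)      # ensure a unit normal
    h_proj = v_h - (w_r @ v_h) * w_r     # component on the hyperplane
    t_proj = v_t - (w_r @ v_t) * w_r
    return -np.linalg.norm(h_proj + v_r - t_proj)

# Two heads that differ only along w_r project to the same point, so a
# single translation vector can now fit a Many-to-1 relation (toy values).
w_r = np.array([0.0, 1.0])
v_r = np.array([1.0, 0.0])
print(f_transh(np.array([1.0, 5.0]), v_r, np.array([2.0, 7.0]), w_r))  # -0.0
print(f_transh(np.array([1.0, 9.0]), v_r, np.array([2.0, 7.0]), w_r))  # -0.0
```

The projection discards the component of each entity vector along w_r, which is what lets distinct entities satisfy the same translation for one relation.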
Bilinear- & Tensor-based models: DISTMULT (Yang et al., 2015) is based on the Bilinear model (Nickel et al., 2011; Jenatton et al., 2012), where each relation is represented by a diagonal matrix rather than a full matrix. SimplE (Kazemi and Poole, 2018) extends DISTMULT to allow the two embeddings of each entity to be learned dependently. Such quadratic forms are also used to model entities and relations in KG2E, TATEC (García-Durán et al., 2016), TransG (Xiao et al., 2016), RSTE (Tay et al., 2017), ANALOGY and Dihedral (Xu and Li, 2019). SME-bilinear first combines the entity-relation pairs (h, r) and (r, t) separately and then semantically matches these combinations using the tensor product. HolE (Nickel et al., 2016b) uses circular correlation, a compositional operator which can be interpreted as a compression of the tensor product. In addition, TuckER (Balazevic et al., 2019) is a linear model based on the Tucker tensor decomposition of the binary tensor representation of KG triples.
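With a diagonal relation matrix, the bilinear form of DISTMULT reduces to an element-wise triple product, as this short sketch shows; note that the score is symmetric in h and t, a known limitation of DISTMULT (toy embeddings, purely illustrative):

```python
import numpy as np

def f_distmult(v_h, v_r, v_t):
    """DISTMULT score: a bilinear form whose relation matrix is diagonal,
    which reduces to sum_i v_h[i] * v_r[i] * v_t[i]."""
    return np.sum(v_h * v_r * v_t)

# Hypothetical toy embeddings. f(h, r, t) == f(t, r, h): DISTMULT cannot
# distinguish the direction of a relation.
v_h, v_r, v_t = np.array([1.0, 2.0]), np.array([1.0, 0.0]), np.array([3.0, 4.0])
print(f_distmult(v_h, v_r, v_t))  # 3.0
print(f_distmult(v_t, v_r, v_h))  # 3.0
```

This symmetry is precisely what the complex-valued extension ComplEx removes.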
Neural network-based models: The neural tensor network (NTN) model (Socher et al., 2013) also uses a bilinear tensor operator to represent each relation, while ProjE (Shi and Weninger, 2017) can be viewed as a simplified version of NTN. The ER-MLP model (Dong et al., 2014) represents each triple by a vector obtained by concatenating the head, relation and tail embeddings, then feeds this vector into a single-layer MLP with a one-node output layer. ConvE (Dettmers et al., 2018) and ConvKB (Nguyen et al., 2018) are based on convolutional neural networks. ConvE uses a convolution layer directly over a 2D reshaping of the head-entity and relation embeddings, while ConvKB applies a convolution layer over embedding triples (here each triple (h, r, t) is represented as a 3-column matrix where each column vector represents a triple element). HypER (Balažević et al., 2019) simplifies ConvE by using a hypernetwork to produce 1D convolutional filters for each relation, then extracts relation-specific features from head entity embeddings. Conv-TransE (Shang et al., 2019) extends ConvE to keep the translational characteristic between entities and relations. InteractE (Vashishth et al., 2020) extends ConvE to increase the number of interactions between entity and relation embedding features.

Complex vector-based models: Instead of embedding entities and relations in the real-valued vector space, ComplEx (Trouillon et al., 2016) extends DISTMULT to the complex vector space. ComplEx-N3 (Lacroix et al., 2018) extends ComplEx with a weighted nuclear 3-norm. Also in the complex vector space, RotatE (Sun et al., 2019) defines each relation as a rotation from the head entity to the tail entity. QuatE (Zhang et al., 2019) represents entities by quaternion embeddings (i.e. hypercomplex-valued embeddings) and models relations as rotations in the quaternion space, employing the Hamilton and quaternion inner products.
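A minimal sketch of the ComplEx score using NumPy's complex dtype; conjugating the tail embedding is what makes the score asymmetric in h and t, unlike DISTMULT (toy 1-D embeddings, purely illustrative):

```python
import numpy as np

def f_complex(v_h, v_r, v_t):
    """ComplEx score: Re(sum_i v_h[i] * v_r[i] * conj(v_t[i])) with
    complex-valued embeddings. The conjugate on the tail embedding makes
    the score asymmetric in h and t."""
    return np.sum(v_h * v_r * np.conj(v_t)).real

# Hypothetical toy 1-D complex embeddings.
v_h = np.array([1 + 1j])
v_r = np.array([1j])
v_t = np.array([1 + 0j])

print(f_complex(v_h, v_r, v_t))  # -1.0
print(f_complex(v_t, v_r, v_h))  #  1.0  (asymmetric, unlike DISTMULT)
```

Swapping head and tail flips the score here, so ComplEx can model antisymmetric relations that DISTMULT cannot.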

Relation Path-based Embedding Models
All embedding models mentioned above in Section 3.1 only take triples into account. Thus, these models ignore potentially useful information implicitly present in the structure of the KG. For example, a relation path such as h --born in--> e --located in--> t should indicate a "nationality" relationship between the h and t entities. Also, neighborhood information of entities could be useful for predicting the relationship between two entities. For example, in the KG NELL (Carlson et al., 2010), we have information such as: if a person works for an organization and this person also leads that organization, then it is likely that this person is the CEO of that organization.
Recent research has also shown that relation paths between entities in KGs provide richer context information and improve the performance of embedding models for KG completion (Luo et al., 2015; Liang and Forbus, 2015; García-Durán et al., 2015; Guu et al., 2015; Toutanova et al., 2016; Durán and Niepert, 2018; Takahashi et al., 2018; Chen et al., 2018). In particular, Luo et al. (2015) constructed relation paths between entities and, viewing the entities and relations in a path as pseudo-words, applied Word2Vec (Mikolov et al., 2013) to produce pre-trained vectors for these pseudo-words. Luo et al. (2015) showed that using these pre-trained vectors for initialization helps to improve the performance of TransE (Bordes et al., 2013), SME and SE (Bordes et al., 2011). Liang and Forbus (2015) used the plausibility score produced by SME to compute the weights of relation paths.
PTransE-RNN (Lin et al., 2015a) models relation paths by using a recurrent neural network (RNN). In addition, Das et al. (2017)'s model and ROPs (Yin et al., 2018) also apply RNNs to model the path between an entity pair; however, in contrast to PTransE-RNN, they additionally take the intermediate entities present in the path into account. IRN (Shen et al., 2017) uses a shared memory and an RNN-based controller to implicitly model multi-step structured relationships. RTransE (García-Durán et al., 2015), PTransE-ADD (Lin et al., 2015a) and TransE-COMP (Guu et al., 2015) extend TransE to represent a relation path by a vector which is the sum of the vectors of all relations in the path. In Bilinear-COMP (Guu et al., 2015) and PRUNED-PATHS (Toutanova et al., 2016), each relation is represented by a matrix, and a relation path is thus represented by the product of its relation matrices. Durán and Niepert (2018) proposed the KBlrn framework to combine relational paths with latent and numerical features.
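The additive path composition used by RTransE, PTransE-ADD and TransE-COMP can be sketched as follows, with hypothetical toy vectors in which "born in" followed by "located in" composes to a translation matching a "nationality"-like relation:

```python
import numpy as np

def f_transe_path(v_h, relation_vecs, v_t):
    """TransE-COMP / PTransE-ADD style path score: represent the relation
    path by the sum of its relation vectors, then score as in TransE."""
    v_path = np.sum(relation_vecs, axis=0)
    return -np.linalg.norm(v_h + v_path - v_t)

# Hypothetical toy vectors, for illustration only.
v = {
    "Jane":       np.array([0.0, 0.0]),
    "USA":        np.array([1.0, 1.0]),
    "born in":    np.array([1.0, 0.0]),
    "located in": np.array([0.0, 1.0]),
}
score = f_transe_path(v["Jane"], [v["born in"], v["located in"]], v["USA"])
print(score)  # -0.0 (perfect match under these toy vectors)
```

For matrix-based models such as Bilinear-COMP, the sum would be replaced by a product of relation matrices applied along the path.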
The neighborhood mixture model TransE-NMM (Nguyen et al., 2016a) can also be viewed as a three-relation path model because it takes into account the neighborhood entity and relation information of both head and tail entities in each triple. ReInceptionE (Xie et al., 2020) employs the Inception network (Szegedy et al., 2016) to increase the interactions between head and relation embeddings, obtaining better representations of head-relation pairs, and then uses a relation-aware attention mechanism to enrich these pair representations with local neighborhood and global entity information. Neighborhood information is also exploited in R-GCN (Schlichtkrull et al., 2018), SACN (Shang et al., 2019) and KBGAT (Nathani et al., 2019), which generalize graph convolutional networks (Kipf and Welling, 2017) and graph attention networks (Veličković et al., 2018) to deal with highly multi-relational data such as KGs. To compute the final representation of an entity, they make use of layer-wise propagation to accumulate linearly-transformed embeddings of its neighboring entities through a normalized sum with different relational weights. For link prediction, R-GCN, SACN and KBGAT apply DISTMULT, Conv-TransE and ConvKB to compute triple scores, respectively.

Other KG Completion Models
The Path Ranking Algorithm (PRA) (Lao and Cohen, 2010) is a random-walk inference technique which was proposed to predict a new relationship between two entities in KGs. Lao et al. (2011) used PRA to estimate the probability of an unseen triple as a combination of weighted random walks that follow different paths linking the head entity and tail entity in the KG. Gardner et al. (2014) made use of an external text corpus to increase the connectivity of the KG used as the input to PRA. Gardner and Mitchell (2015) improved PRA by proposing a subgraph feature extraction technique to make the generation of random walks in KGs more efficient and expressive, while later work extended PRA to couple the path ranking of multiple relations. PRA can also be used in conjunction with first-order logic in the discriminative Gaifman model (Niepert, 2016). In addition, Neelakantan et al. (2015) used an RNN to learn vector representations of PRA-style relation paths between entities in the KG. Other random-walk based learning algorithms for KG completion can also be found in Feng et al. (2016b), Mazumder and Liu (2017) and Das et al. (2018). A Neural Logic Programming framework has also been proposed for learning probabilistic first-order logical rules for KG reasoning, producing competitive link prediction performance.

Evaluation Task
The standard evaluation task of entity prediction, i.e. the link prediction task (Bordes et al., 2013), is used to evaluate embedding models for KG completion.

Datasets: Information about benchmark datasets for KG completion evaluation is given in Table 2. FB15k and WN18 are derived from the large real-world KG Freebase (Bollacker et al., 2008) and the large lexical KG WordNet (Miller, 1995), respectively. Toutanova and Chen (2015) noted that FB15k and WN18 are not challenging datasets because they contain many reversible triples. Dettmers et al. (2018) showed a concrete example: a test triple (feline, hyponym, cat) can be mapped to a training triple (cat, hypernym, feline), so knowing that "hyponym" and "hypernym" are reversible allows us to easily predict the majority of test triples. The datasets FB15k-237 (Toutanova and Chen, 2015) and WN18RR (Dettmers et al., 2018) were therefore created to serve as realistic KG completion datasets representing a more challenging learning setting. FB15k-237 and WN18RR are subsets of FB15k and WN18, respectively.

Task Description
The entity prediction task, i.e. link prediction (Bordes et al., 2013), is to predict the head or the tail entity given the relation type and the other entity, i.e. predicting h given (?, r, t) or predicting t given (h, r, ?), where ? denotes the missing element. The results are evaluated using a ranking induced by the score function f(h, r, t) on test triples. Each correct test triple (h, r, t) is corrupted by replacing either its head or tail entity by each of the possible entities in turn, and these candidates are then ranked in descending order of their plausibility score. The "Filtered" setting protocol, described in Bordes et al. (2013), filters out, before ranking, any corrupted triple that already appears in the KG. Ranking a corrupted triple that appears in the KG (i.e. a correct triple) higher than the original test triple should not be counted as an error, so the "Filtered" setting provides a clearer view of the ranking performance.
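The "Filtered" ranking protocol can be sketched as follows; the triples and scores below are hypothetical, and score_fn stands in for any trained model's f(h, r, t):

```python
def filtered_rank(test_triple, side, score_fn, entities, known_triples):
    """Rank the test triple among its corrupted candidates under the
    "Filtered" setting: corrupted triples that already appear in the KG
    (i.e. are themselves correct) are removed before ranking."""
    h, r, t = test_triple
    candidates = []
    for e in entities:
        cand = (e, r, t) if side == "head" else (h, r, e)
        if cand != test_triple and cand in known_triples:
            continue  # filter out other known correct triples
        candidates.append(cand)
    candidates.sort(key=score_fn, reverse=True)  # best score first
    return candidates.index(test_triple) + 1     # 1-based rank

# Hypothetical scores for head prediction on (?, r, B).
scores = {("A", "r", "B"): 5.0, ("C", "r", "B"): 7.0, ("D", "r", "B"): 3.0}
known = {("A", "r", "B"), ("C", "r", "B")}
rank = filtered_rank(("A", "r", "B"), "head", lambda tr: scores[tr],
                     ["A", "C", "D"], known)
print(rank)  # 1: ("C", "r", "B") outranks the test triple but is filtered
```

In the "Raw" (unfiltered) setting the same test triple would receive rank 2, since the higher-scoring correct triple ("C", "r", "B") would stay in the candidate list.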
In addition to the mean rank and Hits@10 (i.e. the proportion of test triples for which the target entity is ranked in the top-10 predictions), which were originally used in the entity prediction task (Bordes et al., 2013), recent work also reports the mean reciprocal rank (MRR). The mean rank is always greater than or equal to 1, and a lower mean rank indicates better entity prediction performance, while MRR and Hits@10 scores always range from 0.0 to 1.0, and higher scores reflect better prediction results.
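All three metrics are simple aggregates of the per-test-triple ranks produced by the ranking protocol, e.g.:

```python
def mean_rank(ranks):
    """Average of the 1-based ranks; lower is better (>= 1)."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank; in [0, 1], higher is better."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Proportion of test triples ranked in the top k; higher is better."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks for four test triples.
ranks = [1, 2, 10, 50]
print(mean_rank(ranks))             # 15.75
print(mean_reciprocal_rank(ranks))  # ≈ 0.405
print(hits_at_k(ranks, 10))         # 0.75
```

Note how the single badly-ranked triple (rank 50) dominates the mean rank but barely affects MRR, which is why MRR is often preferred as a summary statistic.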

Discussion and Conclusion
The reasons why much work has been devoted to developing triple-based models are: (1) additional information sources might not be available, e.g., for KGs of specialized domains; (2) models that do not exploit path information or external resources are simpler and thus typically much faster to train than the more complex models using path or external information; and (3) the more complex models that exploit path or external information are typically extensions of these simpler models, and are often initialized with parameters estimated by such simpler models, so improvements to the simpler models should yield corresponding improvements to the more complex models as well (Nguyen et al., 2016b). It is worth further exploring these KG completion embedding models for new applications where the data can be formulated as triples. For example, in Web search engines, we observe user-oriented relationships between submitted queries and the documents returned by the search engines. That is, we have triple representations (query, user, document), in which each user-oriented relationship is associated with many queries and documents, resulting in many Many-to-Many relationships. Inspired by this observation, Vu et al. (2017) applied STransE (Nguyen et al., 2016b) to search personalization, re-ranking the documents returned by a search engine for users' submitted queries. Other application examples can also be found in recommender systems (Zhang et al., 2016; He et al., 2017; Cao et al., 2019), social relation extraction (Tu et al., 2017) and visual relation detection.
Future research directions might also include: (i) combining logical rules, which contain rich background information, with KG triples in a unified KG completion framework, e.g. jointly embedding KGs and logical rules (Guo et al., 2016); (ii) relaxing the closed-world assumption held by recent embedding models for KG completion, where the KGs are fixed (i.e. new entities cannot be added easily), by exploring open-world KG completion models that connect unseen entities to existing KGs (Shi and Weninger, 2018); and (iii) investigating efficient approaches which can be applied to large-scale KGs with millions of entities and relations (Zhang et al., 2020).
In this paper, we have presented a comprehensive survey of embedding models of entities and relationships for knowledge graph completion. This paper also provides up-to-date experimental results of the embedding models for the entity prediction (i.e. link prediction) task on the benchmark datasets FB15k, WN18, FB15k-237 and WN18RR. We hope that this paper serves its purpose by providing a concrete foundation for future research and applications on the topic.