PairRE: Knowledge Graph Embeddings via Paired Relation Vectors

Distance-based knowledge graph embedding methods show promising results on the link prediction task, for which two topics have been widely studied: one is the ability to handle complex relations, such as N-to-1, 1-to-N and N-to-N; the other is the ability to encode various relation patterns, such as symmetry/antisymmetry. However, existing methods fail to solve these two problems at the same time, which leads to unsatisfactory results. To mitigate this problem, we propose PairRE, a model with paired vectors for each relation representation. The paired vectors enable an adaptive adjustment of the margin in the loss function to fit different complex relations. Besides, PairRE is capable of encoding three important relation patterns: symmetry/antisymmetry, inverse and composition. Given simple constraints on relation representations, PairRE can further encode subrelation. Experiments on link prediction benchmarks demonstrate the proposed key capabilities of PairRE. Moreover, we set a new state-of-the-art on two knowledge graph datasets of the challenging Open Graph Benchmark.


Introduction
Knowledge graphs store huge amounts of structured data in the form of triples, with projects such as WordNet (Miller, 1995), Freebase (Bollacker et al., 2008), YAGO (Suchanek et al., 2007) and DBpedia (Lehmann et al., 2015). They have gained widespread attention from their successful use in tasks such as question answering (Bordes et al., 2014), semantic parsing (Berant et al., 2013) and named entity disambiguation (Zheng et al., 2012).
Since most knowledge graphs suffer from incompleteness, predicting missing links between entities has been a fundamental problem. This problem is known as link prediction or knowledge graph completion. Knowledge graph embedding methods, which embed all entities and relations into a low-dimensional space, have been proposed for this problem.
Distance-based embedding methods, from TransE (Bordes et al., 2013) to the recent state-of-the-art RotatE (Sun et al., 2019), have shown substantial improvements on the knowledge graph completion task. Two major problems have been widely studied. The first is the handling of 1-to-N, N-to-1 and N-to-N complex relations (Bordes et al., 2013; Lin et al., 2015). For 1-to-N relations, given a query like (StevenSpielberg, DirectorOf, ?), a distance-based model should bring all the corresponding film entities, such as Jaws and JurassicPark, close to the entity StevenSpielberg after the transformation by the relation DirectorOf. The difficulty is that all these entities should still have different representations. The same issue arises for N-to-N and N-to-1 relations. The second is learning and inferring relation patterns from observed triples, as the success of knowledge graph completion heavily relies on this ability (Bordes et al., 2013; Sun et al., 2019). There are various types of relation patterns: symmetry (e.g., IsSimilarTo), antisymmetry (e.g., FatherOf), inverse (e.g., PeopleBornHere and PlaceOfBirth), composition (e.g., my mother's father is my grandpa) and so on.
Previous methods solve these two problems separately. TransH (Wang et al., 2014), TransR (Lin et al., 2015) and TransD all focus on ways to handle complex relations. However, these methods can only encode symmetry/antisymmetry relations. The recent state-of-the-art RotatE shows promising results in encoding symmetry/antisymmetry, inverse and composition relations. However, complex relations remain challenging for it to predict.
Here we present PairRE, an embedding method that is capable of encoding complex relations and multiple relation patterns simultaneously. The proposed model uses two vectors for relation representation. These vectors project the corresponding head and tail entities into Euclidean space, where the distance between the projected vectors is minimized. This provides three important benefits:
• The paired relation representations enable an adaptive adjustment of the margin in the loss function to fit different complex relations;
• Semantic connections among relation vectors can be well captured, which enables the model to encode three important relation patterns: symmetry/antisymmetry, inverse and composition;
• With simple constraints added on relation representations, PairRE can further encode subrelation.
Besides, PairRE is a highly efficient model, which makes it well suited to large-scale datasets. We evaluate PairRE on six standard knowledge graph benchmarks. The experimental results show that PairRE achieves either state-of-the-art or highly competitive performance. Further analysis also shows that PairRE can better handle complex relations and encode symmetry/antisymmetry, inverse, composition and subrelation relations.

Background and Notation
Given a knowledge graph represented as a list of fact triples, knowledge graph embedding methods define a scoring function to measure the plausibility of these triples. We denote a triple by (h, r, t), where h represents the head entity, r the relation and t the tail entity. The column vectors of entities and relations are represented by bold lower-case letters, which belong to the sets E and R respectively. We denote the set of all triples that are true in a world as T. $f_r(h, t)$ represents the scoring function.
We take the definition of complex relations from (Wang et al., 2014). For each relation r, we compute the average number of tails per head ($tph_r$) and the average number of heads per tail ($hpt_r$). If $tph_r < 1.5$ and $hpt_r < 1.5$, r is treated as 1-to-1; if $tph_r > 1.5$ and $hpt_r > 1.5$, r is treated as N-to-N; if $tph_r > 1.5$ and $hpt_r < 1.5$, r is treated as 1-to-N; and if $tph_r < 1.5$ and $hpt_r > 1.5$, r is treated as N-to-1.
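The categorization rule can be sketched in a few lines of Python. This is a minimal illustration that assumes triples are given as (head, relation, tail) tuples of hashable identifiers. The function name `categorize_relations` and the toy triples are hypothetical. The rule leaves values exactly equal to 1.5 unassigned; this sketch counts them as the "1" side.

```python
from collections import defaultdict


def categorize_relations(triples, threshold=1.5):
    """Label each relation 1-to-1, 1-to-N, N-to-1 or N-to-N from tph/hpt.

    Hypothetical helper; tph_r and hpt_r follow Wang et al. (2014).
    Values exactly equal to `threshold` are counted as the "1" side.
    """
    tails_of = defaultdict(set)  # (relation, head) -> distinct tails
    heads_of = defaultdict(set)  # (relation, tail) -> distinct heads
    for h, r, t in triples:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)

    tails_per_head = defaultdict(list)
    heads_per_tail = defaultdict(list)
    for (r, _), tails in tails_of.items():
        tails_per_head[r].append(len(tails))
    for (r, _), heads in heads_of.items():
        heads_per_tail[r].append(len(heads))

    categories = {}
    for r, counts in tails_per_head.items():
        tph = sum(counts) / len(counts)
        hpt = sum(heads_per_tail[r]) / len(heads_per_tail[r])
        head_side = "N" if hpt > threshold else "1"
        tail_side = "N" if tph > threshold else "1"
        categories[r] = f"{head_side}-to-{tail_side}"
    return categories


toy_triples = [
    ("StevenSpielberg", "DirectorOf", "Jaws"),
    ("StevenSpielberg", "DirectorOf", "JurassicPark"),
    ("Alice", "BornIn", "Paris"),
    ("Bob", "BornIn", "Paris"),
    ("France", "CapitalIs", "Paris"),
]
print(categorize_relations(toy_triples))
# {'DirectorOf': '1-to-N', 'BornIn': 'N-to-1', 'CapitalIs': '1-to-1'}
```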

Related Work
Distance-based models. Distance-based models measure the plausibility of fact triples as the distance between entities. TransE interprets a relation as a translation vector r so that entities can be connected, i.e., h + r ≈ t. TransE is efficient, but it cannot model symmetry relations and has difficulty modeling complex relations. Several models have been proposed to improve TransE's handling of complex relations, including TransH, TransR, TransD, TranSparse (Ji et al., 2016) and so on. All these methods first project the entities onto relation-specific hyperplanes or spaces, then translate the projected entities with relation vectors. Projecting entities into different spaces or hyperplanes improves the ability to handle complex relations. However, with the added projection parameters, these models are unable to encode inverse and composition relations.
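Here is why TransE cannot model symmetry. If both (h, r, t) and (t, r, h) held exactly, then h + r = t and t + r = h would force 2r = 0. The minimal Python sketch below fits r to one direction and measures the other. It uses toy vectors and assumes the L1 distance.

```python
def transe_distance(h, r, t):
    """L1 distance ||h + r - t||_1 used by TransE; lower means more plausible."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


h = [0.25, -0.5]
t = [0.75, 0.25]
# Fit r so that the forward triple (h, r, t) is satisfied exactly.
r = [ti - hi for hi, ti in zip(h, t)]

forward = transe_distance(h, r, t)   # exactly 0.0
backward = transe_distance(t, r, h)  # equals 2 * ||r||_1, nonzero unless r = 0
print(forward, backward)  # 0.0 2.5
```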
The recent state-of-the-art, RotatE, which can encode symmetry/antisymmetry, inverse and composition relation patterns, uses a rotation-based translation in complex space. Despite its expressiveness for different relation patterns, complex relations remain challenging for it. GC-OTE (Tang et al., 2020) proposes to improve the complex relation modeling ability of RotatE by introducing graph context into entity embeddings. However, computing the graph contexts for head and tail entities is time consuming, which makes it inefficient for large-scale knowledge graphs, e.g., ogbl-wikikg (Hu et al., 2020).
Another related work is SE (Bordes et al., 2011), which uses two separate relation matrices to project head and tail entities. As pointed out by Sun et al. (2019), this model is not able to encode symmetry/antisymmetry, inverse and composition relations. Table 1 shows a comparison between our method and some representative distance-based methods. As the table shows, our model is the most expressive one, with the ability to handle complex relations and encode four key relation patterns.
Semantic matching models. Semantic matching models exploit similarity-based scoring functions and can be divided into bilinear models and neural network based models. As models such as RESCAL (Nickel et al., 2011), DistMult (Yang et al., 2014), HolE (Nickel et al., 2016), ComplEx (Trouillon et al., 2016) and QuatE (Zhang et al., 2019) have been developed, the key relation encoding abilities have been enriched. However, all these models have difficulty encoding composition relations (Sun et al., 2019).
RESCAL, ComplEx and SimplE (Kazemi and Poole, 2018) have all been proved to be fully expressive when their embedding dimensions fulfill certain requirements (Trouillon et al., 2016; Kazemi and Poole, 2018). Full expressiveness means that these models can express all the ground truth that exists in the data, including complex relations. However, these requirements are hardly fulfilled in practice. It has been proved that, to achieve complete expressiveness, the embedding dimension should be greater than N/32, where N is the number of entities in the dataset.
Neural network based methods, e.g., convolutional neural networks (Dettmers et al., 2018) and graph convolutional networks (Schlichtkrull et al., 2018), show promising performance. However, they are difficult to analyze as they work as black boxes.
Encoding Subrelation. Existing methods encode subrelation by using first-order logic rules. One way is to augment knowledge graphs with groundings of rules, including subrelation rules (Qu and Tang, 2019). The other way is to add constraints on entity and relation representations, e.g., ComplEx-NNE-AER and SimplE+. The second way enriches the models' expressiveness at relatively low cost. In this paper, we show that PairRE can encode subrelation with constraints on relation representations while keeping the ability to encode symmetry/antisymmetry, inverse and composition relations.

Methodology
To overcome the problem of modeling 1-to-N/N-to-1/N-to-N complex relations and to enrich the capabilities for different relation patterns, we propose a model with paired vectors for each relation. Given a training triple (h, r, t), our model learns vector embeddings of entities and relations in real space. Specifically, PairRE represents each relation embedding as a pair of vectors $[\mathbf{r}^H, \mathbf{r}^T]$. $\mathbf{r}^H$ and $\mathbf{r}^T$ project the head entity $\mathbf{h}$ and the tail entity $\mathbf{t}$, respectively, into Euclidean space. The projection operation is the Hadamard (element-wise) product, denoted $\circ$, between an entity vector and the corresponding relation vector. PairRE then uses the distance between the two projected vectors to measure the plausibility of the triple:
$$f_r(\mathbf{h}, \mathbf{t}) = -\|\mathbf{h} \circ \mathbf{r}^H - \mathbf{t} \circ \mathbf{r}^T\|_1.$$
We want $\mathbf{h} \circ \mathbf{r}^H \approx \mathbf{t} \circ \mathbf{r}^T$ when (h, r, t) holds, while $\mathbf{h} \circ \mathbf{r}^H$ should be far away from $\mathbf{t} \circ \mathbf{r}^T$ otherwise. In this paper, we use the L1-norm to measure the distance.
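The sketch below implements the PairRE score $f_r(\mathbf{h}, \mathbf{t}) = -\|\mathbf{h} \circ \mathbf{r}^H - \mathbf{t} \circ \mathbf{r}^T\|_1$, the negative L1 distance between the projected head and tail. It is a minimal pure-Python version that assumes plain lists as embeddings; a real implementation would use batched tensor operations. The name `pairre_score` and the toy vectors are illustrative only.

```python
def pairre_score(h, r_head, r_tail, t):
    """f_r(h, t) = -||h ∘ r^H - t ∘ r^T||_1 (higher means more plausible)."""
    return -sum(abs(hi * rh - ti * rt)
                for hi, rh, rt, ti in zip(h, r_head, r_tail, t))


r_head, r_tail = [1.0, 2.0], [2.0, 1.0]
h = [0.5, 0.25]
t_true = [0.25, 0.5]     # h ∘ r^H == t_true ∘ r^T == [0.5, 0.5]
t_corrupt = [0.5, 0.25]  # t_corrupt ∘ r^T == [1.0, 0.25]

good = pairre_score(h, r_head, r_tail, t_true)     # zero: a perfect match
bad = pairre_score(h, r_head, r_tail, t_corrupt)   # -0.75
print(good == 0.0, bad)
```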
To remove scaling freedoms, we also add a constraint on embeddings, similar to previous distance-based models (Bordes et al., 2013; Wang et al., 2014; Lin et al., 2015). The constraint is added only to entity embeddings, because we want relation embeddings to capture semantic connections among relations (e.g., PeopleBornHere and PlaceOfBirth) and complex relation characteristics (e.g., 1-to-N) easily and sufficiently. For each entity embedding, the L2-norm is set to 1.
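The scaling freedom removed by the entity constraint can be shown numerically. Multiplying every entity embedding by a constant c and every relation vector by 1/c leaves every PairRE score unchanged, so without a norm constraint the entity scale is not identified. The sketch below uses pure Python, toy values and hypothetical helper names. It shows the invariance and then projects an entity vector to unit L2 norm.

```python
import math


def pairre_score(h, r_head, r_tail, t):
    """f_r(h, t) = -||h ∘ r^H - t ∘ r^T||_1."""
    return -sum(abs(hi * rh - ti * rt)
                for hi, rh, rt, ti in zip(h, r_head, r_tail, t))


def l2_normalize(v):
    """Project an entity embedding onto the unit L2 sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


h, t = [0.5, 0.25], [0.25, 0.75]
r_head, r_tail = [1.0, 2.0], [2.0, 1.0]
c = 4.0  # arbitrary rescaling factor

base = pairre_score(h, r_head, r_tail, t)
rescaled = pairre_score([c * x for x in h], [x / c for x in r_head],
                        [x / c for x in r_tail], [c * x for x in t])
print(base, rescaled)  # -0.25 -0.25: identical scores

h_unit = l2_normalize(h)  # the constraint fixes ||h||_2 = 1
```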
The model parameters are all the entities' embeddings, $\{\mathbf{e}_j\}_{j=1}^{|E|}$, and all the relations' embeddings, $\{\mathbf{r}_j\}_{j=1}^{|R|}$. An illustration of the proposed PairRE is shown in Figure 1. Compared to TransE/RotatE, PairRE enables an entity to have distributed representations when involved in different relations. We also find that the paired relation vectors enable an adaptive adjustment of the margin in the loss function, which alleviates the modeling problem for complex relations.
Take a 1-to-N relation as an example. For a clearer illustration, we set the embedding dimension to one and remove the constraint on entity embeddings. Given triples (h, r, ?), where the correct tail entities belong to the set $S = \{t_1, t_2, \ldots, t_N\}$, PairRE predicts tail entities by requiring
$$|h \cdot r^H - t_i \cdot r^T| < \gamma,$$
where γ is a fixed margin for distance-based embedding models and $t_i \in S$. The value of $t_i$ should therefore stay in the following range:
$$t_i \in \begin{cases} \left(\dfrac{h \cdot r^H - \gamma}{r^T}, \dfrac{h \cdot r^H + \gamma}{r^T}\right), & r^T > 0, \\[2ex] \left(\dfrac{h \cdot r^H + \gamma}{r^T}, \dfrac{h \cdot r^H - \gamma}{r^T}\right), & r^T < 0. \end{cases}$$
The width of this range is $2\gamma / |r^T|$. The above analysis therefore shows that PairRE can adjust the value of $r^T$ to fit the entities in S: the larger the size of S, the smaller the absolute value of $r^T$. In contrast, models like TransE or RotatE have a fixed margin for all complex relation types; when the size of S is large enough, these models have difficulty fitting the data. For N-to-1 relations, PairRE can likewise adjust the value of $r^H$ adaptively to fit the data.
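The one-dimensional argument can be checked with a short Python sketch under the same simplifications (dimension one, no entity norm constraint). For a fixed margin γ, the admissible interval for tail values has width 2γ/|r^T|, so shrinking |r^T| makes room for more tail entities. For comparison, TransE's one-dimensional condition |h + r − t| < γ always yields width 2γ. The helper names and numbers are illustrative.

```python
def pairre_tail_interval(h, r_head, r_tail, gamma):
    """Tail values t satisfying |h * r_head - t * r_tail| < gamma (1-D PairRE)."""
    center = h * r_head / r_tail
    half_width = gamma / abs(r_tail)  # valid for either sign of r_tail
    return center - half_width, center + half_width


def transe_tail_interval(h, r, gamma):
    """Tail values t satisfying |h + r - t| < gamma (1-D TransE)."""
    return h + r - gamma, h + r + gamma


gamma, h, r_head = 1.0, 1.0, 2.0
for r_tail in (2.0, 0.5, 0.125, -0.5):
    lo, hi = pairre_tail_interval(h, r_head, r_tail, gamma)
    print(f"r_tail={r_tail:+.3f}: ({lo}, {hi}), width {hi - lo}")

lo, hi = transe_tail_interval(h, 2.0, gamma)
print(f"TransE: ({lo}, {hi}), width {hi - lo}")  # width is always 2 * gamma
```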
Meanwhile, not adding a relation-specific translation vector enables the model to encode several key relation patterns. We show these capabilities below.
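The analysis of learned embeddings later in the paper uses three pattern conditions:

- symmetry: (r^H)^2 = (r^T)^2 element-wise;
- inverse: r_1^H ∘ r_2^H = r_1^T ∘ r_2^T;
- composition: r_1^H ∘ r_2^H ∘ r_3^T = r_1^T ∘ r_2^T ∘ r_3^H.

These conditions can be sanity-checked on hand-picked vectors. The Python sketch below is a numerical illustration, not a proof. The vectors are arbitrary dyadic values chosen so that the arithmetic is exact, and the helper names are hypothetical.

```python
def score(h, rel, t):
    """PairRE score f_r(h, t) = -||h ∘ r^H - t ∘ r^T||_1, with rel = (r^H, r^T)."""
    r_head, r_tail = rel
    return -sum(abs(hi * rh - ti * rt)
                for hi, rh, rt, ti in zip(h, r_head, r_tail, t))


def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]


def follow(x, rel):
    """Entity y that satisfies x ∘ r^H = y ∘ r^T exactly."""
    r_head, r_tail = rel
    return [xi * rh / rt for xi, rh, rt in zip(x, r_head, r_tail)]


x = [1.0, 2.0]
r1 = ([2.0, 0.5], [1.0, 4.0])
y = follow(x, r1)  # (x, r1, y) holds exactly

# Symmetry: r^H = r^T gives (r^H)^2 = (r^T)^2, and scores are symmetric.
sym = ([0.5, -2.0], [0.5, -2.0])
symmetric_ok = score(x, sym, y) == score(y, sym, x)

# Antisymmetry: r1 has (r^H)^2 != (r^T)^2, so (y, r1, x) is not implied.
antisym_gap = score(y, r1, x)  # strictly negative

# Inverse: r2 = (r1^T, r1^H) satisfies r1^H ∘ r2^H = r1^T ∘ r2^T.
r2 = (r1[1], r1[0])
inverse_ok = score(y, r2, x) == score(x, r1, y) == 0.0

# Composition: r3 = (r1^H ∘ rb^H, r1^T ∘ rb^T) composes r1 with rb.
rb = ([0.5, 4.0], [0.25, 2.0])
z = follow(y, rb)
r3 = (hadamard(r1[0], rb[0]), hadamard(r1[1], rb[1]))
composition_ok = score(x, r3, z) == 0.0

print(symmetric_ok, antisym_gap, inverse_ok, composition_ok)
```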
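Proof (subrelation). Assume a subrelation pair $r_1$ and $r_2$ such that $\forall h, t \in E$: $(h, r_1, t) \rightarrow (h, r_2, t)$. We impose the following constraints:
$$\frac{\mathbf{r}_2^H}{\mathbf{r}_1^H} = \frac{\mathbf{r}_2^T}{\mathbf{r}_1^T} = \boldsymbol{\alpha}, \quad |\alpha_i| \le 1, \qquad (6)$$
where $\boldsymbol{\alpha} \in \mathbb{R}^d$ and the division is element-wise. Then we can get
$$f_{r_2}(\mathbf{h}, \mathbf{t}) - f_{r_1}(\mathbf{h}, \mathbf{t}) = \|\mathbf{h} \circ \mathbf{r}_1^H - \mathbf{t} \circ \mathbf{r}_1^T\|_1 - \|\mathbf{h} \circ \mathbf{r}_2^H - \mathbf{t} \circ \mathbf{r}_2^T\|_1 = \|(\mathbf{1} - |\boldsymbol{\alpha}|) \circ (\mathbf{h} \circ \mathbf{r}_1^H - \mathbf{t} \circ \mathbf{r}_1^T)\|_1 \ge 0. \qquad (7)$$
When the constraints are satisfied, PairRE forces triple $(h, r_2, t)$ to be at least as plausible as triple $(h, r_1, t)$.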
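The inequality in Equation 7 can also be checked empirically. The sketch below is pure Python; the dimension, random seed, sampling distributions and tolerance are arbitrary choices. It draws α with |α_i| ≤ 1 and builds r_2 = α ∘ r_1 for both paired vectors. It then confirms on random entity pairs that f_{r_2}(h, t) is never smaller than f_{r_1}(h, t) beyond floating-point error.

```python
import random


def score(h, r_head, r_tail, t):
    """f_r(h, t) = -||h ∘ r^H - t ∘ r^T||_1."""
    return -sum(abs(hi * rh - ti * rt)
                for hi, rh, rt, ti in zip(h, r_head, r_tail, t))


rng = random.Random(0)
d = 8
r1_head = [rng.uniform(-1, 1) for _ in range(d)]
r1_tail = [rng.uniform(-1, 1) for _ in range(d)]
alpha = [rng.uniform(-1, 1) for _ in range(d)]  # |alpha_i| <= 1
r2_head = [a * x for a, x in zip(alpha, r1_head)]
r2_tail = [a * x for a, x in zip(alpha, r1_tail)]

violations = 0
for _ in range(1000):
    h = [rng.gauss(0, 1) for _ in range(d)]
    t = [rng.gauss(0, 1) for _ in range(d)]
    # Equation 7: f_{r2}(h, t) - f_{r1}(h, t) >= 0 (up to rounding error)
    if score(h, r2_head, r2_tail, t) < score(h, r1_head, r1_tail, t) - 1e-9:
        violations += 1
print("violations:", violations)  # expected 0
```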
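Optimization. To optimize the model, we use the self-adversarial negative sampling loss (Sun et al., 2019) as the training objective:
$$L = -\log \sigma\big(\gamma + f_r(\mathbf{h}, \mathbf{t})\big) - \sum_{i=1}^{n} p(h'_i, r, t'_i) \log \sigma\big(-f_r(\mathbf{h}'_i, \mathbf{t}'_i) - \gamma\big), \qquad (8)$$
where γ is a fixed margin, σ is the sigmoid function and n is the number of negative samples. $(h'_i, r, t'_i)$ is the i-th negative triple and $p(h'_i, r, t'_i)$ represents the weight of this negative sample, defined as follows:
$$p(h'_j, r, t'_j) = \frac{\exp\big(\alpha f_r(\mathbf{h}'_j, \mathbf{t}'_j)\big)}{\sum_{i} \exp\big(\alpha f_r(\mathbf{h}'_i, \mathbf{t}'_i)\big)}, \qquad (9)$$
where α here denotes the temperature of sampling.

Experimental results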

Experimental setup
We evaluate the proposed method on link prediction tasks. First, we validate the ability to deal with complex relations and with symmetry/antisymmetry, inverse and composition relations on four benchmarks. Then we validate our model on two subrelation-specific benchmarks. Statistics of these benchmarks are shown in Table 2. ogbl-wikikg2 (Hu et al., 2020) is extracted from the Wikidata knowledge base (Vrandečić and Krötzsch, 2014); one of the main challenges for this dataset is complex relations. ogbl-biokg (Hu et al., 2020) contains data from a large number of biomedical data repositories; one of the main challenges for this dataset is symmetry relations.

Table 2: Number of entities, relations, and observed triples in each split for the six benchmarks.
Dataset | #Relations | #Entities | #Train | #Valid | #Test
FB15k | 1.3k | 15k | 483k | 50k | 59k
FB15k-237 | 237 | 15k | 272k | 18k | 20k
DB100k | 470 | 100k | 598k | 50k | 50k
Sports | 4 | 1039 | 1312 | - | 307

Evaluation protocol. Following the state-of-the-art methods, we measure the quality of the ranking of each test triple among all possible head entity and tail entity substitutions: (h', r, t) and (h, r, t'), ∀h', t' ∈ E. Three evaluation metrics are used: Mean Rank (MR), Mean Reciprocal Rank (MRR) and Hit ratio with cut-off values n = 1, 3, 10. MR measures the average rank of all correct entities. MRR is the average inverse rank of correct entities, with higher values representing better performance. Hit@n measures the percentage of correct entities in the top n predictions. The rankings of triples are computed after removing all the other observed triples that appear in the training, validation or test set. For experiments on ogbl-wikikg2 and ogbl-biokg, we follow the evaluation protocol of these two benchmarks (Hu et al., 2020).
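The filtered ranking protocol and the three metrics can be sketched as follows. This minimal illustration assumes each candidate entity has a scalar score, with higher meaning more plausible. Ties are resolved optimistically (only strictly higher-scoring candidates count), which is an assumption. The helper names are hypothetical.

```python
def filtered_rank(scores, target, known_true):
    """Rank of `target` among candidates, skipping other known-true entities.

    scores: dict mapping candidate entity -> plausibility (higher is better).
    known_true: entities forming observed triples in train/valid/test.
    """
    target_score = scores[target]
    better = sum(1 for e, s in scores.items()
                 if s > target_score and e != target and e not in known_true)
    return better + 1


def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Mean Rank, Mean Reciprocal Rank and Hits@k from a list of ranks."""
    n = len(ranks)
    metrics = {"MR": sum(ranks) / n,
               "MRR": sum(1.0 / r for r in ranks) / n}
    for k in ks:
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Toy query (h, r, ?): "a" is another observed true tail, so it is filtered out.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
rank_c = filtered_rank(scores, target="c", known_true={"a"})  # 2 instead of 3
print(ranking_metrics([rank_c, 1, 4]))
```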
Implementation. We use the official implementations for the benchmarks ogbl-wikikg2 and ogbl-biokg (Hu et al., 2020) for the corresponding experiments. Only the hyperparameter γ and the embedding dimension are tuned; the other settings are kept the same as for the baselines. For the remaining experiments, we implement our models based on the implementation of RotatE (Sun et al., 2019). All hyperparameters except γ and the embedding dimension are kept the same as for RotatE.

Results of [3] are taken from (Kadlec et al., 2017). Other results are taken from the corresponding papers. GC-OTE adds graph context to OTE (Tang et al., 2020).

Main results
Comparisons for ogbl-wikikg2 and ogbl-biokg are shown in Table 3. On these two large-scale datasets, PairRE achieves state-of-the-art performance. On ogbl-wikikg2, PairRE performs best with both the limited and the increased embedding dimension. With the same number of parameters as ComplEx (dimension 100), PairRE improves Test MRR by close to 10%. With increased dimension, all models achieve higher MRR on the validation and test sets. Due to hardware limitations, we only increase the embedding dimension to 200 for PairRE. PairRE still outperforms all baselines and improves Test MRR by 8.7%. Based on the performance of the baselines, the performance of PairRE may improve further if the embedding dimension is increased to 500. Under the same experimental setting and with the same number of parameters, PairRE also outperforms all baselines on ogbl-biokg. It improves Test MRR by 0.69%, which indicates its superior ability to encode symmetry relations.
Comparisons for the FB15k and FB15k-237 datasets are shown in Table 4. Since our model shares the same hyperparameter settings and implementation with RotatE, comparing against this state-of-the-art model is a fair way to show the advantages and disadvantages of the proposed model. The comparisons also include several leading methods, such as TransE (Bordes et al., 2013), DistMult (Yang et al., 2014), HolE (Nickel et al., 2016), ConvE (Dettmers et al., 2018), ComplEx (Trouillon et al., 2016), SimplE (Kazemi and Poole, 2018), SeeK (Xu et al., 2020) and OTE (Tang et al., 2020). Compared with RotatE, PairRE shows clear improvements on FB15k and FB15k-237 for all evaluation metrics. For the MRR metric, the improvements are 1.4% and 1.3% respectively. Compared with the other leading methods, PairRE also shows highly competitive performance. All these comparisons demonstrate the effectiveness of PairRE in encoding inverse and composition relations.

Further experiments on subrelation
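We further compare our method with two leading methods, ComplEx-NNE-AER and SimplE+, which focus on encoding subrelation. These two methods add subrelation rules to semantic matching models. We use these rules as constraints on relation representations for PairRE, and validate two ways of doing so. We first test the performance of weight tying for subrelation rules on the Sports dataset. The rules ($r_1 \rightarrow r_2$) are added as follows:
$$\mathbf{r}_2^H = \mathbf{r}_1^H \circ \cos(\boldsymbol{\theta}), \qquad \mathbf{r}_2^T = \mathbf{r}_1^T \circ \cos(\boldsymbol{\theta}),$$
where $\boldsymbol{\theta} \in \mathbb{R}^d$ and cos is applied element-wise, so every entry of $\cos(\boldsymbol{\theta})$ lies in $[-1, 1]$. The added rules are shown in Table 5. The experimental results in Table 6 show the effectiveness of the proposed method.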
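This is a minimal sketch of weight tying for a rule r_1 → r_2. It assumes the cosine parameterization r_2 = r_1 ∘ cos(θ) for both paired vectors. Every entry of cos(θ) lies in [−1, 1], so any unconstrained θ yields a scaling α with |α_i| ≤ 1, and gradient-based training can update θ freely. The helper names and random values are illustrative.

```python
import math
import random


def tie_subrelation(r1_head, r1_tail, theta):
    """Hypothetical weight tying: r2^H = r1^H ∘ cos(theta), r2^T = r1^T ∘ cos(theta)."""
    alpha = [math.cos(x) for x in theta]  # each entry lies in [-1, 1]
    return ([a * x for a, x in zip(alpha, r1_head)],
            [a * x for a, x in zip(alpha, r1_tail)])


def score(h, r_head, r_tail, t):
    """f_r(h, t) = -||h ∘ r^H - t ∘ r^T||_1."""
    return -sum(abs(hi * rh - ti * rt)
                for hi, rh, rt, ti in zip(h, r_head, r_tail, t))


rng = random.Random(1)
d = 6
r1_head = [rng.uniform(-2, 2) for _ in range(d)]
r1_tail = [rng.uniform(-2, 2) for _ in range(d)]
theta = [rng.uniform(-10, 10) for _ in range(d)]  # unconstrained parameters
r2_head, r2_tail = tie_subrelation(r1_head, r1_tail, theta)

h = [rng.gauss(0, 1) for _ in range(d)]
t = [rng.gauss(0, 1) for _ in range(d)]
gap = score(h, r2_head, r2_tail, t) - score(h, r1_head, r1_tail, t)
print(gap >= -1e-9)  # True: r2 is at least as plausible as r1
```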
Weight tying on relation representations is a way to incorporate hard rules. Soft rules can also be incorporated into PairRE through approximate entailment constraints on relation representations. In this section, we add the same rules as ComplEx-NNE-AER, which include subrelation and inverse rules. We denote by $r_1 \xrightarrow{\lambda} r_2$ the approximate entailment between relations $r_1$ and $r_2$, with confidence level λ. The training objective then becomes L plus penalty terms on the relation representations for each rule in $\tau_{subrelation}$ and $\tau_{inverse}$, weighted by the rules' confidence levels. Here L is calculated from Equation 8, μ is the loss weight for the added constraints, and $\tau_{subrelation}$ and $\tau_{inverse}$ are the sets of subrelation rules and inverse rules respectively. Following (Ding et al., 2018), we treat the two relations of each subrelation rule as equivalent, so $\tau_{subrelation}$ contains both rule $r_1 \rightarrow r_2$ and rule $r_2 \rightarrow r_1$.
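The exact penalty terms are not spelled out in this text, so the sketch below is hypothetical. It assumes squared residuals of the element-wise conditions that PairRE uses for each rule type:

- equivalence (subrelation) rule: r_1^H ∘ r_2^T − r_1^T ∘ r_2^H vanishes whenever r_2^H and r_2^T are the same element-wise rescaling of r_1^H and r_1^T;
- inverse rule: r_1^H ∘ r_2^H − r_1^T ∘ r_2^T vanishes when r_2 is the inverse of r_1.

Each residual is weighted by the rule confidence λ, and the total penalty by μ. All names are illustrative, and this is not necessarily the paper's exact objective.

```python
def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def equivalence_residual(r1, r2):
    """r1^H ∘ r2^T - r1^T ∘ r2^H; zero if r2 = alpha ∘ r1 on both sides."""
    return sub(hadamard(r1[0], r2[1]), hadamard(r1[1], r2[0]))


def inverse_residual(r1, r2):
    """r1^H ∘ r2^H - r1^T ∘ r2^T; zero if r2 is the inverse of r1."""
    return sub(hadamard(r1[0], r2[0]), hadamard(r1[1], r2[1]))


def rule_regularized_loss(base_loss, relations, sub_rules, inv_rules, mu):
    """Hypothetical L_rule = L + mu * sum over rules of lambda * ||residual||_2^2."""
    penalty = 0.0
    for a, b, lam in sub_rules:
        res = equivalence_residual(relations[a], relations[b])
        penalty += lam * sum(x * x for x in res)
    for a, b, lam in inv_rules:
        res = inverse_residual(relations[a], relations[b])
        penalty += lam * sum(x * x for x in res)
    return base_loss + mu * penalty


relations = {
    "r1": ([2.0, 0.5], [1.0, 4.0]),
    "r2": ([1.0, -0.25], [0.5, -2.0]),  # alpha = [0.5, -0.5] times r1
    "r3": ([1.0, 4.0], [2.0, 0.5]),     # inverse of r1 (swapped pair)
}
# Equivalence: both directions are listed, mirroring the text above.
rules_sub = [("r1", "r2", 0.9), ("r2", "r1", 0.9)]
rules_inv = [("r1", "r3", 0.8)]
print(rule_regularized_loss(1.0, relations, rules_sub, rules_inv, mu=0.1))  # 1.0
```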
We validate our method on the DB100k dataset. The results are shown in Table 7. PairRE outperforms the recent state-of-the-art SeeK and ComplEx-based models by large margins on all evaluation metrics. With the added constraints, the performance of PairRE improves further: the added rules improve MRR by 0.7% and Hit@1 by 1.2%.

Analysis on complex relations
We analyze the performance of PairRE on complex relations. The results of PairRE on different relation categories on FB15k and ogbl-wikikg2 are summarized in Table 8. PairRE performs quite well on N-to-N and N-to-1 relations, with a significant lead over the baselines. We also notice that performance on 1-to-N relations on ogbl-wikikg2 is not as strong as on the other relation categories. One possible reason is that only 2.2% of test triples belong to the 1-to-N relation category.
To further test the performance of paired relation vectors, we change the relation vector in RotatE to paired vectors. In the modified RotatE model, head and tail entities are rotated by different angles based on the paired relation vectors. This model can also be seen as a complex-valued PairRE. We name this model RotatE+PairRelation. The experimental results are shown in Figure 2. With the same embedding dimension (50 in the experiments), RotatE+PairRelation improves the performance of RotatE by 20.8%, 27.5%, 14.4% and 39.1% on the 1-to-1, 1-to-N, N-to-1 and N-to-N relation categories respectively. These significant improvements demonstrate the superior ability of paired relation vectors to handle complex relations.

Figure 2: Performance comparison between RotatE and RotatE+PairRelation on the ogbl-wikikg2 dataset.

Table 8: Hits@10 of different models on each relation category (1-to-1, 1-to-N, N-to-1, N-to-N) on FB15k and ogbl-wikikg2. The embedding dimensions for models on ogbl-wikikg2 are the same as in the experiments in Table 3: 100 for real-space models and 50 for complex-valued models.

Figure 3: Histograms of relation embeddings for different relation patterns. $r_1$ is relation spouse. $r_2$ is relation /broadcast/tv_station/owner. $r_3$ is relation /broadcast/tv_station_owner/tv_stations. $r_4$ is relation /location/administrative_division/capital/location/administrative_division_capital_relationship/capital. $r_5$ is relation /location/hud_county_place/place. $r_6$ is relation base/areas/schema/administrative_area/capital.

Analysis on relation patterns
To further verify the learned relation patterns, we visualize some examples. Histograms of the learned relation embeddings are shown in Figure 3.
Symmetry/Antisymmetry. Figure 3a shows a symmetry relation, spouse, from DB100k. The embedding dimension is 500. For PairRE, the symmetry pattern can be encoded when the relation embedding satisfies $(\mathbf{r}^H)^2 = (\mathbf{r}^T)^2$ element-wise. Figure 3b shows that most of the paired elements in $\mathbf{r}^H$ and $\mathbf{r}^T$ have the same absolute value. Figure 3c shows an antisymmetry relation, tv station owner, where most of the paired elements do not have the same absolute value, as shown in Figure 3d.
Inverse. Figure 3c and Figure 3e show an example of inverse relations from FB15k. As the histogram in Figure 3f shows, these two inverse relations, tv station owner ($r_2$) and tv station owner tv stations ($r_3$), nearly satisfy $\mathbf{r}_3^H \circ \mathbf{r}_2^H = \mathbf{r}_3^T \circ \mathbf{r}_2^T$.

Composition. Figures 3g, 3h and 3i show an example of the composition relation pattern from FB15k, where the third relation $r_6$ can be seen as the composition of the first relation $r_4$ and the second relation $r_5$. As Figure 3j shows, these three relations nearly satisfy $\mathbf{r}_4^H \circ \mathbf{r}_5^H \circ \mathbf{r}_6^T - \mathbf{r}_4^T \circ \mathbf{r}_5^T \circ \mathbf{r}_6^H = \mathbf{0}$.

Conclusion
To better handle complex relations and tackle more relation patterns, we proposed PairRE, which represents each relation with paired vectors. With a slight increase in complexity, PairRE solves the two aforementioned problems efficiently. Beyond symmetry/antisymmetry, inverse and composition relations, PairRE can further encode subrelation with simple constraints on relation representations. On the large-scale benchmarks ogbl-wikikg2 and ogbl-biokg, PairRE outperforms all the state-of-the-art baselines. Experiments on other well-designed benchmarks also demonstrate the effectiveness of the targeted key abilities.