Hyperbolic Hierarchy-Aware Knowledge Graph Embedding for Link Prediction

Knowledge graph embedding (KGE), which uses low-dimensional representations to predict missing information, is widely applied in knowledge completion. Most existing embedding methods are built on Euclidean space, which makes it difficult for them to handle hierarchical structures. Hyperbolic embedding methods have shown promise for high-fidelity, concise representation of hierarchical data. However, these methods do not adequately account for the logical patterns in knowledge graphs. To address this problem, we propose a novel KGE model that uses an extended Poincaré Ball and a polar coordinate system to capture hierarchical structures. We use the tangent space and the exponential map to initialize the vectors and map them to the Poincaré Ball in hyperbolic space. To address the boundary conditions, the boundary is stretched and zoomed by expanding the modulus length in the Poincaré Ball. We optimize our model using polar coordinates and modified operators in the extended Poincaré Ball. Experiments achieve new state-of-the-art results on some link prediction tasks, which demonstrates the effectiveness of our method.

for models which are built on Euclidean space to preserve hierarchical structures (Nickel and Kiela, 2017).
Recent work has proposed hyperbolic representation learning, typified by the Poincaré Ball (Cannon et al., 1997). Figure 1 shows part of the knowledge in the NELL KG (Mitchell et al., 2018), in which the entities follow a long-tailed distribution with respect to their distance from the entity The United States. That is, the hierarchical relationships between entities can be approximated as a tree structure in which the number of entities in each layer increases exponentially with the depth of the tree. Such a knowledge structure can be well represented with the Poincaré Ball (Ungar, 2001), a model of hyperbolic space suitable for embedding the hierarchical structures and entities in KGs. Although most hyperbolic KGE models choose the Poincaré Ball model to embed these structures, they still suffer from restricted capacity and limited floating-point precision when the majority of points are embedded near the boundary of the Poincaré Ball because of the long-tailed distribution.
To tackle these challenges, this paper proposes a novel hyperbolic knowledge embedding method named HBE (Hyperbolic extended Poincaré Ball Embedding), which employs an extended Poincaré Ball for KG embedding and captures hierarchical structures with a polar coordinate system in a hyperbolic transformation (Chami et al., 2019). First, HBE uses the tangent space to initialize the entity and relation vectors in conventional Euclidean space. Then, it projects the embedded entities into polar coordinates to gain hierarchical information and splits the embeddings into radius and angle parts to absorb hyperbolic information. The boundary is stretched and zoomed by expanding the modulus length in the Poincaré Ball. At the same time, the addition rule in the extended Poincaré Ball is changed accordingly, so that the model can be optimized in the extended Poincaré Ball. The overall training process adopts the idea of TransE: using positive and negative examples, the distance between the tail entity and the result of combining the head entity with the relation is made as small as possible for positive triples.
In summary, the main contributions of this paper are three-fold: (1) We propose a novel hyperbolic knowledge embedding method, HBE, which applies an extended Poincaré Ball to KGE and captures hierarchical structures.
(2) To enable the model to be optimized in the extended Poincaré Ball, we adjust the operators and formulate the model in a polar coordinate system to embed entities and relations. (3) Experiments show that HBE outperforms state-of-the-art methods on link prediction tasks at a moderate dimension.

Hyperbolic Geometry
Hyperbolic space is one of the three kinds of isotropic spaces: Euclidean (flat), spherical (positively curved) and hyperbolic (negatively curved) (Cannon et al., 1997). Compared with Euclidean and spherical spaces, the amount of space covered by a hyperbolic geometry increases exponentially rather than polynomially with the radius (Buser, 1992). This property makes hyperbolic space well suited to capturing KG structures that form hierarchies. Hyperbolic geometry has several important isometric models, including the hyperboloid model, the Klein disk model and the Poincaré Ball model. This paper chooses the extended Poincaré Ball model because of its feasibility for gradient optimization (Abramowicz et al., 2002) and its infinite boundary. We first introduce some basic operations of hyperbolic geometry and the Poincaré Ball, and then describe how to extend the Poincaré Ball and modify some operators in the extended one.
Specifically, a d-dimensional Poincaré Ball with negative curvature $-c$ ($c > 0$) is defined by the manifold $(\mathbb{B}^d, g_x)$ (Ungar, 2001). The geodesic distance (or hyperbolic distance) $d(u, v)$ between vectors $u$ and $v$ in the Poincaré Ball with $c = 1$ is given by (Ungar, 2001):
$$d(u, v) = \operatorname{arcosh}\left(1 + 2\,\frac{\|u - v\|^2}{(1 - \|u\|^2)(1 - \|v\|^2)}\right) \qquad (1)$$
When points move from the origin towards the ball boundary, their geodesic distance increases exponentially, offering a larger capacity of space for embedding concepts and entities (Ungar, 2001).
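As an illustration of how the geodesic distance behaves near the boundary, the following minimal sketch evaluates it for a few points. It assumes the standard closed form of the Poincaré distance for $c = 1$; the function name `poincare_distance` is hypothetical and is not taken from the authors' implementation.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance in the unit Poincaré ball (assumes curvature -1, i.e. c = 1)."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

# Equal Euclidean steps towards the boundary cover ever larger hyperbolic distances.
origin = [0.0, 0.0]
for r in (0.5, 0.9, 0.99, 0.999):
    d = poincare_distance(origin, [r, 0.0])
    print(f"Euclidean radius {r}: hyperbolic distance {d:.3f}")
```

The distance from the origin to a point of Euclidean radius $r$ reduces to $2\tanh^{-1} r$, which is unbounded as $r \to 1$.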
The vector translation in the Poincaré Ball is defined by the Möbius addition (Ungar, 2001):
$$x \oplus_c y = \frac{(1 + 2c\,x \cdot y + c\|y\|^2)\,x + (1 - c\|x\|^2)\,y}{1 + 2c\,x \cdot y + c^2\|x\|^2\|y\|^2} \qquad (2)$$
where $x$ and $y$ are hyperbolic vectors and $-c$ is the curvature of the hyperbolic space.
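The Möbius addition above can be sketched directly from its definition. The function name `mobius_add` and the list-based vector representation are illustrative assumptions, not the authors' code.

```python
def mobius_add(x, y, c=1.0):
    """Möbius addition of x and y in the Poincaré ball with curvature -c."""
    xy = sum(a * b for a, b in zip(x, y))   # x . y
    x2 = sum(a * a for a in x)              # ||x||^2
    y2 = sum(b * b for b in y)              # ||y||^2
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

x = [0.3, 0.4]
print(mobius_add(x, [0.0, 0.0]))            # adding the origin leaves x unchanged
print(mobius_add([-a for a in x], x))       # (-x) + x returns the origin
print(mobius_add([0.5], [0.5]))             # stays inside the ball: (0.5 + 0.5) / (1 + 0.25)
```

Unlike Euclidean addition, the operation is in general neither commutative nor associative, and the result always remains inside the ball.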
Previous work defines matrix-vector multiplication in the Poincaré Ball using the exponential and logarithmic maps (Ungar, 2001). The hyperbolic vectors are first projected into the tangent space at 0 using the logarithmic map ($\log_0$), then multiplied by the transformation matrix as in Euclidean space, and finally projected back onto the manifold with the exponential map ($\exp_0$) (Nickel and Kiela, 2017). Specifically, for a tangent vector $v$ and a point $y \in \mathbb{B}$, the two projections are defined as follows:
$$\exp_0^c(v) = \tanh\left(\sqrt{c}\,\|v\|\right)\frac{v}{\sqrt{c}\,\|v\|}, \qquad \log_0^c(y) = \tanh^{-1}\left(\sqrt{c}\,\|y\|\right)\frac{y}{\sqrt{c}\,\|y\|}$$
Through such projections, we can apply any Euclidean counterpart operation to hyperbolic vectors. The transformation can be done using the Möbius version of matrix-vector multiplication:
$$M \otimes_c x = \exp_0^c\left(M \log_0^c(x)\right)$$
Möbius scalar multiplication can be obtained in the same way as:
$$k \otimes_c x = \exp_0^c\left(k \log_0^c(x)\right)$$

Methodology
The Poincaré Ball model suits the hierarchical structure of KGs, in which entities can form different hierarchies under different relations. For example, in WordNet (Fellbaum, 1998), chair is a parent node of different chair types (e.g. folding_chair, armchair) under the relation hypernym, and both chair and its types are parent nodes of the parts of a typical chair (e.g. backrest, leg) under the relation has_part. The parent node chair would be embedded closer to the origin and the node backrest farther from the origin. An ideal embedding model should capture all hierarchies simultaneously. Bilinear models (Wang et al., 2014), for example, measure the similarity between a subject entity embedding and an object entity embedding using the Euclidean inner product. However, there is no clear counterpart of the Euclidean inner product in hyperbolic space. The Euclidean inner product can be expressed as a function of Euclidean distance and norms. Noting this, Poincaré GloVe (Tifrea et al., 2019) absorbed the squared norms of the embeddings into the biases and replaced the Euclidean distance with the Poincaré distance $d_B(u, v)$ to obtain the hyperbolic version of GloVe.
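For reference, the remark that the Euclidean inner product is a function of Euclidean distance and norms corresponds to the polarization identity below. It is stated here as a standard fact of Euclidean geometry; how exactly Poincaré GloVe rearranges its terms is not reproduced.

```latex
\langle u, v \rangle \;=\; \tfrac{1}{2}\left( \|u\|^2 + \|v\|^2 - \|u - v\|^2 \right)
```

The only term that couples $u$ and $v$ is the squared distance $\|u - v\|^2$, which is why substituting a hyperbolic distance for it, and moving the norm terms into biases, gives a hyperbolic analogue.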
Meanwhile, the capacity of the Poincaré Ball model is restricted by floating-point precision when the majority of points lie near the boundary because of the long-tailed distribution. To tackle this problem, we use an extended Poincaré Ball that expands the border to infinity and adjust some operators to align with Euclidean geometry, which redefines the distance $d_B(u, v)$.

Extended Poincaré Ball
In the Poincaré Ball, the whole space is symmetric about the center, but the apparent Euclidean distance from the origin to a point is not equal to the hyperbolic distance (Chami et al., 2019). To make the apparent distance consistent with the actual hyperbolic distance, we establish a new model, called the extended Poincaré Ball, in which the distance from any point to the center equals its distance in hyperbolic space. Suppose that the polar coordinates of a point in the original coordinate system (the Poincaré disk) are $(r, \theta)$; then its coordinates in the new space are $(2\tanh^{-1} r, \theta)$ (Buser, 1992). As shown in Figure 2, circles in the extended Poincaré disk (in 2 dimensions) are twisted. In this way, the radius of the ball space is infinite.
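The radial change of coordinates can be sketched as a pair of functions, which also makes the floating-point argument concrete. The names `to_extended` and `from_extended` are hypothetical, and the example assumes IEEE 754 double precision.

```python
import math

def to_extended(r):
    """Map a Poincaré-disk radius r in [0, 1) to the extended radius 2 * artanh(r)."""
    if not 0.0 <= r < 1.0:
        raise ValueError("r must satisfy 0 <= r < 1")
    return 2.0 * math.atanh(r)

def from_extended(rho):
    """Inverse map: extended radius rho >= 0 back to the Poincaré-disk radius."""
    return math.tanh(rho / 2.0)

for rho in (1.0, 10.0, 50.0):
    r = from_extended(rho)
    # Once r rounds to exactly 1.0, the point can no longer be stored inside the unit disk.
    print(f"extended radius {rho}: disk radius {r!r}, rounds to boundary: {r == 1.0}")
```

In double precision, an extended radius of 50 already maps to a disk radius that rounds to exactly 1.0, whereas the extended coordinate stores the value 50 without difficulty.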
Therefore, points near the boundary are extremely compressed in the Poincaré Ball, while there is no such problem in the extended one. Meanwhile, it can be proved that the Hyperbolic Cosine Theorem still holds for the operators in the extended Poincaré Ball: $\cosh(c) = \cosh(a)\cosh(b) - \sinh(a)\sinh(b)\cos\gamma$, where $a$, $b$, $c$ are the geodesic lengths of the sides of a triangle (here $c$ is a side length, not the curvature parameter) and $\gamma$ is the angle between sides $a$ and $b$. The extended Poincaré Ball and the Poincaré Ball also share the same distance form when it is calculated by the cosine theorem (see Appendix A.1 for details).
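The claim that both models share the same distance can be checked numerically in two dimensions. The sketch below assumes $c = 1$ and compares the Poincaré-disk distance with the value obtained from the Hyperbolic Cosine Theorem applied to extended radii; the function names are illustrative only.

```python
import math

def disk_distance(r1, t1, r2, t2):
    """Poincaré-disk distance (c = 1) between points given in polar form, r1, r2 < 1."""
    sq_diff = r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(t1 - t2)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - r1 * r1) * (1.0 - r2 * r2)))

def extended_distance(rho1, t1, rho2, t2):
    """Distance from the hyperbolic law of cosines, using extended radii rho = 2 * artanh(r)."""
    cosh_c = (math.cosh(rho1) * math.cosh(rho2)
              - math.sinh(rho1) * math.sinh(rho2) * math.cos(t1 - t2))
    return math.acosh(max(1.0, cosh_c))  # clamp guards against rounding just below 1

r1, t1, r2, t2 = 0.3, 0.2, 0.7, 1.9
rho1, rho2 = 2.0 * math.atanh(r1), 2.0 * math.atanh(r2)
print(disk_distance(r1, t1, r2, t2))
print(extended_distance(rho1, t1, rho2, t2))
```

The two printed values agree up to rounding, since $\cosh(2\tanh^{-1} r) = (1 + r^2)/(1 - r^2)$ and $\sinh(2\tanh^{-1} r) = 2r/(1 - r^2)$ turn one expression into the other.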
Furthermore, inspired by the Hyperbolic Cosine Theorem, in which the hyperbolic distance is composed of modulus and angle, we use polar coordinates to embed entities and relations into the extended Poincaré Ball. The score function is formed of two parts: the polar radius and the polar angle.
where α and β are the weights to be learned. The radius part plays an essential role in determining the levels of entities in the extended Poincaré Ball, and the angle part aims to distinguish entities at the same level. The whole function is similar in form to the work on entity typing by Lopez et al. (2019), which does not satisfy the Cauchy inequality. We can therefore formulate the polar radius with Möbius addition and multiplication as follows: where h, r, t stand for the hyperbolic embeddings of the head entity, relation and tail entity, and R stands for the relation matrix in hyperbolic space, inspired by MuRP (Balazevic et al., 2019). Owing to the property of the extended Poincaré Ball, one can classify the embedding levels of different entities by their Euclidean norm.
A point in a high-dimensional polar coordinate system can be formulated as: (9) For the sake of convergence and efficiency, we simplify and formulate the polar angle as $\Delta\theta = \pi - |\pi - |\theta - \theta'||$. Consequently, we simplify Equation 9 in TransE form (Bordes et al., 2013) as: From another perspective, the angle parts can be replaced with radius parts by the Cosine Theorem in hyperbolic space. However, to better capture complex relation patterns such as symmetry, anti-symmetry, inversion and composition, it is necessary to use an extra angle part for downstream tasks like link prediction. In addition, the angle part can simulate the rotation in RotatE (Sun et al., 2019). Theoretically, any algebraic system that has the fundamental properties of congruence can be used as the angle part in HBE when embedding complex relations. Taking angles as an example, suppose that a relation $\theta_r \in [0, 2\pi)$ is close to $\pi$; then a symmetric relation satisfies $(\theta_h + \theta_r + \theta_r) \bmod 2\pi = \theta_h \bmod 2\pi$ for arbitrary $\theta_h$, with $\neq$ in place of $=$ for asymmetric relations.
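The wrapped angle difference and the symmetric-relation example can be illustrated with scalar angles. This is a minimal sketch for a single angular coordinate in $[0, 2\pi)$; the helper names `angle_distance` and `rotate` are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def angle_distance(theta_a, theta_b):
    """Wrapped difference pi - |pi - |a - b|| for angles in [0, 2*pi); the result lies in [0, pi]."""
    return math.pi - abs(math.pi - abs(theta_a - theta_b))

def rotate(theta, theta_r):
    """Apply a relation angle to an entity angle, modulo 2*pi."""
    return (theta + theta_r) % TWO_PI

theta_h = 1.0
# A relation angle of pi is its own inverse: applying it twice returns to theta_h.
twice_symmetric = rotate(rotate(theta_h, math.pi), math.pi)
print(angle_distance(twice_symmetric, theta_h))   # close to 0: symmetric pattern

# A relation angle of pi / 2 does not return to theta_h after two applications.
twice_asymmetric = rotate(rotate(theta_h, math.pi / 2.0), math.pi / 2.0)
print(angle_distance(twice_asymmetric, theta_h))  # close to pi: not symmetric
```

The wrapping matters because angles such as 0.1 and $2\pi - 0.1$ are close on the circle even though their raw difference is almost $2\pi$.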

Optimization
Since the Poincaré Ball has a Riemannian manifold structure, we optimize the radius parameters with stochastic Riemannian optimization methods such as RSGD or RSVRG (Bonnabel, 2013). Let $\nabla_E$ denote the Euclidean gradient of $L(P)$. Using RSGD, the Riemannian gradient can be computed from $\nabla_E$. In summary, the full update for a single embedding is calculated by: where η denotes the learning rate. According to the isometric projection of the Poincaré Ball, the angle part can be optimized by Euclidean optimization methods such as SGD or Adam (Kingma and Ba, 2015).
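To give a sense of what a Riemannian SGD step looks like, the sketch below uses the update commonly applied in the unit Poincaré ball (Nickel and Kiela, 2017): the Euclidean gradient is rescaled by $(1 - \|\theta\|^2)^2 / 4$ and the result is projected back into the ball. This form is an assumption made for illustration; the update actually used for HBE in the extended ball may differ, and `rsgd_step`, `project` and `EPS` are hypothetical names.

```python
import math

EPS = 1e-5  # assumed safety margin that keeps points strictly inside the unit ball

def project(theta):
    """Pull a point back inside the unit ball if a step pushed it to or past the boundary."""
    norm = math.sqrt(sum(x * x for x in theta))
    if norm >= 1.0:
        scale = (1.0 - EPS) / norm
        return [x * scale for x in theta]
    return list(theta)

def rsgd_step(theta, euclid_grad, lr):
    """One RSGD step in the unit Poincaré ball (assumed c = 1, retraction-style update)."""
    sq_norm = sum(x * x for x in theta)
    # Inverse of the conformal metric factor: converts the Euclidean gradient to the Riemannian one.
    scale = (1.0 - sq_norm) ** 2 / 4.0
    return project([x - lr * scale * g for x, g in zip(theta, euclid_grad)])

print(rsgd_step([0.0, 0.0], [1.0, 0.0], lr=0.4))    # step of 0.4 / 4 at the origin
print(rsgd_step([0.99, 0.0], [1.0, 0.0], lr=0.4))   # much smaller step near the boundary
```

The shrinking step near the boundary is one way to see why densely packed boundary points converge slowly in the original ball.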
To train the model, we use the negative sampling loss function with self-adversarial training (Sun et al., 2019):
where λ is the margin. For negative samples, where p is the probability distribution for sampling negative triples, and α is the temperature of sampling.
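As a rough guide to the training objective, the sketch below follows the self-adversarial negative sampling loss in the form given by Sun et al. (2019): a log-sigmoid term for the positive triple plus a softmax-weighted sum of log-sigmoid terms for the negatives. It assumes that the model produces a distance for each triple and that the score is the negative distance; the exact form used for HBE may differ, and all names are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0.0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def self_adversarial_loss(pos_dist, neg_dists, margin, temperature):
    """Return (loss, weights) for one positive triple and its sampled negative triples."""
    # Softmax over temperature * score, with score = -distance, so that
    # negatives the model finds plausible (small distance) receive more weight.
    logits = [-temperature * d for d in neg_dists]
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    pos_term = -log_sigmoid(margin - pos_dist)
    neg_term = -sum(w * log_sigmoid(d - margin) for w, d in zip(weights, neg_dists))
    return pos_term + neg_term, weights

loss, weights = self_adversarial_loss(pos_dist=0.5, neg_dists=[1.0, 5.0], margin=3.0, temperature=1.0)
print(loss, weights)
```

In practice the weights are treated as constants during backpropagation, so they act as a sampling distribution rather than as trainable quantities.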

Experiments
To evaluate our approach, we use the widely adopted KG datasets WN18RR (Dettmers et al., 2018) and FB15k-237 (Bordes et al., 2013). WN18RR is a subset of WordNet, a hierarchical collection of relations between words; it was created from WN18 by removing the inverses of many relations from the validation and test sets to make the dataset more challenging, and contains 40,943 entities and 11 relations. FB15k-237 is a subset of Freebase, a collection of real-world facts, created from FB15k in the same way as WN18RR; it contains 14,541 entities and 237 relations. The statistics of the datasets are shown in Table 1. Notably, the lower the metric $E_G$ is, the more tree-like the KG is ($E_G$ is the mean of the estimated curvatures of the sampled triangles; see Chami et al. (2020) for more details).
We evaluate HBE on the task of KG link prediction, which is critical for practical applications. For link prediction, we use the scoring function to rank the correct tail or head entity against all possible entities. The evaluation metrics are: (1) mean reciprocal rank (MRR), the mean of the inverse ranks assigned to the correct entities; and (2) hits at K (H@K, K ∈ {1, 3, 10}), the proportion of correct triples among the top-K predicted triples.
2019) on WN18RR datasets, which demonstrates the promising potential of hyperbolic space and polar coordinates. Nevertheless, the results on FB15k-237 show that HBE performs similarly to RotatE. The reason may lie in the special structure of FB15k, in which many points have a low level in the hierarchy but a high degree, which confuses the radius part of HBE: low-level points tend to be embedded near the border of the raw Poincaré ball, whereas high-degree points do not. As shown in Figure 3(a), the hierarchy in RotatE is not distinguished and the overall distribution is more uniform after dimension reduction, which may be related to its design based on complex numbers and rotation. Figure 3(b) shows that the points in the Poincaré ball are clearly more sensitive to hierarchy: they are sparse in the middle and dense near the boundary. Meanwhile, most of the points are concentrated near the boundary, which makes the model prone to poor convergence and insufficient floating-point precision. HBE, in Figure 3(c), uses the extended boundary and relieves the problem of dense distribution. Finally, Table 4 reports variants of the model: HBE-polar is the polar version in Euclidean space, HBE-dis uses the radius part only, and HBE-raw is the model in the Poincaré Ball without extension of the boundary.
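The ranking metrics used in this evaluation, MRR and H@K, can be computed from the rank of each correct entity as sketched below. The helper names are hypothetical, ties are resolved optimistically, and the filtering of other known true triples that is common in link prediction benchmarks is not modeled here.

```python
def rank_of(scores, correct_index):
    """1-based rank of the correct candidate; higher scores are better, ties count in its favor."""
    target = scores[correct_index]
    return 1 + sum(1 for s in scores if s > target)

def mean_reciprocal_rank(ranks):
    """Mean of the inverse ranks of the correct entities."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Toy ranks for three test triples (illustrative values, not results from the paper).
ranks = [1, 2, 10]
print(mean_reciprocal_rank(ranks))
for k in (1, 3, 10):
    print(k, hits_at_k(ranks, k))
```

Because MRR averages inverse ranks, it is dominated by triples ranked near the top, whereas H@K only records whether each rank falls within the cutoff.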
Detailed information about the weights of the polar coordinates, the analysis of hierarchical embeddings, and the relation case study is given in Appendices A.2, A.3, and A.4, respectively.

Conclusion
We introduce a novel translational method for embedding hierarchical KGs in the extended Poincaré ball of hyperbolic geometry. Our model learns hierarchy-specific parameters in polar coordinates using Möbius multiplication and Möbius addition. We show that HBE outperforms state-of-the-art methods on the link prediction task on some hierarchical KG datasets.

Then: And, taking s into Equation 18: Taking this transformation into the extended Poincaré disk (ball):
$$\cosh(x) = \cosh(r)\cosh(r') - \sinh(r)\sinh(r')\cos(\Delta\theta) \qquad (21)$$
The theorem is proved.

A.2 Weights of Polar Coordinate
To analyze the influence of the curvature and of the weights of the radius and angle parts, we collect the weight ratio and the curvature at different dimensions. Figure 4(a) shows the weight ratio of radius and angle at different dimensions: the x-axis is the dimension and the y-axis is the ratio $\frac{\beta}{\alpha+\beta}$, which stands for the weight of the radius part at each dimension. As the dimension grows, the share of the radius part descends rapidly and the curvature of the extended Poincaré Ball tends to 0 (flatter and closer to Euclidean space), as shown in Figure 4(b). The performance of KEEN in high dimensions will therefore be close to that of RotatE.

A.3 Analysis on Hierarchical Embeddings
For certain relations, we sample some triplets with the hierarchical relation _hypernym from WN18RR and show the distributions of the radius parts of their head and tail entity embeddings.
As expected, Figure 5(a) shows that the tail level is higher than the head level. The relation /film/film/genre in FB15k-237 shows a similar situation, in which the levels can be distinguished by the radius part (Figure 5(b)). For comparison, we choose the symmetric relation _derivationally_related_form as an example in Figure 5(c). The radius-part distribution of head and tail entities with _derivationally_related_form is much more irregular than that of the hierarchical relation _hypernym. In other words, the angle parts play the leading role when representing this kind of symmetric relation. The same happens with the relation /celebrities/friendship/friend in FB15k-237. Furthermore, we can calculate the KL divergence of the head-tail distributions mentioned above for further analysis. It is apparent that _hypernym has more hierarchical structure than /film/film/genre when comparing 0.084 with 0.078 in the radius part, which can also be inferred from $E_G$ in Table 5 in Appendix A.4.
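One way to compute such a KL divergence is to bin the head and tail radii into histograms over a shared range and compare the normalized counts. The sketch below assumes this histogram-based estimate, with a small smoothing constant to avoid empty bins; the binning scheme, the names and the toy radii are illustrative assumptions, not the procedure or data used for the reported values.

```python
import math

def histogram(values, bins, lo, hi):
    """Count values into equal-width bins over [lo, hi]; out-of-range values are clamped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return counts

def kl_divergence(p_counts, q_counts, smoothing=1e-9):
    """KL(P || Q) between two histograms defined over the same bins."""
    p_total = sum(p_counts) + smoothing * len(p_counts)
    q_total = sum(q_counts) + smoothing * len(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + smoothing) / p_total
        q = (qc + smoothing) / q_total
        kl += p * math.log(p / q)
    return kl

# Made-up radii purely for illustration.
head_radii = [0.2, 0.3, 0.35, 0.4, 0.5]
tail_radii = [0.6, 0.7, 0.75, 0.8, 0.9]
heads = histogram(head_radii, bins=5, lo=0.0, hi=1.0)
tails = histogram(tail_radii, bins=5, lo=0.0, hi=1.0)
print(kl_divergence(heads, tails))
```

A larger divergence between head and tail radius distributions indicates that the radius part separates the two ends of the relation more clearly.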

A.4 Relation Case Study
To further analyze the performance on different relations, we conduct a relation case study. Table 5 shows the link prediction performance on 6 relations, where a relation with lower $E_G$ is taken to be more hierarchical.
There are obvious differences in performance between semantically hierarchical relations (such as hypernym or has_part) and semantically non-hierarchical relations (such as verb_group or also_see). As mentioned above, $E_G$ measures how tree-like a KG is. The relation also_see, which has a smaller $E_G$ and is semantically symmetric, can be well embedded by HBE. It is worth noting that the results for verb_group are surprisingly good, but they may not be reliable because of the small number of verb_group triples in the test set; this needs further analysis. From another standpoint, the hierarchical level of a KG may not be well defined by $E_G$.