BiQUE: Biquaternionic Embeddings of Knowledge Graphs

Knowledge graph embeddings (KGEs) compactly encode multi-relational knowledge graphs (KGs). Existing KGE models rely on geometric operations to model relational patterns. Euclidean (circular) rotation is useful for modeling patterns such as symmetry, but cannot represent hierarchical semantics. In contrast, hyperbolic models are effective at modeling hierarchical relations, but do not perform as well on patterns on which circular rotation excels. It is crucial for KGE models to unify multiple geometric transformations so as to fully cover the multifarious relations in KGs. To do so, we propose BiQUE, a novel model that employs biquaternions to integrate multiple geometric transformations, viz., scaling, translation, Euclidean rotation, and hyperbolic rotation. BiQUE makes the best trade-offs among geometric operators during training, picking the best one (or their best combination) for each relation. Experiments on five datasets show BiQUE’s effectiveness.


Introduction
Knowledge graphs (KGs) provide an efficient way to represent real-world entities and their intricate connections in the form of (head, relation, tail) triples. Each head/tail entity corresponds to a node in a KG, and each relation represents a directed edge between them. Imbued with rich factual knowledge, KGs have demonstrated their effectiveness in a wide range of downstream applications (Wang et al., 2018; Saxena et al., 2020). Although problems due to incompleteness and noise continue to plague KGs, those issues have been ameliorated by knowledge graph embeddings (KGEs) that project entities and relations into low-dimensional dense vectors.
Current KGE methods mainly focus on exploiting geometric transformations and embedding spaces to model relational patterns such as (anti)symmetry, inversion, and composition.

Figure 1: interacts_with is symmetric (green arrows) and hierarchical (blue arrows) in different contexts. Each affects (red arrow) can be composed from a green-arrow relation followed by a blue-arrow one.
TransE (Bordes et al., 2013) represents each relation as a translation from a head entity to a tail entity. With translations alone, TransE cannot model many relation types, such as symmetric ones. In contrast, RotatE (Sun et al., 2019) represents each relation as a rotation in complex space, and provably models (anti)symmetry, inversion, and composition patterns. QuatE (Zhang et al., 2019) extends RotatE's complex-number representation to a hypercomplex one.
A drawback of rotation-based KGE models is that their representations are entrenched in (Euclidean) circular rotation, and hence they are unable to model hierarchical and tree-like structures (e.g., hypernym and part_of). Such hierarchical relations are common and even pervasive in some KGs. Since, in circular rotation, all rotating points are constrained to be at the same distance from the center of a circle, it is hard to model relations whose semantics require that entities move at different distances from the nexus.
To overcome this shortcoming, recent models project KGs into hyperbolic space (Balazevic et al., 2019; Chami et al., 2020). The hyperbolic models inevitably lose the basic properties of Euclidean-space transformations, and thus cannot avail themselves of these useful operations. Moreover, it is difficult to seamlessly integrate the hyperbolic models with extant non-hyperbolic models to create more powerful hybrids because the models' different geometric representations do not cohere.
What we want is the best of both worlds, i.e., a model capable of effecting both circular rotations and hyperbolic transformations, in a coherent geometric representation. This allows the model to choose the best representation for each relation, e.g., circular rotations for symmetric/inversion relations and hyperbolic rotations for hierarchical patterns. In addition, for relations that exhibit both circular and hyperbolic characteristics (e.g., the interacts_with relation in Figure 1), the model would rely on the data to choose the sweet spot balancing both transformations. For relations that are best captured by the composition of circular and hyperbolic rotations (e.g., affects in Figure 1), the model would learn the best composition of both representations jointly. Lastly, by subsuming circular rotations, the model would inherit the representational prowess of circular-rotation models.
In this paper, we propose precisely such a model named BiQUE. BiQUE employs a powerful algebraic system called biquaternions (Ward, 1997) to represent KGs. Most common number systems used by current KGE methods (including real numbers, complex numbers, and real quaternions) are subsumed and systematically unified by biquaternions. Further, the Hamilton product of biquaternions, at the core of BiQUE, imbues it with a strong geometric interpretation that combines both circular rotations and hyperbolic rotations. In sum, our contributions are as follows.
• To our knowledge, we are the first to use biquaternionic algebra for KGEs. Algorithmically, we contribute by designing a flexible score function that leverages multiple geometric transformations (scaling, translation, circular rotation, and hyperbolic rotation).
• Theoretically, we contribute by rigorously proving that BiQUE's biquaternionic transformation is equivalent to the composition of a circular rotation and a hyperbolic rotation.
• Empirically, we contribute by validating and analyzing BiQUE's effectiveness on five KG benchmarks that span a wide gamut of sizes.

Related Work
We briefly survey KGE methods that are most relevant to our approach.
Euclidean models. These models represent entities and relations by real vectors, and can be categorized into translation-based models (Bordes et al., 2013; Wang et al., 2014; Lin et al., 2015; Ji et al., 2016), semantic-matching models (Nickel et al., 2011; Yang et al., 2015), and neural models (Dettmers et al., 2018; Vashishth et al., 2020a).

In contrast to existing systems, our model BiQUE overcomes their weaknesses by integrating the strengths of their respective geometric representations into one coherent representation using biquaternions. By subsuming the complex-valued rotation-based models (e.g., ComplEx and QuatE), it retains their strengths in capturing (anti)symmetric, inversion, and composition patterns. By incorporating a hyperbolic representation, it is also able to model hierarchical semantics.

Background
Biquaternions, endowed with rich algebraic properties, have been widely used in quantum mechanics, general relativity, and signal processing (Pei et al., 2004; Gong et al., 2011), but have yet to make inroads into knowledge graph embeddings. A biquaternion is defined on a four-dimensional vector space over the field C of complex numbers. We denote a complex number c ∈ C as c = c_r + c_i I, where c_r, c_i ∈ R are real numbers and I is the usual imaginary unit (I^2 = −1).
Definition 1. The basic algebraic forms of a biquaternion q are

q = w + xi + yj + zk   (1)
  = q_r + q_i I,       (2)

where w, x, y, z ∈ C are q's coefficients with real and imaginary parts w_r, x_r, y_r, z_r, w_i, x_i, y_i, z_i ∈ R (e.g., w = w_r + w_i I), q_r = w_r + x_r i + y_r j + z_r k, q_i = w_i + x_i i + y_i j + z_i k, and i, j, k are imaginary units that have the following (non)commutative multiplication properties:

i^2 = j^2 = k^2 = ijk = −1,  ij = −ji = k,  jk = −kj = i,  ki = −ik = j.   (3)

We denote the scalar and vector parts of q respectively as s(q) = w and v(q) = xi + yj + zk. A pure biquaternion q is one with s(q) = 0. A quaternion (Hamilton, 1844) is a restricted biquaternion, in which w, x, y, z ∈ R (e.g., q_r and q_i in Equation 2 are quaternions). Complex numbers and real numbers are both special cases of biquaternions.
The matrix representation of a biquaternion q = w + xi + yj + zk, where q is identified with its coefficient column vector vec(q) = (w, x, y, z)^T, is

M(q) = [ w  −x  −y  −z ]
       [ x   w   z  −y ]
       [ y  −z   w   x ]
       [ z   y  −x   w ].   (4)

The product of two biquaternions q_1 = w_1 + x_1 i + y_1 j + z_1 k and q_2 = w_2 + x_2 i + y_2 j + z_2 k is

q_1 q_2 = (w_1 w_2 − x_1 x_2 − y_1 y_2 − z_1 z_2)
        + (w_1 x_2 + x_1 w_2 + y_1 z_2 − z_1 y_2) i
        + (w_1 y_2 − x_1 z_2 + y_1 w_2 + z_1 x_2) j
        + (w_1 z_2 + x_1 y_2 − y_1 x_2 + z_1 w_2) k.   (5)

Equation 5 is termed the Hamilton product between q_1 and q_2. Alternatively, the multiplication can be equivalently represented as a matrix-vector product

vec(q_1 q_2) = M(q_2) vec(q_1),   (6)

or as a matrix-matrix product

M(q_1 q_2) = M(q_2) M(q_1).   (7)
Equations 6 and 7 can be easily verified by substituting in Equations 4 and 5, and using normal matrix multiplication. The set of biquaternions and the set of quaternions are both closed under multiplication, and multiplication is associative but not commutative. Also note that conjugation reverses products, i.e., conj(q_1 q_2) = conj(q_2) conj(q_1), where conj(q) = w − xi − yj − zk denotes the (bi)quaternion conjugate. By using Equation 4, we can easily verify that M(conj(q)) = M(q)^T and M(q^*) = M(q)^* (where M(·)^* refers to the complex conjugation of each element in the matrix).
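The algebra above is easy to check numerically. Below is a minimal numpy sketch (ours, not from BiQUE's released code; the names `hamilton` and `M` are our own) of the Hamilton product and a matrix representation consistent with the product-reversal identity M(q_1 q_2) = M(q_2) M(q_1):

```python
import numpy as np

def hamilton(q1, q2):
    """Hamilton product of two (bi)quaternions given as length-4 arrays
    of coefficients (w, x, y, z); complex entries give biquaternions."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def M(q):
    """Matrix representation: hamilton(p, q) == M(q) @ p for any p,
    hence M(q1 q2) = M(q2) @ M(q1) (right-multiplication convention)."""
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])
```

Running it on random complex coefficients confirms the matrix-vector equivalence, the product-reversal identity, and the noncommutativity of the product.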

Unification of Circular and Hyperbolic Rotations
We prove that a biquaternion unifies both circular and hyperbolic rotations in C^4 space within a single representation in Theorem 4.1 (the proof is in Appendix A).
Theorem 4.1. Let M(q) be the matrix representation of a unit biquaternion q = q_r + q_i I, where q_r = w_r + x_r i + y_r j + z_r k and q_i = w_i + x_i i + y_i j + z_i k. M(q) can be factorized as

M(q) = M(h) M(u),

where u = q_r / ||q_r|| is a unit quaternion and h = cosh φ + (ai + bj + ck) I sinh φ is a unit biquaternion whose vector part ai + bj + ck is a pure unit quaternion.

An orthogonal matrix with determinant 1 represents a rotation in the space in which it operates (Artin, 1957). Since we know both M(h) and M(u) are orthogonal and have determinant 1 from Theorem 4.1, they each represent a rotation in C^4 space. From the form of the matrices, we can see that M(u) represents a circular rotation, while M(h) represents a hyperbolic rotation. To see the hyperbolic-rotation nature of M(h) more clearly, we can use the identities cosh φ = cos Iφ and I sinh φ = sin Iφ to rewrite M(h) in terms of the complex angle Iφ. M(h) then takes the form of a "regular" rotation matrix (cf. M(u)), but with a complex angle Iφ. According to Lansey (2009), a rotation through an imaginary angle Iφ can be understood as a hyperbolic rotation through the real angle φ. Consequently, a unit biquaternion composes these two kinds of rotations in a coherent algebraic representation. (It has been shown by Jafari (2016, Corollary 4.1) that M(q) is orthogonal with a determinant of 1, and thus represents an arbitrary rotation in C^4. However, that paper does not tease apart the matrix to reveal the contributions of its component circular and hyperbolic rotation matrices as we have done.)

Our results extend to arbitrary (not necessarily unit) biquaternions. Any biquaternion q is a scaled version of its unit biquaternion, i.e., q = ||q|| (q / ||q||). Thus its matrix M(q) represents a circular rotation followed by a hyperbolic rotation (i.e., M(h)M(u)), or a hyperbolic rotation followed by a circular rotation (i.e., M(u)M(h′)). In both cases the rotations are represented by q / ||q||, followed by a scaling by ||q||.
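Theorem 4.1's consequences can be sanity-checked numerically: for a unit biquaternion, M(q) should be complex-orthogonal with determinant 1 (Theorem A.1 gives det[M(q)] = ||q||^4). The sketch below (ours; it assumes the matrix convention of Section 3) constructs a random unit biquaternion using the two conditions from Lemma A.2 and checks both properties:

```python
import numpy as np

rng = np.random.default_rng(0)

def M(q):
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])

# Build a random unit biquaternion q = q_r + q_i*I with ||q||^2 = 1:
# enforce <q_r, q_i> = 0 (the imaginary part of ||q||^2 vanishes)
# and ||q_r||^2 - ||q_i||^2 = 1 (the real part equals 1).
q_i = 0.3 * rng.standard_normal(4)
q_r = rng.standard_normal(4)
q_r -= (q_r @ q_i) / (q_i @ q_i) * q_i                  # make q_r orthogonal to q_i
q_r *= np.sqrt(1 + q_i @ q_i) / np.linalg.norm(q_r)     # fix the real part to 1
q = q_r + 1j * q_i

Mq = M(q)
assert np.allclose(Mq @ Mq.T, np.eye(4))        # complex-orthogonal
assert np.isclose(np.linalg.det(Mq), 1.0)       # det = ||q||^4 = 1
```

The assertions hold for any such q, reflecting that M(q) of a unit biquaternion is a rotation of C^4.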
We analyze and visualize the M(u) and M(h) rotations in Appendix B.
It is worth noting that the system QuatE^2 (Zhang et al., 2019), an experimental baseline in Section 5, uses quaternions as its representation. Because quaternions are special cases of biquaternions, QuatE^2 employs only the circular rotation matrix M(u) (its M(h) is the identity matrix). Further, note that the power of a biquaternion does not merely come from doubling the parameters of a quaternion. A biquaternion achieves better representational power and parameter efficiency by facilitating the interactions between its real and imaginary parameters (see the last paragraph of Appendix B, and subsection 5.5.3).

Problem Definition
A multi-relational knowledge graph KG is represented as a set of directed triples, i.e., KG = {(h, r, t)}. Each triple (h, r, t) consists of a head entity h ∈ E, a relation r ∈ R, and a tail entity t ∈ E. The numbers of entities and relations are denoted as |E| = N_e and |R| = N_r respectively. The goal of a knowledge graph embedding model is to project entities and relations into a continuous vector space while preserving their original semantics. The knowledge graph completion (KGC) task requires a model to predict the probability of existence or correctness of unseen triples using the observed triples in KG.

The Proposed Model
In our BiQUE model, we represent the entities and relations in a KG as vectors of biquaternions. Let Q be the set of biquaternions. Each entity e is a vector Q_e of k biquaternions, i.e., Q_e = [q_1, q_2, ..., q_k]^T, where q_1, q_2, ..., q_k ∈ Q. We denote a head entity and a tail entity as Q_h and Q_t respectively. Each relation r is modeled as two vectors Q_r^+ and Q_r^×, each of which also contains k biquaternions. An entity or relation vector Q ∈ {Q_h, Q_t, Q_r^+, Q_r^×} can also be expressed as Q = w + xi + yj + zk where w, x, y, z ∈ C^k (i.e., w, x, y, z are each a vector containing k complex numbers, with the i-th element corresponding to the i-th biquaternion in Q).
(Note the similarity between the form of Q and that of a biquaternion in Equation 1.) Because a complex number can be represented by two real numbers (its real and imaginary components), Q can be represented with k × 4 × 2 = 8k real numbers (8k is its embedding size). (For expository convenience, we refer to Q as a "biquaternion" or an "embedding"; its structure should be clear from context.)

Currently, the loss functions of KGE models can be roughly categorized as additive or multiplicative, depending on the relation transformation projecting a head entity to a tail entity. Allen et al. (2021) have recently shown that projections require matrix multiplication, and cannot be achieved via addition alone. Thus, it is necessary to combine both additive and multiplicative operations into a loss function to represent powerful projections.
We represent the transformation due to relation r with the biquaternions Q_r^+ and Q_r^×. The embedding Q_r^+ applies a relation-specific translation to a head entity's embedding Q_h. We realize it by the element-wise addition of biquaternions (similar to what we do with real vectors for translation):

Q_{h,r} = Q_h + Q_r^+.   (8)

Next, the embedding Q_r^× applies a relation-specific multiplicative transformation to the translated head entity Q_{h,r}. The multiplicative transformation is defined via the Hamilton product of biquaternions (Equation 5) as follows:
Q̃_{h,r} = Q_{h,r} ∘ Q_r^×
        = (w ⊗ w′ − x ⊗ x′ − y ⊗ y′ − z ⊗ z′)
        + (w ⊗ x′ + x ⊗ w′ + y ⊗ z′ − z ⊗ y′) i
        + (w ⊗ y′ − x ⊗ z′ + y ⊗ w′ + z ⊗ x′) j
        + (w ⊗ z′ + x ⊗ y′ − y ⊗ x′ + z ⊗ w′) k,   (9)

where Q_{h,r} = w + xi + yj + zk, Q_r^× = w′ + x′i + y′j + z′k, ∘ denotes the element-wise application of the Hamilton product between Q_{h,r} and Q_r^×, and ⊗ denotes the element-wise multiplication between vectors of complex numbers. As shown in subsection 4.1, each biquaternion in Q_r^× represents a composition of circular rotation, hyperbolic rotation, and scaling. In the above Hamilton product, we bring this powerful composition to bear on the projection of the translated head entity. The Hamilton product of biquaternions in Equation 9 has an added benefit of increasing the potential interaction between entities and relations through the multiplications between different components of the entities and relations (observe that each component of Equation 9 combines all four coefficients of both Q_{h,r} and Q_r^×). Overall, our model unifies multiple expressive geometric transformations (translation, scaling, circular rotation, and hyperbolic rotation) into one coherent representation system.
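The two steps above amount to an element-wise biquaternion addition followed by an element-wise Hamilton product. A compact numpy sketch (ours, not the paper's released implementation; each length-k biquaternion vector is stored as a (4, k) complex array of coefficients):

```python
import numpy as np

def hamilton_vec(Q1, Q2):
    """Element-wise Hamilton product of two length-k biquaternion vectors,
    each stored as a (4, k) complex array of (w, x, y, z) coefficients."""
    w1, x1, y1, z1 = Q1
    w2, x2, y2, z2 = Q2
    return np.stack([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def transform_head(Q_h, Q_r_add, Q_r_mul):
    """BiQUE head transformation: translation (cf. Eq. 8), then the
    multiplicative biquaternion transform (cf. Eq. 9)."""
    Q_hr = Q_h + Q_r_add            # element-wise biquaternion addition
    return hamilton_vec(Q_hr, Q_r_mul)
```

As a quick sanity check, a relation whose multiplicative part is the identity biquaternion (w = 1, x = y = z = 0) reduces the transformation to a pure translation.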

Score Function and Training Loss
We measure the plausibility score of a given triple (h, r, t) by computing the vector similarity between the transformed head entity Q̃_{h,r} = w̃ + x̃i + ỹj + z̃k (from Equation 9) and a candidate tail entity Q_t = w_t + x_t i + y_t j + z_t k as

s(h, r, t) = ⟨w̃, w_t⟩ + ⟨x̃, x_t⟩ + ⟨ỹ, y_t⟩ + ⟨z̃, z_t⟩,   (10)

where ⟨·, ·⟩ denotes the standard dot-product between vectors. We regard the task of knowledge graph completion as a multi-class classification problem and employ the cross-entropy loss to train our model.
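One plausible realization of this score, treating each complex vector through its real and imaginary parts so that the score is a real number (this is our reading of the dot products; the released code may realize them differently):

```python
import numpy as np

def score(Q_pred, Q_t):
    """Dot-product similarity between a transformed head (4, k) complex
    array and a candidate tail, over all 8k real parameters."""
    a = np.concatenate([Q_pred.real.ravel(), Q_pred.imag.ravel()])
    b = np.concatenate([Q_t.real.ravel(), Q_t.imag.ravel()])
    return float(a @ b)
```

With this realization, an embedding scored against itself yields its squared (real) parameter norm, so more closely aligned head-tail pairs score higher.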
To combat overfitting, we follow previous work (Zhang et al., 2019) and append an N3 regularization term (Lacroix et al., 2018) to our loss function, thus obtaining

L = L_CE + λ (λ_1 ||Q_e||_3^3 + λ_2 ||Q_r||_3^3),

where L_CE is the cross-entropy loss, the regularization terms range over the entity and relation embeddings, λ, λ_1, λ_2 are the global, entity, and relation regularization hyperparameters respectively, and ||·||_3 denotes the L3 norm of vectors.
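The N3 regularizer of Lacroix et al. (2018) penalizes the cubes of the absolute values of the embedding coefficients. A sketch (ours; the helper name `n3_penalty` is our own, and it works for both real and complex coefficient arrays):

```python
import numpy as np

def n3_penalty(embeddings, lam):
    """Weighted N3 regularizer: lam * sum of |coefficient|^3 over the
    given list of (real or complex) embedding arrays."""
    return lam * sum(float(np.sum(np.abs(e) ** 3)) for e in embeddings)
```

For example, an embedding with coefficients 1 and −2 contributes 1 + 8 = 9 before weighting, and a complex coefficient 3 + 4I contributes |3 + 4I|^3 = 125.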

Experiments
To validate BiQUE's effectiveness, we conduct extensive experiments on the knowledge graph completion (KGC) task. We use three standard knowledge graph datasets, viz., WN18RR, FB15K-237, and YAGO3-10. In addition, to demonstrate BiQUE's scalability, we run it on two huge commonsense knowledge graph datasets, viz., CN-100K and ATOMIC. Our code and datasets are publicly available at https://github.com/guojiapub/BiQUE.

Datasets
The WN18RR (Dettmers et al., 2018) and FB15K-237 (Toutanova and Chen, 2015) datasets are subsets of WN18 and FB15K respectively (both from Bordes et al. (2013)). (Both WN18 and FB15K have test-leakage problems, which allow their test triples to be easily inferred. Thus, KGE models typically perform well on those two datasets, and they do not help to differentiate between models. Because of this, we do not use them in our experiments.) To make the KGC task more challenging, FB15K-237 and WN18RR remove the inverse relations from the original validation and test sets of FB15K and WN18. The CN-100K (Li et al., 2016) and ATOMIC (Sap et al., 2019) datasets are two large knowledge graph benchmarks recently adopted for evaluating commonsense reasoning. ATOMIC mainly describes the reactions, effects, and intents of human behaviors, and represents each entity as a phrase with an average length of 4.4 words. CN-100K contains general commonsense knowledge about the world. For CN-100K and ATOMIC, we use the data splits of previous work (Malaviya et al., 2020).

Table 1 provides details on the datasets. (Note that the datasets span a wide range of sizes.) Both WN18RR and YAGO3-10 contain many relations with hierarchical semantics, e.g., hypernym and part_of (Chami et al., 2020). On the other hand, most of FB15K-237's edges are antisymmetric, and it does not have much hierarchical structure (Balazevic et al., 2019). The varying levels of hierarchical structure in the datasets help to highlight BiQUE's adaptability to datasets with different relation types. ATOMIC mainly contains cause-effect relations that are not hierarchical. CN-100K contains several hierarchical relations (e.g., IsA and AtLocation). Aside from their large sizes, these two datasets have the challenging feature of being extremely sparse.

Evaluation Protocol
We use standard evaluation metrics for the knowledge graph completion (KGC) task, viz., mean reciprocal rank (MRR) and Hits@k with cut-off values k ∈ {1, 3, 10}. For both MRR and Hits@k, the larger the metric, the better the performance of a model. We adopt the BOTTOM setting (Sun et al., 2020) when ranking candidate triples and consistently apply it to our BiQUE model, i.e., the correct triple is always inserted at the end of a list of triples with the same plausibility scores. This is the strictest evaluation protocol for KGC tasks, and provides the best reflection of a model's performance. Finally, we report filtered results like previous work (Bordes et al., 2013) for fair comparisons. (Implementation details are in the appendix.)

Table 2 shows the experimental results. Following standard practice adopted by our comparison systems, we report the best results for BiQUE. (We pick the best result over 10 runs with different random initializations. The appendix contains the average scores over the runs, and the standard deviations.)
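Under the BOTTOM protocol, the correct triple is placed after every candidate that ties with its score, so ties count against the model; combined with filtering, the metrics can be sketched as follows (our illustration; function names are ours):

```python
import numpy as np

def bottom_rank(scores, true_idx, known_idx=()):
    """Filtered rank under the BOTTOM protocol: candidates scoring >= the
    correct candidate's score all rank ahead of it (ties count against
    the model).  known_idx holds other known-true candidates, which are
    filtered out before ranking."""
    mask = np.ones(len(scores), dtype=bool)
    mask[list(known_idx)] = False
    mask[true_idx] = False          # do not rank the answer against itself
    return 1 + int(np.sum(scores[mask] >= scores[true_idx]))

def mrr_hits(ranks, k=10):
    """Mean reciprocal rank and Hits@k from a list of per-query ranks."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks)), float(np.mean(ranks <= k))
```

For example, with scores [0.9, 0.5, 0.5, 0.1] and the correct candidate at index 2, the tie at index 1 pushes its rank to 3; filtering index 0 as a known-true triple improves the rank to 2.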

Results
From Table 2, we see that BiQUE is the best performer on four of the five datasets, and is a close second on the remaining dataset. On the two hierarchical datasets, WN18RR and YAGO3-10, BiQUE achieves new state-of-the-art results on all metrics, and surpasses the second-best models by a clear margin. WN18RR contains a large proportion of symmetric relations (which are amenable to being modeled with circular rotations) and hierarchical relations (which are amenable to being modeled with hyperbolic rotations). BiQUE's good performance on this dataset provides evidence that BiQUE's composition of circular and hyperbolic rotations is useful in modeling these disparate relation types simultaneously. BiQUE also consistently outperforms the hyperbolic models MuRP and AttH on all metrics.
On FB15K-237, BiQUE is second best; however, BiQUE's scores are only marginally lower than those of the best system, QuatE^2. As observed by Balazevic et al. (2019), the vast majority of relations in FB15K-237 do not form hierarchies. Consequently, BiQUE's hyperbolic transformation does not play a principal role on FB15K-237, and BiQUE falls back on using only its circular rotation transformation. QuatE^2 can be viewed as a special case of BiQUE that only has circular rotations (M(u) in Theorem 4.1). Since both models effectively use only circular rotations on this dataset, their close scores are unsurprising.

Table 3 shows the results on the large commonsense graphs, CN-100K and ATOMIC. We see that these datasets are a lot more challenging, with many KGE models having MRRs below or hovering around 0.3. (Because of the datasets' large sizes, many extant KGE models do not experiment on them, and we compare against the models that have been reported in the literature. Malaviya et al. (2020) describe systems that use BERT embeddings, which encapsulate a lot of commonsense prior knowledge; for a fair comparison, we do not use such systems as baselines.) Table 3 shows that BiQUE outperforms the previously reported state-of-the-art results on CN-100K (RotatE) and ATOMIC (ComplEx) (by 29.6% and 34.5% on MRR respectively). Further, by composing hyperbolic rotation with circular rotation, BiQUE surpasses QuatE^2, which only uses the latter rotation.

Performance per Relation
To provide a fine-grained analysis of BiQUE's results, we report its performance per relation on WN18RR in Table 4. A large portion of the WN18RR dataset consists of hierarchical triples, such as hypernym and instance_hypernym, which account for more than 43% of training examples. We see that BiQUE achieves the best performance on all 11 relation types compared with current top-performing models. BiQUE not only performs well on hierarchical and tree-like relations (e.g., hypernym and instance_hypernym), but also obtains significant improvements on challenging one-to-many relations (e.g., member_of_domain_region and member_meronym). It is worth noting that BiQUE does not sacrifice its performance on other relation types for the abovementioned improvements. In fact, BiQUE also achieves the best performance on symmetric relations (e.g., derivationally_related_form and verb_group). All in all, these fine-grained results support our hypothesis that BiQUE's integration of multiple geometric transformations allows it to make good trade-offs among various representations for different relation types, thereby allowing it to pick the best one (or combinations thereof) for optimal performance.

Table 5 shows the performance of variants and ablations of BiQUE. We test the impact of BiQUE's scaling operation by normalizing the relation rotation biquaternion Q_r^× in two ways: one variant uses the regular normalization of real vectors, and the other uses the standard normalization of biquaternions. (See appendix for details.) Compared to BiQUE, both normalized variants perform worse. Thus the norm of Q_r^× plays an important role in having a scaling effect. Further, from our ablation study, we observe that without the translation operator Q_r^+, BiQUE performs worse on both datasets. This shows that rotations cannot fully replace translations in KGE models. Table 5 also shows that regularization is important for BiQUE to avoid overfitting.

Model Efficiency
We investigate the impact of varying the embedding size on performance (H@1). In Figure 2, the parameters of all systems are tuned, and the results are averaged over 5 runs (with different random initializations). We see that BiQUE's results consistently surpass those of the strong baselines across embedding dimensions. The disparity is most apparent in the regime of small embedding sizes. This suggests that BiQUE's representation is more effective at modeling the data (and thus does better even with smaller embeddings).
In Table 6, we use the same number of parameters for BiQUE and QuatE^2 (the baseline most similar to BiQUE), and show that BiQUE performs better (higher MRR and H@3) with fewer epochs. This again supports our hypothesis that BiQUE models the data better (with the same number of parameters) than QuatE^2, and thus requires fewer epochs to achieve better results.

Conclusion
In this paper, we propose BiQUE, a novel model that uses biquaternionic algebra for KGEs, and combines multiple geometric transformations in a coherent representation. Our experimental results and detailed empirical analysis demonstrate the effectiveness, scalability, and advantages of our model. As future work, we will extend BiQUE to work on knowledge hypergraphs.

A Proofs
We prove that a biquaternion unifies both circular and hyperbolic rotations in C^4 space within a single representation in Theorem 4.1. To do so, we require Theorem A.1, which is proved by Jafari (2016), and the definitions covered in Section 3. We also prove the auxiliary Lemma A.2.

Theorem A.1 (Jafari, 2016, Theorem 4.1(vi)). If M(q) is the matrix representation of a biquaternion q, then the matrix's determinant is given by det[M(q)] = ||q||^4.
Lemma A.2. If q = q_r + q_i I is a unit biquaternion (i.e., ||q|| = 1) where q_r = w_r + x_r i + y_r j + z_r k and q_i = w_i + x_i i + y_i j + z_i k, then conj(q_r) q_i and q_i conj(q_r) are pure quaternions (i.e., their scalar parts s(conj(q_r) q_i) = s(q_i conj(q_r)) = 0, where conj(·) denotes quaternion conjugation), ||q_r|| = cosh φ, and ||q_i|| = sinh φ, where φ ∈ R.
Proof. Note that both q r and q i are quaternions, and q = (w r +w i I)+(x r +x i I)i+(y r +y i I)j+(z r +z i I)k.
||q||^2 = (w_r + w_i I)^2 + (x_r + x_i I)^2 + (y_r + y_i I)^2 + (z_r + z_i I)^2
        = (w_r^2 + x_r^2 + y_r^2 + z_r^2) − (w_i^2 + x_i^2 + y_i^2 + z_i^2) + 2(w_r w_i + x_r x_i + y_r y_i + z_r z_i) I.

Since ||q||^2 = 1 is real, the imaginary part above must vanish, i.e., w_r w_i + x_r x_i + y_r y_i + z_r z_i = 0. Then

s(conj(q_r) q_i) = s((w_r − x_r i − y_r j − z_r k)(w_i + x_i i + y_i j + z_i k))
                 = w_r w_i + x_r x_i + y_r y_i + z_r z_i   (using Eq. 5)
                 = 0,
s(q_i conj(q_r)) = s((w_i + x_i i + y_i j + z_i k)(w_r − x_r i − y_r j − z_r k))
                 = w_i w_r + x_i x_r + y_i y_r + z_i z_r   (using Eq. 5)
                 = 0.
Thus conj(q_r) q_i and q_i conj(q_r) are both pure quaternions (recall that the set of quaternions is closed under multiplication). Next,

||q||^2 = q conj(q) = (q_r + q_i I)(conj(q_r) + conj(q_i) I)
        = q_r conj(q_r) + q_i conj(q_i) I^2 + (q_r conj(q_i) + q_i conj(q_r)) I
        = ||q_r||^2 − ||q_i||^2 + (q_r conj(q_i) + q_i conj(q_r)) I.

Since q_i conj(q_r) is the conjugate of q_r conj(q_i), their sum equals twice the scalar part of q_r conj(q_i), which is zero by the same computation as above. Hence ||q||^2 = ||q_r||^2 − ||q_i||^2 = 1, and we can write ||q_r|| = cosh φ and ||q_i|| = sinh φ for some φ ∈ R. ∎
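Lemma A.2 can be verified numerically: construct q_r and q_i satisfying the two unit-norm conditions, then check that s(conj(q_r) q_i) = 0 and that ||q_i|| = sinh φ for φ = arccosh(||q_r||). A numpy sketch (ours):

```python
import numpy as np

def conj(q):        # quaternion conjugate: negate the vector part
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def hamilton(q1, q2):
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

rng = np.random.default_rng(1)
q_i = 0.5 * rng.standard_normal(4)
q_r = rng.standard_normal(4)
q_r -= (q_r @ q_i) / (q_i @ q_i) * q_i                  # <q_r, q_i> = 0
q_r *= np.sqrt(1 + q_i @ q_i) / np.linalg.norm(q_r)     # ||q_r||^2 - ||q_i||^2 = 1

assert np.isclose(hamilton(conj(q_r), q_i)[0], 0.0)     # s(conj(q_r) q_i) = 0
phi = np.arccosh(np.linalg.norm(q_r))
assert np.isclose(np.linalg.norm(q_i), np.sinh(phi))    # ||q_i|| = sinh(phi)
```

Both assertions hold for any q_r, q_i constructed this way, matching the lemma.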

B Analysis and Visualization of BiQUE's Circular and Hyperbolic Rotations
To analyze BiQUE's circular and hyperbolic rotations, we restrict ourselves to two dimensions. This means a biquaternion takes the form q = w + xi where w, x ∈ C. The unit quaternion q_r in Theorem 4.1 is thus q_r = w_r + x_r i where w_r, x_r ∈ R, and x_r / ||v(q_r)|| = x_r / ||x_r i|| = 1. The circular rotation matrix is thus

M(u) = [ cos θ  −sin θ ]
       [ sin θ   cos θ ].

We can see that the real parts w_r, x_r and imaginary parts w_i, x_i are transformed independently. Hence we can accomplish the same effect by rotating two quaternions (w_r + x_r i and w_i + x_i i) independently, and this does not imbue biquaternions with added representational power beyond that of quaternions, which also have the rotation matrix M(u).

Now, we examine the effect of the hyperbolic rotation matrix M(h). Since ai + bj + ck is a unit quaternion (as shown in the proof of Theorem 4.1), and we restrict ourselves to two dimensions, it must be that a = 1, b = 0, c = 0. The hyperbolic rotation matrix is thus

M(h) = [ cosh φ    −I sinh φ ]
       [ I sinh φ   cosh φ  ].

We multiply M(h) with an arbitrary biquaternion (w_r + w_i I) + (x_r + x_i I) i to transform the latter, obtaining the point

(w_r cosh φ + x_i sinh φ, x_r cosh φ − w_i sinh φ) + (w_i cosh φ − x_r sinh φ, x_i cosh φ + w_r sinh φ) I.
Observe each term in the sum now involves both the real and imaginary parts (w r , x r , w i , x i ) of the input biquaternion. This is unlike the case above for M (u) in which the real and imaginary components are independent. Thus it is the hyperbolic rotation M (h) that allows for the interaction between the real and imaginary components. To illustrate the hyperbolic rotation, we set w r = 1, w i = 2, x r = 3, x i = 4, and change the value of φ continually from an initial value of 0. Note that when φ = 0, the first term in the sum is the point (w r , x r ) and the second term is (w i , x i ). As φ changes, we can visualize the projection of that point. In Figure 3, the initial points in red are projected along the green lines. Clearly the green paths are hyperbolic. The blue point is an example of a projected point.
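The identification of a rotation through an imaginary angle with a hyperbolic rotation is easy to verify numerically: plugging θ = Iφ into an ordinary rotation matrix yields the matrix built from cosh φ and I sinh φ, and a real hyperbolic rotation preserves x^2 − y^2, so points travel along hyperbolas (matching the green paths in Figure 3). A short numpy check (ours):

```python
import numpy as np

def rot(theta):
    """Ordinary 2x2 rotation matrix; theta may be complex."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

phi = 0.7
# Rotation by the imaginary angle I*phi equals the cosh/sinh matrix.
H = np.array([[np.cosh(phi), -1j * np.sinh(phi)],
              [1j * np.sinh(phi), np.cosh(phi)]])
assert np.allclose(rot(1j * phi), H)

# A real hyperbolic rotation preserves x^2 - y^2: points move on hyperbolas.
Hr = np.array([[np.cosh(phi), np.sinh(phi)],
               [np.sinh(phi), np.cosh(phi)]])
p = np.array([3.0, 1.0])
p2 = Hr @ p
assert np.isclose(p2[0]**2 - p2[1]**2, p[0]**2 - p[1]**2)
```

The second assertion is the hyperbolic analogue of a circular rotation preserving x^2 + y^2.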

C Normalization of biquaternions
Given that Q_r^× = (w_r + w_i I) + (x_r + x_i I) i + (y_r + y_i I) j + (z_r + z_i I) k, let A = w_r^2 + x_r^2 + y_r^2 + z_r^2 and B = w_i^2 + x_i^2 + y_i^2 + z_i^2. We define the real vector norm ||Q_r^×||_v and the biquaternion norm ||Q_r^×||_b as follows:

||Q_r^×||_v = √(A + B),
||Q_r^×||_b = √(A − B + 2(w_r w_i + x_r x_i + y_r y_i + z_r z_i) I).

Thus, we can obtain the vector-normalized variant Q̄_r^× in subsection 5.5.2 with the standard normalization of real vectors:

Q̄_r^× = Q_r^× / ||Q_r^×||_v.

To make Q_r^× a unit biquaternion, we have to ensure that A − B = 1 and w_r w_i + x_r x_i + y_r y_i + z_r z_i = 0. We first employ the Gram-Schmidt orthogonalization technique to guarantee that the imaginary coefficient vanishes, and then restrict B = 1. Specifically, we represent Q_r^× as Q_r^× = q_1 + q_2 I, and conduct the following operations:

q̂_1 = q_1 − (⟨q_1, q_2⟩ / ||q_2||^2) q_2,
q̃_1 = √2 q̂_1 / ||q̂_1||,   q̃_2 = q_2 / ||q_2||.
Thus, we obtain the unit biquaternion Q̂_r^× = q̃_1 + q̃_2 I (with A = 2 and B = 1, so A − B = 1).
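The normalization above can be sketched directly (our illustration; the function name is ours). After the two steps, A = 2 and B = 1, so A − B = 1 and the orthogonality condition holds:

```python
import numpy as np

def normalize_biquaternion(q1, q2):
    """Given Q = q1 + q2*I (q1, q2 real quaternions as length-4 arrays),
    return a unit biquaternion: <q1, q2> = 0 and A - B = 1."""
    q1 = q1 - (q1 @ q2) / (q2 @ q2) * q2            # Gram-Schmidt: q1 orthogonal to q2
    q1 = np.sqrt(2.0) * q1 / np.linalg.norm(q1)     # A = ||q1||^2 = 2
    q2 = q2 / np.linalg.norm(q2)                    # B = ||q2||^2 = 1
    return q1, q2

rng = np.random.default_rng(2)
a, b = normalize_biquaternion(rng.standard_normal(4), rng.standard_normal(4))
assert np.isclose(a @ b, 0.0)              # orthogonality condition
assert np.isclose(a @ a - b @ b, 1.0)      # A - B = 1
```

Both conditions hold for any generic input, so the returned pair always encodes a unit biquaternion.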

D Variance of the performance
In Tables 8 and 9, we provide the averages and standard deviations for all metrics on the FB15K-237, WN18RR, YAGO3-10, CN-100K, and ATOMIC datasets. The results are reported over 10 runs with different random initializations. We see that the performance of our model BiQUE is quite stable across different random initializations, which supports the robustness of our method.