RotateQVS: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion

Temporal factors are tied to the growth of facts in realistic applications, such as the progress of diseases and the development of political situations; research on Temporal Knowledge Graphs (TKGs) has therefore attracted much attention. In a TKG, relation patterns with inherent temporality need to be studied for representation learning and reasoning across temporal facts. However, existing methods can hardly model temporal relation patterns, nor can they capture the intrinsic connections between relations as they evolve over time, and they lack interpretability. In this paper, we propose a novel temporal modeling method which represents temporal entities as Rotations in Quaternion Vector Space (RotateQVS) and relations as complex vectors in Hamilton's quaternion space. We demonstrate theoretically that our method can model key patterns of relations in TKGs, such as symmetry, asymmetry, and inversion, and can capture time-evolving relations. Empirically, we show that our method improves link prediction performance on four temporal knowledge graph benchmarks.


Introduction
Knowledge Graphs (KGs) have been widely adopted to represent informative knowledge or facts in real-world applications (Bollacker et al., 2008; Miller, 1995; Suchanek et al., 2007). However, as known facts are usually sparse, KGs are far from complete. Thus, Knowledge Graph Completion (KGC) methods have been proposed to predict missing facts, i.e., links between entities (Bordes et al., 2013; Dettmers et al., 2018; Chen et al., 2021b). Furthermore, in the real world, many facts are bound to a particular time by nature. For example, the fact that Barack Obama is the president of the USA is only valid for the period 2009-2017. To model such time-sensitive facts, Temporal Knowledge Graphs (TKGs) have recently drawn growing attention from both academic and industrial communities (Lautenschlager et al., 2015; Leetaru and Schrodt, 2013).
TKG Embedding (TKGE) methods (Jiang et al., 2016; Dasgupta et al., 2018; Jin et al., 2020; Sadeghian et al., 2021) were proposed to represent entities and relations with temporal features in TKGs (Lautenschlager et al., 2015; Leetaru and Schrodt, 2013). However, how to represent them with temporal interpretability remains a challenge for state-of-the-art TKGE models. Further, it is crucial for TKG Completion (TKGC) to leverage the learned temporal information. Previous static KGC works (Schlichtkrull et al., 2018; Gao et al., 2020) learn explainable embeddings of various relation patterns, so that the symmetric pattern (e.g. "co-author"), the asymmetric pattern (e.g. "affiliation"), the inverse pattern (e.g. "buyer" vs. "seller") and the complex composition pattern (e.g. "father's wife (mother)" vs. "wife's father (father in law)") can be captured in static KGs. In TKGs, however, there are inherent connections between entities and their relations that evolve over time. For example, the relation between Kit Harington and Rose Leslie is "in love" in 2012, becomes "engaged" in 2017, and then turns into "married" in 2018. To the best of our knowledge, very few of the existing TKGE methods can capture such connections.
To address this problem, we take inspiration from Hamilton's quaternion number system (Hamilton, 1844; Zhang et al., 2019a; Gao et al., 2020) and propose a novel method based on quaternions. Specifically, we encode both entities and relations as quaternion embeddings, and temporal entity embeddings are then represented as Rotations in Quaternion Vector Space (RotateQVS). Theoretically, we show the limitations of previous methods and demonstrate that quaternion embeddings can model symmetric, asymmetric, and inverse relation patterns. Meanwhile, we prove that our method is capable of capturing time-evolving information in TKGs in an interpretable way. We empirically evaluate our method on four TKGC benchmarks and consistently report state-of-the-art performance. Further, we analyse the learned quaternion embeddings and show the ability of RotateQVS to model various relation patterns, including temporal evolution.
We summarize our main contributions as follows:
1. We propose an original quaternion-based TKGC method, namely RotateQVS, which represents temporal information as rotations in quaternion vector space.
2. We study temporally evolving relations, and we demonstrate both theoretically and empirically that the proposed RotateQVS can model various relation patterns, including temporal evolution.

Preliminaries on Hamilton's Quaternions
The quaternion number system (Hamilton, 1844) is an extension of the traditional complex numbers. Recently, quaternions have been applied to static knowledge graph embedding (Zhang et al., 2019a; Gao et al., 2020). To help readers better understand our method in Section 3, we introduce the definition and basic operations of quaternions in this section.

Quaternion Operations
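A quaternion is expressed as $q = a + bi + cj + dk$, and some key quaternion operations are defined as follows.

Conjugate. Similar to a traditional complex number, the conjugate of a quaternion keeps the same real part and takes the opposite imaginary parts, that is, $\bar{q} = a - bi - cj - dk$.

Inner Product. The inner product between $q_1 = a_1 + b_1 i + c_1 j + d_1 k$ and $q_2 = a_2 + b_2 i + c_2 j + d_2 k$ is the sum of the products of corresponding coefficients:
$$\langle q_1, q_2 \rangle = a_1 a_2 + b_1 b_2 + c_1 c_2 + d_1 d_2.$$

Norm. With the definitions of conjugate and inner product, the norm of a quaternion is defined as
$$\|q\| = \sqrt{q\bar{q}} = \sqrt{\langle q, q \rangle} = \sqrt{a^2 + b^2 + c^2 + d^2}. \tag{1}$$

Inverse. The inverse of a quaternion is defined by $q^{-1} q = q q^{-1} = 1$. Multiplying by $\bar{q}$, we have $\bar{q}\, q\, q^{-1} = \bar{q}$, from which we get
$$q^{-1} = \frac{\bar{q}}{\|q\|^2}. \tag{2}$$

Hamilton Product. For two quaternions $q_1$ and $q_2$, their product is determined by the products of the basis elements and the distributive law. The quaternion multiplication formula is
$$\begin{aligned} q_1 q_2 ={}& (a_1 a_2 - b_1 b_2 - c_1 c_2 - d_1 d_2) \\ &+ (a_1 b_2 + b_1 a_2 + c_1 d_2 - d_1 c_2)\, i \\ &+ (a_1 c_2 - b_1 d_2 + c_1 a_2 + d_1 b_2)\, j \\ &+ (a_1 d_2 + b_1 c_2 - c_1 b_2 + d_1 a_2)\, k. \end{aligned} \tag{3}$$

Considering the conjugate of the Hamilton product, we can further deduce
$$\overline{q_1 q_2} = \bar{q}_2\, \bar{q}_1, \qquad \overline{q_1 q_2 q_3} = \bar{q}_3\, \bar{q}_2\, \bar{q}_1. \tag{4}$$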
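The operations above can be checked numerically. The following is a minimal sketch, assuming a quaternion is stored as a NumPy array `[a, b, c, d]`; the function names (`conj`, `inner`, `norm`, `inverse`, `hamilton`) are illustrative choices and not taken from the authors' implementation.

```python
import numpy as np

def conj(q):
    """Conjugate: keep the real part, negate the three imaginary parts."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def inner(q1, q2):
    """Inner product: sum of products of corresponding coefficients."""
    return float(np.dot(q1, q2))

def norm(q):
    """Norm: square root of the inner product of q with itself."""
    return float(np.sqrt(inner(q, q)))

def inverse(q):
    """Inverse: conjugate divided by the squared norm."""
    return conj(q) / inner(q, q)

def hamilton(q1, q2):
    """Hamilton product of two quaternions (non-commutative)."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

# Basis elements and two arbitrary test quaternions (illustrative values).
i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
k = np.array([0.0, 0.0, 0.0, 1.0])
q1 = np.array([1.0, 2.0, -1.0, 0.5])
q2 = np.array([0.3, -0.7, 2.0, 1.0])

ij = hamilton(i, j)                    # i j = k
ji = hamilton(j, i)                    # j i = -k, so the product does not commute
identity = hamilton(q1, inverse(q1))   # q q^{-1} = 1
lhs = conj(hamilton(q1, q2))           # conjugate of a product ...
rhs = hamilton(conj(q2), conj(q1))     # ... reverses the order of the factors
```

The last two lines correspond to the conjugate identity for the Hamilton product: conjugation reverses the order of the factors.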

3D Vector Space
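In fact, the imaginary part $bi + cj + dk$ of a quaternion behaves like a vector $\mathbf{v} = (b, c, d)$ in a 3D vector space. Thus, conveniently, we rewrite a quaternion using imaginary vectors:
$$q = a + \mathbf{v}. \tag{5}$$

Multiplication rule. The multiplication of two imaginary vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ is
$$\mathbf{v}_1 \mathbf{v}_2 = \mathbf{v}_1 \times \mathbf{v}_2 - \mathbf{v}_1 \cdot \mathbf{v}_2, \tag{6}$$
where $\mathbf{v}_1 \times \mathbf{v}_2$ is the vector cross product, resulting in a vector, and $\mathbf{v}_1 \cdot \mathbf{v}_2$ is the dot product, which gives a scalar. The multiplication of two imaginary vectors is non-commutative, as the cross product is non-commutative. Thus, the multiplication of two quaternions can be rewritten from the 3D vector perspective:
$$q_1 q_2 = (a_1 + \mathbf{v}_1)(a_2 + \mathbf{v}_2) = a_1 a_2 - \mathbf{v}_1 \cdot \mathbf{v}_2 + a_1 \mathbf{v}_2 + a_2 \mathbf{v}_1 + \mathbf{v}_1 \times \mathbf{v}_2. \tag{7}$$

In this section, we introduce a novel temporal modeling approach for TKG by representing temporal information as Rotations in Quaternion Vector Space (RotateQVS).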

Notations
Suppose that we have a temporal knowledge graph, denoted as G. We use E to denote the set of entities, R to denote the set of relations, and T to denote the set of time stamps. Then, the temporal knowledge graph G can be defined as a collection of quadruples, denoted as (s, r, o, t), where a relation r ∈ R holds between a head entity s ∈ E and a tail entity o ∈ E at time t. The actual time t is represented by a time stamp τ ∈ T.

Representing Temporal Information using Rotations in 3D Vector Space
Similar to TeRo (Xu et al., 2020a), which utilizes a rotation in complex space, we also represent temporal information using rotations, but in the quaternion vector space. In 3D vector space, according to Euler's rotation theorem (Euler, 1776; Verhoeff, 2014), any rotation or sequence of rotations of a rigid body or a coordinate system about a fixed point is equivalent to a single rotation by a given angle θ about a fixed axis (called the Euler axis) that runs through the fixed point. An extension of Euler's formula to quaternions can be expressed as follows:
$$e^{\frac{\theta}{2}(u_x i + u_y j + u_z k)} = \cos\frac{\theta}{2} + (u_x i + u_y j + u_z k)\sin\frac{\theta}{2}, \tag{8}$$
where i, j, k are unit vectors representing the three Cartesian axes and $(u_x, u_y, u_z)$ is the unit vector along the Euler axis.

Representing Time, Entities, and Relations:
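Quaternions provide us with a simple way to encode this axis-angle representation in four numbers, and can be used to perform the rotation procedure in 3D vector space. Accordingly, we constrain the time stamp embedding $\boldsymbol{\tau}$ to be a unit quaternion:
$$\boldsymbol{\tau} = \cos\frac{\theta_\tau}{2} + \mathbf{u}_\tau \sin\frac{\theta_\tau}{2}, \tag{9}$$
where $\mathbf{u}_\tau$ is a unit vector in the quaternion space. For the other elements of a quadruple (s, r, o, t), based on Hamilton's quaternions in Section 2, we map each of them to its base, which is a time-independent quaternion embedding:
$$\mathbf{s} = a_s + b_s i + c_s j + d_s k, \qquad \mathbf{r} = a_r + b_r i + c_r j + d_r k, \qquad \mathbf{o} = a_o + b_o i + c_o j + d_o k, \tag{10}$$
where each coefficient is a real-valued vector whose length is the embedding dimension.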

Temporal Entities:
We make use of the quaternion rules to represent temporal information as rotations in 3D vector space. An abstract rotation procedure is illustrated in Figure 1.
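Theorem 1. Given a unit quaternion $q = \cos\frac{\theta}{2} + \mathbf{u}\sin\frac{\theta}{2}$, where $\mathbf{u} \in \mathbb{R}i + \mathbb{R}j + \mathbb{R}k$ is a unit vector (rotation axis) in three-dimensional space, the result of rotating a vector $\mathbf{v}$ by θ around the rotation axis $\mathbf{u}$ is
$$\mathbf{v}' = q\,\mathbf{v}\,q^{-1} = q\,\mathbf{v}\,\bar{q}. \tag{11}$$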
Theorem 1 is supported by Rodrigues' rotation formula (Rodrigues, 1840); see the proof in Appendix A. We then define the functional mapping that reflects the temporal evolution of an entity embedding. For each time stamp τ, the functional mapping is an element-wise rotation from the basic entity embedding $\mathbf{e}$ (quaternion representation) to the time-specific entity embedding $\mathbf{e}_t$:
$$\mathbf{e}_t = \boldsymbol{\tau}\,\mathbf{e}\,\boldsymbol{\tau}^{-1} = a_e + \boldsymbol{\tau}\,\mathbf{v}_e\,\boldsymbol{\tau}^{-1}, \tag{12}$$
where $a_e$ and $\mathbf{v}_e$ are the scalar (real) and vector (imaginary) parts of the entity quaternion representation $\mathbf{e}$, respectively. According to Theorem 1, $\boldsymbol{\tau}\,\mathbf{v}_e\,\boldsymbol{\tau}^{-1}$ is the result of rotating the vector $\mathbf{v}_e$ by $\theta_\tau$ around the rotation axis $\mathbf{u}_\tau$ (with $\boldsymbol{\tau} = \cos\frac{\theta_\tau}{2} + \mathbf{u}_\tau \sin\frac{\theta_\tau}{2}$, see Equation 9), and it constitutes the vector (imaginary) part of $\mathbf{e}_t$. Thus, we obtain a lemma:

Lemma 1. The vector (imaginary) part is rotated while the scalar (real) part remains unchanged in the functional mapping (Equation 12), which reflects the temporal evolution of an entity embedding.
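Theorem 1 and Lemma 1 can be illustrated on a single quaternion. The sketch below assumes the array layout `[a, b, c, d]` and uses hypothetical helper names (`rotate`, `rodrigues`); it compares the imaginary part of the rotated quaternion with Rodrigues' rotation formula and is not the authors' code.

```python
import numpy as np

def hamilton(q1, q2):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

def conj(q):
    """Quaternion conjugate."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def rotate(tau, e):
    """Compute tau e tau^{-1}; for a unit tau the inverse equals the conjugate."""
    return hamilton(hamilton(tau, e), conj(tau))

def rodrigues(v, u, theta):
    """Rodrigues' formula: rotate v by theta around the unit axis u."""
    return (v * np.cos(theta)
            + np.cross(u, v) * np.sin(theta)
            + u * np.dot(u, v) * (1.0 - np.cos(theta)))

theta = 0.8                                  # rotation angle (illustrative)
u = np.array([1.0, 2.0, 2.0]) / 3.0          # unit rotation axis
tau = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * u))

e = np.array([0.4, 1.0, -2.0, 0.5])          # base entity quaternion
e_t = rotate(tau, e)                         # time-specific entity quaternion
expected_vec = rodrigues(e[1:], u, theta)    # rotated imaginary part
```

Under these assumptions, `e_t[0]` stays equal to `e[0]` while `e_t[1:]` agrees with `expected_vec`, which is the content of Lemma 1.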
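For a quadruple (s, r, o, t), we use the functional mapping to get the time-specific entity embeddings $\mathbf{s}_t$ and $\mathbf{o}_t$ from the basic entity embeddings $\mathbf{s}$ and $\mathbf{o}$:
$$\mathbf{s}_t = \boldsymbol{\tau}\,\mathbf{s}\,\boldsymbol{\tau}^{-1}, \qquad \mathbf{o}_t = \boldsymbol{\tau}\,\mathbf{o}\,\boldsymbol{\tau}^{-1}. \tag{13}$$
Considering the temporal evolution of entity embeddings, the relation embedding $\mathbf{r}$ is regarded as a translation from the time-specific subject embedding $\mathbf{s}_t$ to the conjugate of the time-specific object embedding $\mathbf{o}_t$. In other words, we aim to make $\mathbf{s}_t + \mathbf{r} = \overline{\mathbf{o}_t}$ for all positive quadruples. Then, the score function can be defined as
$$f(s, r, o, t) = \left\| \mathbf{s}_t + \mathbf{r} - \overline{\mathbf{o}_t} \right\|. \tag{14}$$
Note that each embedding above is a quaternion representation, and "||" denotes the norm computation (see Equation 1).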
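The scoring step can be sketched as follows. This is a minimal illustration, assuming each embedding is an array of shape `(k, 4)` (k quaternions rotated element-wise) and assuming the norm is taken as the Euclidean norm over all coefficients; the helper names and the random test data are hypothetical.

```python
import numpy as np

def hamilton(p, q):
    """Element-wise Hamilton product of quaternion arrays with shape (k, 4)."""
    a1, b1, c1, d1 = (p[..., n] for n in range(4))
    a2, b2, c2, d2 = (q[..., n] for n in range(4))
    return np.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ], axis=-1)

def conj(q):
    """Quaternion conjugate along the last axis."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def unit_time_embedding(theta, axis):
    """Build tau = cos(theta/2) + u sin(theta/2) for each of the k dimensions."""
    u = axis / np.linalg.norm(axis, axis=-1, keepdims=True)
    return np.concatenate(
        [np.cos(theta / 2)[:, None], np.sin(theta / 2)[:, None] * u], axis=-1)

def rotate(tau, e):
    """Time-specific embedding tau e tau^{-1} (tau is unit, so inverse = conjugate)."""
    return hamilton(hamilton(tau, e), conj(tau))

def score(s, r, o, tau):
    """Distance between s_t + r and the conjugate of o_t; lower is better."""
    residual = rotate(tau, s) + r - conj(rotate(tau, o))
    return float(np.linalg.norm(residual))

rng = np.random.default_rng(0)
k = 8                                          # embedding dimension (illustrative)
s, o = rng.normal(size=(k, 4)), rng.normal(size=(k, 4))
tau = unit_time_embedding(rng.uniform(0, np.pi, size=k), rng.normal(size=(k, 3)))

r_exact = conj(rotate(tau, o)) - rotate(tau, s)   # relation that fits the fact exactly
perfect = score(s, r_exact, o, tau)               # numerically zero
corrupted = score(s, r_exact + 1.0, o, tau)       # a perturbed relation scores worse
```

A relation that satisfies the translation target exactly receives a score of (numerically) zero, and any perturbation increases the score.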

Loss Function
We use the same margin loss function with multiple negative sampling as proposed in (Sun et al., 2019), which has been proved to be effective on distance-based KGE models (Bordes et al., 2013; Sun et al., 2019) as well as on TKGE models (Xu et al., 2019, 2020a). In detail, our loss function is
$$\mathcal{L} = -\log \sigma\big(\gamma - f(\xi)\big) - \sum_{i=1}^{\eta} \frac{1}{\eta} \log \sigma\big(f(\xi'_i) - \gamma\big), \tag{15}$$
where η is the number of negative training samples per positive one, ξ is the positive training quadruple, σ(·) denotes the sigmoid function, γ is a fixed margin, and $\xi'_i$ denotes the i-th negative sample generated by randomly corrupting the subject or the object of ξ, such as (s', r, o, t) and (s, r, o', t).
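As a rough illustration of this loss, the sketch below assumes it takes the form −log σ(γ − f(ξ)) − (1/η) Σ log σ(f(ξ'_i) − γ), with uniform weights over negatives, which is our reading of the description above. The function names and the numeric scores are hypothetical.

```python
import numpy as np

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)

def margin_loss(pos_score, neg_scores, gamma):
    """Margin loss for one positive quadruple and its negative samples.

    pos_score:  distance score f(xi) of the positive quadruple
    neg_scores: distance scores f(xi'_i) of the eta negative samples
    gamma:      fixed margin
    """
    neg_scores = np.asarray(neg_scores, dtype=float)
    pos_term = -log_sigmoid(gamma - pos_score)            # push f(xi) below gamma
    neg_term = -np.mean(log_sigmoid(neg_scores - gamma))  # push f(xi') above gamma
    return float(pos_term + neg_term)

# A well-separated case (small positive distance, large negative distances)
# versus a badly separated one.
good = margin_loss(pos_score=0.5, neg_scores=[20.0, 25.0], gamma=10.0)
bad = margin_loss(pos_score=20.0, neg_scores=[0.5, 1.0], gamma=10.0)
```

The loss is small when the positive quadruple has a distance well below the margin and the negatives lie well above it, and it grows when this ordering is violated.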

Modeling Various Relation Patterns
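In this section, we demonstrate that our RotateQVS can model various relation patterns. Four kinds of relation patterns are mostly considered and studied in previous static KGE and TKGE works (Sun et al., 2019; Gao et al., 2020): symmetric, asymmetric, inverse, and temporal-evolution patterns. The temporal-evolution pattern is defined as follows:

Definition 4. Relations $r_1$ and $r_2$ are evolving over time from $t_1$ to $t_2$ if $r_1(s, o, t_1) \wedge r_2(s, o, t_2)$ holds.

Compared with other TKGE methods, RotateQVS can model all four patterns, while previous methods (see Section 4.3) fail to do so. One advantage of quaternion embeddings is that our method supports all these relation patterns, while other representation forms cannot, such as TeRo (Xu et al., 2020a), which uses the complex number system a + bi.

As seen in our score function (Equation 14), our aim is to make
$$\boldsymbol{\tau}\,\mathbf{s}\,\boldsymbol{\tau}^{-1} + \mathbf{r} = \overline{\boldsymbol{\tau}\,\mathbf{o}\,\boldsymbol{\tau}^{-1}}. \tag{16}$$
Then we can get the following results.

Proof. For the temporal-evolution pattern, $r_1(s, o, t_1) \wedge r_2(s, o, t_2)$ in Definition 4 can be expressed as
$$\boldsymbol{\tau}_1 \mathbf{s}\, \boldsymbol{\tau}_1^{-1} + \mathbf{r}_1 = \overline{\boldsymbol{\tau}_1 \mathbf{o}\, \boldsymbol{\tau}_1^{-1}}, \qquad \boldsymbol{\tau}_2 \mathbf{s}\, \boldsymbol{\tau}_2^{-1} + \mathbf{r}_2 = \overline{\boldsymbol{\tau}_2 \mathbf{o}\, \boldsymbol{\tau}_2^{-1}}. \tag{17}$$
For the same head entity and tail entity, if a relation $r_1$ holds at time $t_1$ (time stamp $\tau_1$) and a relation $r_2$ holds at time $t_2$ (time stamp $\tau_2$), we are supposed to get $\boldsymbol{\tau}_2 \boldsymbol{\tau}_1^{-1}\, \mathbf{r}_1\, (\boldsymbol{\tau}_2 \boldsymbol{\tau}_1^{-1})^{-1} = \mathbf{r}_2$. In addition, based on Equation 17, we have
$$\boldsymbol{\tau}_1^{-1}\, \mathbf{r}_1\, \boldsymbol{\tau}_1 = \boldsymbol{\tau}_2^{-1}\, \mathbf{r}_2\, \boldsymbol{\tau}_2. \tag{18}$$
By Theorem 1, $\boldsymbol{\tau}_1^{-1} \mathbf{r}_1 \boldsymbol{\tau}_1$ and $\boldsymbol{\tau}_2^{-1} \mathbf{r}_2 \boldsymbol{\tau}_2$ can be regarded as rotations in quaternion vector space of $\mathbf{r}_1$ and $\mathbf{r}_2$, respectively, which indicates that the norm of $\mathbf{r}_1$ is the same as that of $\mathbf{r}_2$. Furthermore, Lemma 1 indicates that the rotation mapping keeps the scalar (real) part unchanged. Thus, we have the following deductions:
$$\|\mathbf{r}_1\| = \|\mathbf{r}_2\|, \qquad a_{r_1} = a_{r_2}. \tag{19}$$
Notice that Equation 19 follows from Equation 18 but not conversely, i.e., it is a necessary but not sufficient condition for Equation 18.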
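The temporal-evolution argument can be checked numerically on a single quaternion. The sketch below assumes the translation target s_t + r = conj(o_t) from the score function and constructs, for two hypothetical time stamps, relations that satisfy it exactly; all names and values are illustrative.

```python
import numpy as np

def hamilton(q1, q2):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

def conj(q):
    """Quaternion conjugate."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def unit(theta, axis):
    """Unit quaternion cos(theta/2) + u sin(theta/2) for a given axis."""
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)
    return np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * u))

def rotate(tau, q):
    """Compute tau q tau^{-1} for a unit quaternion tau."""
    return hamilton(hamilton(tau, q), conj(tau))

def exact_relation(s, o, tau):
    """Relation r with s_t + r = conj(o_t) holding exactly at time stamp tau."""
    return conj(rotate(tau, o)) - rotate(tau, s)

rng = np.random.default_rng(0)
s, o = rng.normal(size=4), rng.normal(size=4)    # shared head and tail entities
tau1 = unit(0.3, [1.0, 0.0, 0.0])                # time stamp embedding at t1
tau2 = unit(1.1, [0.0, 1.0, 1.0])                # time stamp embedding at t2

r1 = exact_relation(s, o, tau1)
r2 = exact_relation(s, o, tau2)

tau21 = hamilton(tau2, conj(tau1))               # tau2 tau1^{-1}, again a unit quaternion
r1_evolved = rotate(tau21, r1)                   # tau21 r1 tau21^{-1}
```

In this construction `r1_evolved` coincides with `r2`, and `r1` and `r2` share the same norm and the same real part, in line with the deductions above.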

Theoretical Comparison Against TeRo
TeRo (Xu et al., 2020a) is the main baseline for our model. The rotated head and tail entity embeddings of TeRo in the complex number system are $\mathbf{s} \circ \boldsymbol{\tau}$ and $\mathbf{o} \circ \boldsymbol{\tau}$, respectively, where $\circ$ denotes the Hermitian dot product. The translational score function of TeRo, $f(s, r, o, t) = \|\mathbf{s}_t + \mathbf{r} - \overline{\mathbf{o}_t}\|$, aims to make
$$\mathbf{s} \circ \boldsymbol{\tau} + \mathbf{r} = \overline{\mathbf{o} \circ \boldsymbol{\tau}}.$$
We further prove by contradiction that TeRo cannot model relations with temporal evolution (see the proof in Appendix F). By contrast, taking advantage of the quaternion representation, our RotateQVS admits a further derivation:
$$\boldsymbol{\tau}^{-1}\, \mathbf{r}\, \boldsymbol{\tau} = \overline{\mathbf{o}} - \mathbf{s},$$
where the time stamp embeddings and relation embeddings are isolated, so that the influence of temporal evolution on relations can be analysed.

[...] (Mahdisoltani et al., 2015), where time annotations are represented as time intervals. We derive the dataset from HyTE (Dasgupta et al., 2018) to obtain the same year-level granularity by dropping the month and date information, which results in 70 different time stamps.
For GDELT, we use the subset extracted by Trivedi et al., consisting of facts from April 1, 2015 to March 31, 2016. We apply the same preprocessing to the train, validation and test sets as Goel et al. (2020), so that the problem is a TKGC problem rather than an extrapolation problem.

Evaluation Protocol
The link prediction task, which aims to infer an incomplete time-wise fact with a missing entity ((s, r, ?, t) or (?, r, o, t)), is adopted to evaluate the proposed model. During inference, we follow the same procedure as Xu et al. to generate candidates. Performance is reported with the standard evaluation metrics: the proportion of correct entities ranked in the top 1, 3 and 10 (Hits@1, Hits@3, and Hits@10), and Mean Reciprocal Rank (MRR). For all metrics, higher is better. For all experiments, we report results averaged across 5 runs, and we omit the variance as it is generally low.
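For reference, these metrics can be computed from the rank of the correct entity in each test query. The following is a minimal sketch, assuming lower distance scores are better and ignoring any candidate filtering; the function names and the example ranks are hypothetical.

```python
import numpy as np

def rank_of_target(scores, target):
    """1-based rank of the target candidate (ties resolved optimistically)."""
    scores = np.asarray(scores, dtype=float)
    return int(np.sum(scores < scores[target]) + 1)

def mrr(ranks):
    """Mean Reciprocal Rank over a list of 1-based ranks."""
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))

def hits_at(ranks, k):
    """Fraction of queries whose correct entity is ranked within the top k."""
    return float(np.mean(np.asarray(ranks) <= k))

# Ranks of the correct entity for four illustrative test queries.
ranks = [1, 2, 5, 12]
summary = {
    "MRR": mrr(ranks),
    "Hits@1": hits_at(ranks, 1),
    "Hits@3": hits_at(ranks, 3),
    "Hits@10": hits_at(ranks, 10),
}

# Candidate distances for one query; the correct entity is candidate 2.
example_rank = rank_of_target([0.9, 0.1, 0.5], target=2)
```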

Baselines
We compare with both state-of-the-art static and temporal KGE baselines. For static baselines, we use TransE (Bordes et al., 2013), DistMult, RotatE (Sun et al., 2019), and QuatE (Zhang et al., 2019a). For TKGE methods, we consider TTransE (Leblay and Chekol, 2018), [...], and TeRo (Xu et al., 2020a) (see the complexity comparison in Appendix 3.5). Note that TeRo is also based on the idea of rotations, and thus we consider TeRo as a direct baseline. Because our quaternion representation (a + bi + cj + dk) doubles the embedding parameters of TeRo, which uses a complex representation (a + bi), we further adopt two models for fair comparisons: (i) TeRo-Large: TeRo with double the embedding dimension (footnote 8); (ii) RotateQVS-Small: the proposed RotateQVS with half the embedding dimension. By doing so, the parameter complexities of the compared pairs are comparable.

Results
The experimental results over four TKG datasets are shown in Table 3 (footnote 9). Overall, TKGE methods are better than static KGE methods, which shows the effectiveness of modeling temporal information. Our RotateQVS consistently outperforms all the baseline models over the four datasets across all metrics (footnote 10). To demonstrate the benefit of the proposed quaternion method, we compare RotateQVS with the direct baseline TeRo (Xu et al., 2020a). Under fair comparisons of model size, RotateQVS outperforms TeRo-Large and RotateQVS-Small outperforms TeRo, which shows that quaternion embeddings bring clear improvements. In particular, RotateQVS achieves larger improvements on the ICEWS14 and ICEWS05-15 datasets. We believe this is because these two datasets have many more relations (see Table 2), which suggests that our method behaves better on datasets with complex relation patterns.

Analysis and Case Study
To further demonstrate the learned quaternion embeddings and the ability of our model, we perform case studies on multiple relation patterns, through visualization and quantitative analysis on intuitive examples from ICEWS14.

Symmetric/Asymmetric/Inversion Patterns
Since symmetric, asymmetric and inversion patterns have been discussed in previous work (Sun et al., 2019; Xu et al., 2020a), we defer their case studies to Appendix J.

Temporal-evolution Pattern
As shown in Lemma 5, if a relation $r_1$ and a relation $r_2$ are evolving over time from $t_1$ (time stamp $\tau_1$) to $t_2$ (time stamp $\tau_2$), we expect $\boldsymbol{\tau}_2 \boldsymbol{\tau}_1^{-1}\, \mathbf{r}_1\, (\boldsymbol{\tau}_2 \boldsymbol{\tau}_1^{-1})^{-1} = \mathbf{r}_2$.

8 We reuse the original implementation of Xu et al. (2020a) from https://github.com/soledad921/ATISE and follow their best setups.
9 See the hyperparameter setup in Appendix G.
10 We also provide a time granularity analysis and an embedding dimension analysis in Appendices H and I.

Table 3: Results on the link prediction task over the four datasets. The best score is in bold and the second best score is underlined.
To analyse the temporal-evolution pattern, we focus on the relations between the same head and tail entities with different time stamps; Table 4 lists such examples from ICEWS14, each anchored on a base fact. To illustrate this pattern, we measure the matrix cosine similarity between $\mathbf{r}_2$ (base) and $\boldsymbol{\tau}_2 \boldsymbol{\tau}_1^{-1}\, \mathbf{r}_1\, (\boldsymbol{\tau}_2 \boldsymbol{\tau}_1^{-1})^{-1}$ (temporal-evolved). For each true fact, we sample a random negative relation and show their similarity difference. Figure 2 illustrates that our RotateQVS can model temporal-evolution patterns more effectively. In contrast, for TeRo (Xu et al., 2020a), which is the main baseline for our model, we show theoretically that it cannot model this pattern (see Section 3.4).

Table 4: Examples of temporal-evolution patterns in the ICEWS14 dataset. The similarity score is based on the base fact.
In addition, Figure 3 shows that our quaternion representation does well in reflecting Equation 19, which is a necessary but not sufficient condition derived in the theoretical analysis of the temporal-evolution pattern.

Convergence Analysis
For convergence analysis, we consider two fair comparisons, where the two compared methods have the same number of parameters (refer to Section 4.3 for more details): RotateQVS (blue solid line) vs. TeRo-Large (yellow solid line) and RotateQVS-Small (green dotted line) vs. TeRo (red dotted line) in Figure 4. We observe that RotateQVS and TeRo-Large converge at approximately the same rate, and so do RotateQVS-Small and TeRo. We conclude that RotateQVS achieves better results at both the large and the small scale without additional training effort.

Figure 4: Convergence of RotateQVS, TeRo-Large, RotateQVS-Small and TeRo by epoch on the ICEWS14 test set, measured by MRR.

Related work
Models for static knowledge graphs have been well studied (Zhang et al., 2019b; Xu et al., 2020b; Mao et al., 2020; Chen et al., 2021a) using semantic and structural information. Translation-based methods, e.g. TransE (Bordes et al., 2013) and TransR (Lin et al., 2015), formalise the factual distance between a head entity s and a tail entity o with the translation carried out by the relation. Adopting tensor factorization with a bilinear transformation, semantic matching models, e.g. RESCAL (Nickel and Tresp, 2013) and DistMult, capture the semantic relevance of entities. Recently, more attention has been paid to studying various relation patterns. RotatE (Sun et al., 2019) treats each relation as a rotation so that symmetric/asymmetric, inversion and composition patterns can be inferred to predict missing links. Further, the quaternion number system (Hamilton, 1844) has been applied to model more complex composition patterns in 3D space, as in Rotate3D (Gao et al., 2020) and QuatE (Zhang et al., 2019a).
Many of the aforementioned methods (Dasgupta et al., 2018; Leblay and Chekol, 2018; Trivedi et al., 2017; García-Durán et al., 2018; Goel et al., 2020; Sadeghian et al., 2021) are extended from static KGs to TKGs. They integrate time information into previous static methods as independent features. Others study the dynamic evolution of TKGs. ATiSE (Xu et al., 2019) regards the temporal evolution of entity and relation embeddings as a combination of a trend component, a seasonal component and a random component. CyGNet (Zhu et al., 2021) proposes a time-aware copy-generation mechanism that leverages known facts in the past to predict unknown facts in the future. TeRo (Xu et al., 2020a) defines the temporal evolution of an entity embedding as a rotation in the complex vector space. Inspired by TeRo, our RotateQVS further represents temporal entities as rotations in quaternion vector space and obtains more advantages (footnote 12).

Modeling various temporal relation patterns (Goel et al., 2020; Xu et al., 2020a), especially the temporal-evolution pattern, is crucial for TKGE and the subsequent TKGC. Zhang et al. mention the time-evolution property but do not study it systematically. It remains an open research question with little prior work. Our work (RotateQVS) takes inspiration from the idea of rotation and generalises it to the quaternion number system to model the complex temporal-evolution pattern, which TeRo can hardly do.

Conclusion
In this paper, we introduce a novel TKGC method, RotateQVS, which represents temporal information of TKGs as rotations in quaternion vector space. Targeting temporal interpretability, we show theoretically that RotateQVS can model various relation patterns and demonstrate this with extensive experiments. Compared to previous methods, RotateQVS achieves significant improvements on link prediction tasks over four benchmark datasets. Furthermore, we show that RotateQVS has clear advantages in modeling various relation patterns with temporal evolution.

12 Refer to Section 3.4 for more details.
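Using trigonometric identities, we can get
$$\mathbf{v}' = \mathbf{v}\cos\theta + (\mathbf{u} \times \mathbf{v})\sin\theta + \mathbf{u}(\mathbf{u} \cdot \mathbf{v})(1 - \cos\theta) = \mathbf{v}_\perp \cos\theta + (\mathbf{u} \times \mathbf{v})\sin\theta + \mathbf{v}_\parallel, \tag{25}$$
where $\mathbf{v}_\perp = \mathbf{v} - \mathbf{u}(\mathbf{u} \cdot \mathbf{v})$ and $\mathbf{v}_\parallel = \mathbf{u}(\mathbf{u} \cdot \mathbf{v})$ are the components of $\mathbf{v}$ perpendicular and parallel to the axis $\mathbf{u}$, respectively. Equation 25 satisfies Rodrigues' rotation formula (Rodrigues, 1840) in 3D vector space (illustrated in Figure 5). Therefore, Equation 11 is proven to be a rotation in 3D vector space.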

F Proof by Contradiction for TeRo
Proof. Supposing TeRo (Xu et al., 2020a) can model the temporal-evolution relation pattern (defined in Definition 4), then relations with temporal-