Knowledge Graph Representation Learning using Ordinary Differential Equations

Knowledge Graph Embeddings (KGEs) have shown promising performance on link prediction tasks by mapping the entities and relations of a knowledge graph into a geometric space. The capability of KGEs to preserve graph characteristics, including structural aspects and semantics, depends strongly on the design of their score function, as well as on the abilities inherited from the underlying geometry. Many KGEs use Euclidean geometry, which renders them incapable of preserving complex structures and consequently causes wrong inferences by the models. To address this problem, we propose a neuro-differential KGE that embeds nodes of a KG on the trajectories of Ordinary Differential Equations (ODEs). To this end, we represent each relation (edge) in a KG as a vector field on one of several manifolds. We specifically parameterize the ODEs by a neural network to represent complex manifolds and complex vector fields on them. The underlying embedding space is therefore capable of assuming the shape of various geometric forms to encode heterogeneous subgraphs. Experiments on synthetic and benchmark datasets against state-of-the-art KGE models justify ODE trajectories as a means to enable structure preservation and consequently to avoid wrong inferences.


Introduction
Knowledge Graphs (KGs) have a significant impact on machine learning approaches (Wang et al., 2017). A KG usually represents factual knowledge in triples of the form (entity, relation, entity), e.g., (Plato, influences, Kant). The nodes of a KG represent entities and the links denote the relations. Although KGs are often large-scale with millions of triples, they are usually incomplete, i.e. they do not capture all knowledge within a domain of interest. To address this problem, various approaches have been used so far, among which link prediction using KG embeddings (KGE) has attracted growing attention. KGE models map entities (e) and relations (r) of a KG from a symbolic domain to a geometric space (e.g. a vector space). Such embedding models employ a score function to perform the link prediction task, which uses the learned embedding vectors of a triple (e_t, r, e_{t+1}) to compute its plausibility. This allows ranking triples by their score, where a correct triple should obtain a higher score and a lower rank than an incorrect triple. KGE models have been predominantly studied with a focus on the triple level.
However, in a broader view, triples of a graph form specific subgraphs (i.e. structural patterns distributed in the graph) in local structures. The preservation of subgraphs in the learned representations is a major challenge. In this regard, the choice of geometry becomes crucial, as the distribution of the corresponding embeddings for entities and relations depends on it. Furthermore, within each geometry, the mathematical operations used in the score function lead to differences in the encoding capability of KGEs. Several state-of-the-art KGE models (Zhang et al., 2019a; Sun et al., 2019; Trouillon et al., 2016; Bordes et al., 2013a) are designed in Euclidean geometry, which does not intrinsically support structural preservation.
As an example, consider Figure 1, which contains both a loop/cycle and a path (without cycles) for the "influences" relation. Many KGE models incorrectly predict the path to be closed in such a scenario. The root cause of this problem lies in the entity-dependent nature of the "influences" relation: most KGE models, such as RotatE (as well as TransE, ComplEx, and QuatE), consider relations independent of entities. This problem can usually only be mitigated at the cost of increasing the dimensionality of the model, which leads to higher computational costs and may negatively impact the usefulness of learned representations in downstream machine learning tasks. Among KGEs with sophisticated geometry, only a limited number of structures, such as hierarchical or tree-like ones, have been studied, by using hyperbolic geometry or Poincaré ball models (Nickel and Kiela, 2017). In order to significantly improve the structure preservation capabilities of KGE models, we propose a novel KGE model named FieldE which employs differential equations (DEs) for embedding KGs into a vector space. The use of differential equations allows us to overcome the entity-independent modeling of relations in previous KGE models. In this model, relations are viewed as trajectories connecting neighboring nodes in the graph, which implies a continuity of changes in the embedding space and consequently describes the underlying geometry. This is especially important because the success of a KGE model depends on how correctly it specifies the underlying geometry that describes the natural proximity of data samples in a geometric space (Mathieu and Nickel, 2020). Designing FieldE with a list of well-specified geometries (Euclidean, Poincaré ball, Hyperboloid, and Spherical) a) improves generalization and b) increases interpretability. This is due to carrying the natural proximity of entities over from the KG to the geometric space.
We employ first-order Ordinary Differential Equations, a special class of DEs that represent a vector field on a smooth Riemannian manifold. We selected first-order ODEs because of their advantages over other classes of DEs: while being capable of capturing complex geometries, they are also efficiently implementable. To allow our approach to self-adapt to the complexity of the underlying knowledge graph, we developed a neural network based approach which learns a suitable geometry from the training graph itself. Therefore, FieldE combines substantial previous research and insights on DEs, embeddings and neural networks in order to provide a comprehensive model capable of representation learning on KGs with multiple heterogeneous subgraphs.

Related Work
We describe KGE models exploiting geometric properties for structure preservation. We also review ODEs in ML in general, as our model is, to the best of our knowledge, the first approach employing ODEs in KGEs. KGEs in Euclidean Geometry. Apart from some discussions about encoding relational patterns, capturing structures is not directly targeted by most previous work on KGE models. TransE (Bordes et al., 2013b) discussed simple 1-1 relational patterns, and its follow-up models (Ji et al., 2015; Lin et al., 2015; Wang et al., 2014) considered 1-many, many-1, and many-many patterns. RotatE (Sun et al., 2019) uses rotational transformations for encoding more complex patterns such as symmetry, transitivity, composition, reflexivity and inversion. Some other KGEs only focus on effective choices of embedding, such as element-wise multiplication of transformed head and tail in DistMult (Yang et al., 2015), ComplEx (Trouillon et al., 2016) and RESCAL (Nickel et al., 2011), or angle transformation in QuatE (Zhang et al., 2019b). KGEs in Non-Euclidean Geometry. Several works such as MuRP, ROTH, REFH and ATTH (Balazevic et al., 2019; Chami et al., 2020) use non-Euclidean geometry for hierarchical structures only (Suzuki et al., 2018; Ji et al., 2016). In (Dareddy et al., 2019), random walk and heterogeneous skip-gram models are used to generate the embeddings of different structures without leveraging DEs. Recently, (Lou et al., 2020) used the Fréchet mean and various geometries for capturing structures, but not for the link prediction task.
Use of Differential Equations in Machine Learning. An early work using differential equations on graphs is (Kozen and Stefansson, 1997); however, it does not employ neural networks. In (Chen et al., 2018), a family of deep neural network models has been proposed which parameterizes the derivative of a hidden state instead of the usual specification of a discrete sequence of hidden layers. In this approach, ODEs are used in the design of continuous-depth networks with the purpose of providing an efficient computation of the network output, which improves memory efficiency, adaptive computation, parameter efficiency (Kobyzev et al., 2020), and continuous time-series models (Kidger et al., 2020). The approach was applied to supervised learning on an image dataset and to time-series prediction. This line of work used ODEs without considering knowledge graphs and embeddings for link prediction.

Preliminaries and Background
This section provides preliminaries of Riemannian geometry (Franciscus, 2011; Hairer, 2011) and explains the key elements required to understand our model. The aim of our model is to embed nodes (entities) of a KG on trajectories of vector fields (relations) laid on the surface of a smooth Riemannian manifold. Therefore, we first provide the mathematical definitions for the manifold, the tangent space, and the vector field driving the dynamics of a DE on the manifold.

Manifold and Tangent Space
We denote by M a smooth manifold of dimension d that is embedded in a higher-dimensional Euclidean space R^n with n ≥ d. Consider a particle moving on M. The set of all possible directions in which a particle passing through a given point may go forms a space called the tangent space (the set of velocities). Formally, given a point p on a manifold M, the tangent space T_p M ⊂ R^n is the set of all vectors which are tangent to the continuously differentiable curves passing through p. The tangent bundle is the set of all tangent spaces on a manifold M, defined as TM = {(p, v) : p ∈ M, v ∈ T_p M}. A vector field f : M → TM assigns a tangent vector f(p) ∈ T_p M to every point p ∈ M and induces the ODE

de(t)/dt = f(e(t)). (1)

If f is continuously differentiable, then for each initial condition there exists a unique trajectory γ : [a, b] → M solving (1), which is twice differentiable (Hairer, 2011).
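As a concrete illustration of how entity embeddings can be placed along a trajectory of such an ODE, the following sketch (not from the paper; the rotational field and the forward-Euler integrator are illustrative choices) traces a trajectory of de/dt = f(e) numerically:

```python
import numpy as np

def rotation_field(e):
    # a simple planar vector field whose trajectories are circles
    x, y = e
    return np.array([-y, x])

def trajectory(e0, f, steps, dt):
    # forward-Euler integration of de/dt = f(e)
    points = [np.asarray(e0, dtype=float)]
    for _ in range(steps):
        points.append(points[-1] + dt * f(points[-1]))
    return np.stack(points)

# three Euler steps starting at (1, 0): the points stay near the unit circle
traj = trajectory([1.0, 0.0], rotation_field, steps=3, dt=0.1)
```

A faithful implementation would use an adaptive ODE solver, but even these Euler steps show successive points lying on one trajectory of the field.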
Riemannian Manifold A smooth manifold M endowed with a Riemannian metric g is a Riemannian manifold, denoted by (M, g). For p ∈ M, the function g_p = g(p) = ⟨·, ·⟩_p : T_p M × T_p M → R defines an inner product on the associated tangent space. The metric tensor is used to measure angles, lengths of curves, surface areas and volumes locally. Global quantities can then be derived as integrals over the local contributions.
Geodesics and Exponential Map The manifold equivalent of the notion of straight lines in Euclidean space is given by geodesics and can be defined in terms of the metric tensor g. Geodesics are curves on the manifold such that, given any two (sufficiently close) points on the curve, the geodesic minimizes the length of all curves joining these two points. For each pair (p, v) with p ∈ M, v ∈ T_p M, there exists a unique geodesic γ(t) ∈ M such that γ(0) = p and v = dγ(t)/dt |_{t=0}. The exponential map exp_p : T_p M → M is then defined as exp_p(v) = γ(1), i.e. it maps a tangent vector v at p to the point reached after unit time along the geodesic starting at p with initial velocity v.
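For intuition, here is a small sketch (the helper names are ours, not the paper's) of the exponential map in the two simplest cases used later: the Euclidean space, where exp_p(v) = p + v, and the unit sphere, where exp_p(v) = cos(‖v‖)p + sin(‖v‖)v/‖v‖ for a tangent vector v orthogonal to p:

```python
import numpy as np

def exp_euclidean(p, v):
    # Euclidean space: geodesics are straight lines
    return p + v

def exp_sphere(p, v):
    # unit sphere: v must lie in the tangent space at p, i.e. <p, v> = 0
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * (v / nv)

p = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, np.pi / 2, 0.0])  # tangent vector at p, length pi/2
q = exp_sphere(p, v)                 # a quarter of a great circle: lands at (0, 1, 0)
```

Note that the result stays exactly on the manifold, which is the property FieldE relies on when moving from a head embedding towards a tail embedding.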

Method
In this section, we propose FieldE, a new KGE model based on Ordinary Differential Equations (ODEs). The approach relies on the following two key components: 1) the choice of a smooth Riemannian manifold M on which the embedded entities lie; 2) the choice of a vector field f such that a given relation between entities in the KG is encoded as trajectories on M solving the ODE in Equation (2) and connecting embedded entities that are related in the KG. These components can either be given explicitly, or be learned directly from the data. Table 1 includes a description of the manifolds we used in the application of FieldE. Below, we present the formulation of FieldE in six steps: relation formulation, entity representation, triple learning, plausibility measurement, vector field parameterization, and manifold specification, which are discussed in the remainder of this section. Relation Formulation FieldE represents each relation r in a KG as a vector field f_{θ_r} on a Riemannian manifold. Here, we presume a given functional form for the vector field f_{θ_r} (independent of time), which is determined by the choice of parameters θ_r. Let e(t) be a parametric trajectory that evolves in time, t ∈ R, solving the following ODE corresponding to relation r of the KG:

de(t)/dt = f_{θ_r}(e(t)). (2)

Given the above formulation, each relation of a KG corresponds to a relation-specific vector field. This is consistent with the nature of KGs, where different relations form different structures and patterns.

Entity Representation
We represent each entity e_i in the KG by a vector in R^n denoted by e(t), matching each subscript i to a time t, where e(t) lies on a trajectory on the manifold M solving the ODE (2). In particular, consider k + 1 entities e_{i_n}, e_{i_{n+1}}, . . . , e_{i_{n+k}} in the KG, each connected to the next by a relation r. The corresponding embeddings are then discrete points e(t_n), e(t_{n+1}), . . . , e(t_{n+k}) ∈ M lying on a trajectory e(t) ∈ M solving the ODE in Equation (2).
Triple Learning Let e_{i_n} and e_{i_{n+1}} be two subsequent nodes of a graph (e.g. the entities shown in the upper part of Figure 2) connected by a relation r. This means the triple (e_{i_n}, r, e_{i_{n+1}}) is present in the KG. Let e(t_n), e(t_{n+1}) ∈ M be the embeddings corresponding to the entities e_{i_n}, e_{i_{n+1}} respectively (lower part of Figure 2). We then represent the triple (e_{i_n}, r, e_{i_{n+1}}) as a transition from the head entity embedding e(t_n) to the tail embedding e(t_{n+1}) on a relation-specific vector field over the manifold. Therefore, in order to encode a triple (e_{i_n}, r, e_{i_{n+1}}) on the manifold, we first compute the tangent vector at e_{t_n} = e(t_n), i.e. v^r_{e_{t_n}} = de(t_n)/dt = f_{θ_r}(e_{t_n}), which is the direction of movement at the point e_{t_n} towards the point e_{t_{n+1}} = e(t_{n+1}). We then use the exponential map to map the tangent vector at the head embedding to the tail embedding (determining the direction of movement and moving towards the tail embedding to meet the tail on the manifold) as follows:

e_{t_{n+1}} = exp_{e_{t_n}}(f_{θ_r}(e_{t_n})). (3)

The triple (e_{i_n}, r, e_{i_{n+1}}) is negative if it does not appear in the KG. To encode negative triples, the following inequality should be satisfied:

exp_{e_{t_n}}(f_{θ_r}(e_{t_n})) ≠ e_{t_{n+1}}. (4)

Equation (4) indicates that a triple is measured negative if moving in the direction of the tangent vector at the head point along the manifold does not coincide with the tail.
Plausibility Measurement Given a triple (e_{i_n}, r, e_{i_{n+1}}) in the KG, the plausibility of the triple is measured by comparing the tail embedding e_{t_{n+1}} with the point exp_{e_{t_n}}(f_{θ_r}(e_{t_n})) predicted from the head. We denote the plausibility measure for a given relation r by the score function S_r, and consider two different choices: 1) the distance-based version, named DFieldE:

S_r = -dist(exp_{e_{t_n}}(f_{θ_r}(e_{t_n})), e_{t_{n+1}}), (5)

where dist is a suitable distance function on the manifold (see Table 1), and 2) the semantic-matching version, named SFieldE:

S_r = ⟨exp_{e_{t_n}}(f_{θ_r}(e_{t_n})), e_{t_{n+1}}⟩, (6)

with ⟨·, ·⟩ denoting the Euclidean inner product.
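The two score variants can be sketched in the Euclidean case, where the exponential map reduces to addition; the constant toy field and the helper names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def predicted_tail(head, field):
    # Euclidean case: exp_p(v) = p + v
    return head + field(head)

def score_distance(head, tail, field):
    # DFieldE: negative distance between predicted and actual tail
    return -float(np.linalg.norm(predicted_tail(head, field) - tail))

def score_semantic(head, tail, field):
    # SFieldE: inner product between predicted and actual tail
    return float(np.dot(predicted_tail(head, field), tail))

field = lambda e: np.array([1.0, 0.0])   # toy constant field (TransE-like)
h = np.array([0.0, 0.0])                 # head embedding
t_pos = np.array([1.0, 0.0])             # true tail
t_neg = np.array([-1.0, 2.0])            # corrupted tail
# the true tail scores strictly higher under both measures
```

Ranking triples by either score then realizes the link prediction protocol described in the introduction.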
Vector Field Parameterization The selection of the function f θr is key to our KGE approach. In this paper, we propose two approaches for determining the vector field: a) we parameterize the vector field function f θr by a neural network (NN) and propose a neuro-differential KGE model, b) we consider a vector field given by a linear function, resulting in a linear version of our KGE model. Next, we explain these two choices in detail.

Neuro-FieldE
We parameterize the vector field by a multi-layer feedforward NN to approximate the underlying vector field:

f_{θ_r}(e) = w_o σ(W_L σ(· · · σ(W_1 e) · · · )), (7)

where e ∈ M, σ is a bounded continuous activation function, L is the number of hidden layers, w_o denotes the output weight of the network and W_l = [w^l_{mn}] collects the weights w^l_{mn} connecting the m-th node of layer l − 1 to the n-th node of the l-th layer (see Figure 2). All weights are collected in the vector of parameters θ_r, which is learned during training. Parameterizing the vector field with an NN gives the model enough flexibility to learn various shapes of the manifold dynamics encoded in the vector field f_{θ_r} (representing complex geometry) from data. This is due to the fact that NNs are universal approximators (Hornik et al., 1989; Hornik, 1991; Nayyeri et al., 2017), i.e. NNs are capable of approximating any continuous function.
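A minimal sketch of such a parameterization (one hidden layer, tanh activation; the layer sizes and the random untrained weights standing in for the learned θ_r are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_vector_field(dim, hidden=16):
    # theta_r = (W1, b1, W2); in the real model these would be learned
    W1 = rng.normal(scale=0.5, size=(hidden, dim))
    b1 = rng.normal(scale=0.1, size=hidden)
    W2 = rng.normal(scale=0.5, size=(dim, hidden))
    def f(e):
        # bounded activation (tanh) -> universal approximation applies
        return W2 @ np.tanh(W1 @ e + b1)
    return f

f_r = make_vector_field(dim=4)
v = f_r(np.ones(4))   # tangent vector at the point e = (1, 1, 1, 1)
```

Because the output depends on the input point e, the resulting field is entity-dependent, in contrast to a constant relation vector.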
Linear-FieldE Linear ODEs are a class of differential equations which have been widely used in several applications (Massera and Schäffer, 1966). Here we model the vector field as the linear function

f_{θ_r}(e) = A_r e, (8)

for e ∈ M, where A_r is an n × n matrix representing a projection on the tangent space. Depending on the eigenvalues of A_r, the vector field can have various shapes.
Manifold Specification There has been a surge of efforts to appropriately select the underlying manifold for KGE models (Nickel and Kiela, 2017; Balazevic et al., 2019; Chami et al., 2020). However, the selection of a suitable manifold for representation learning still remains challenging because real-world KGs contain heterogeneous multi-relational neighboring substructures. Thanks to the way FieldE is formulated, it lends itself well to applying techniques from manifold learning in order to explicitly identify the implicit geometry of the KG, a promising direction of future research. Here, we examine FieldE with the following choices of manifolds: the Euclidean space, the unit sphere, the Hyperboloid and the Poincaré ball (see Table 1).

Model Analysis
In this part, we analyse the characteristics of the core formulation of FieldE compared to other models. We first show that while other models such as RotatE, and ComplEx face issues when learning relatively simple single-relational structures, FieldE is able to overcome those issues. Moreover, we show that FieldE subsumes popular KGEs and consequently inherits their capabilities in learning various well-studied patterns e.g. symmetry/inversion.
Flexible Relation Embedding Most of the state-of-the-art KGEs, such as TransE, RotatE, QuatE and ComplEx, consider each relation of the KG as a constant vector used to perform an algebraic operation such as translation or rotation. Therefore, the relation is entity-independent with regard to the applied algebraic operation. Table 2 shows the constraints in the vector space enforced by TransE, RotatE and FieldE for encoding a loop with three nodes, i.e. (e_1, r, e_2), (e_2, r, e_3), (e_3, r, e_1). The constraints obtained by TransE and RotatE are independent of the involved entities. In TransE, the relation embedding becomes null (in Euclidean geometry), which implies that the embeddings of all involved entities are equal. Using RotatE, the embeddings of the entities in a loop can be kept distinct; however, the computed embedding for a relation forming a loop is still entity-independent. Therefore, in one dimension of RotatE, all substructures formed by different groups of entities have to be the same in terms of density and structure (e.g. all entities should form loops of density 3 in our example). This problem can be mitigated by increasing the dimension; however, it is not fully solved when restricting to low-dimensional embeddings. FieldE addresses this problem. Its relation-specific constraint (see Table 2) is entity-dependent, and the direction of translation is determined not only by the relation, but also by the entities connected by the relation and the way in which they are mapped to the manifold M. In contrast to RotatE, which can only represent loops with a fixed number of entities, FieldE can capture different substructures locally (such as loops of different sizes, or loops and paths).
Note that the relation-specific constraint for Neuro-DFieldE (Equation 2) can always be satisfied because NNs with bounded continuous activation functions are universal approximators and universal classifiers (Hornik et al., 1989;Hornik, 1991;Nayyeri et al., 2017) (complete proof in appendix).
In summary, the state-of-the-art KGE models like TransE, RotatE, ComplEx and QuatE are not capable of preserving more complex structures in the embedding space, because they always model the initial direction of the relation-specific movements independent of the involved entities.

Subsumption of Existing Models
We show (see appendix) that FieldE subsumes popular models. Definition 5.1 (from (Kazemi and Poole, 2018)). A model M_1 subsumes a model M_2 when any scoring over triples of a KG measured by model M_2 can also be obtained by model M_1.
Because FieldE subsumes existing models, it consequently inherits their advantages in learning various patterns, including symmetry, antisymmetry, transitivity, inversion and composition. Moreover, because ComplEx is fully expressive (as defined in (Kazemi and Poole, 2018)) and it is subsumed by SFieldE, we conclude that Neuro-SFieldE is also fully expressive.

Experiments and Results
In this section, we compare FieldE against TransE, RotatE, ComplEx, QuatE, DistMult, MuRP, ATTH, ROTH, and REFH, as these performed best on the presented benchmarks. The experiments are done over four benchmark datasets, namely FB15K-237, WN18RR, YAGO3-10, and YouTube. We present two versions of FieldE, namely DFieldE and SFieldE. DFieldE uses the distance function (Equation (5)) for score computation, whereas SFieldE uses the inner product (Equation (6)). The comparisons are performed in low and high dimensions. We also consider evaluations with and without data augmentation (adding reverse triples).

Evaluation Metrics
We use the standard metrics for link prediction: Mean Reciprocal Rank (MRR) and Hits@n (n = 1, 3, 10). MRR is measured by (1/n_t) Σ_{j=1}^{n_t} 1/r_j, where r_j is the rank of the j-th test triple and n_t is the number of triples in the test set. Hits@n is the proportion of test triples ranked at most n.
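Both metrics can be computed directly from the ranks; the small helper below is ours (not the paper's evaluation code) and uses illustrative ranks:

```python
def mrr(ranks):
    # Mean Reciprocal Rank: average of 1/rank over the test triples
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at(ranks, n):
    # Hits@n: proportion of test triples ranked at most n
    return sum(1 for r in ranks if r <= n) / len(ranks)

ranks = [1, 2, 5, 12]   # illustrative ranks of four test triples
# mrr(ranks) = (1 + 1/2 + 1/5 + 1/12) / 4
# hits_at(ranks, 3) = 0.5, hits_at(ranks, 10) = 0.75
```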

Hyperparameter Search
We employed Adam/Adagrad as optimizers and tuned the hyperparameters on a validation set. The learning rate (r) and batch size (b) are chosen from r ∈ {0.0002, 0.002, 0.02, 0.1} and b ∈ {100, 512, 1024}, respectively. The embedding dimension d is fixed to 100 for YAGO3-10, 1000 for FB15k-237, and 300 for YouTube. For experiments in high dimensions (Table 3), we used adversarial negative sampling for all models, with 100 negative samples for FB15K-237, 500 for YAGO3-10, and 300 for YouTube. In addition, experiments are done in the low dimension of 32 for all of these datasets as well as WN18RR, where we used a bias in the score of DFieldE with the setting introduced in (Chami et al., 2020). For a fair comparison, all results in Table 4 have been regenerated using the same setting. In these two tables, a NN (Equation (7)) approximates the vector field. In Table 5, we use a linear function (Equation (8)).
Performance Evaluation. As shown in Table 3, FieldE outperforms all other models across almost all datasets and metrics. In the remaining cases, a consistent performance advantage can be observed, i.e. FieldE performs well across all datasets, whereas other models show larger variations in performance. This evaluation is done in a setting without special boosting techniques, apart from the hyperparameter search described above and the adversarial loss (Sun et al., 2019) used for all evaluated models (including DFieldE with a neural network as the vector field; see the appendix for details). Table 4 shows the evaluation of all models and all datasets using the Poincaré manifold (similar results have been achieved with the Hyperboloid). In this setting, DFieldE outperforms other models using similar manifolds (e.g., MuRP and ATTH). The experiments validated the suitability of the Spherical manifold for FB15k-237, the Euclidean manifold for YAGO3-10 and YouTube (reported in Table 3), and the Poincaré and Hyperboloid manifolds for WN18RR.
Visualization of Vector Fields. In Figure 3, we illustrate the vector fields learned by DFieldE for two structures on different manifolds: a circular structure on the Sphere and a hierarchical one on the Hyperboloid. The arrows depict the vector fields, where each arrow at a point represents the direction of movement to the next point. For the relation celebrities...friend, the vector field on the sphere is circular, enabling it to capture the loop structure. On the other hand, the arrows of the vector field for the relation partof on the Hyperboloid start from the narrow part of the manifold and move towards its wider sides. This is suitable for capturing tree-like or hierarchical structures. The opacity reflects the size of the arrows and indicates that the distance between points on the narrow side is smaller than the distance on the wider side. This is consistent with hierarchical structures, where the distance between nodes grows when moving from a root entity towards the leaves.
In Figure 4, we provide two sample visualizations of the learned vector fields for the influences relation.

Evaluation with Data Augmentation. Table 5 shows the performance of our model in a boosted setting (Lacroix et al., 2018) using a full multiclass log-softmax loss function with N3 regularization and reciprocal (data augmentation) approaches. This setting is only suitable for semantic-matching models such as ComplEx and QuatE, due to the fast implementation of the matrix-vector product. This feature enables taking advantage of full negative samples in the learning process. Our model significantly outperforms all the other models in all of the metrics on the YouTube dataset. For example, in H@3, the difference in performance is nearly 10%. On the YAGO3-10 dataset with these boosting techniques, SFieldE outperforms DistMult, QuatE and ComplEx with slightly better results. We believe that the performance differences correlate with the complexity of the data structures: the YouTube and YAGO3-10 datasets have a similar size (1 million triples), but YouTube has fewer entities and relations. Therefore, YAGO3-10 is sparser than YouTube and generally contains less complex graph structures. The performance advantage of FieldE appears to increase when the underlying graph has higher density and contains more complex structures. We also observed that increasing the number of hidden nodes for YouTube leads to gradually improving results. This means the required complexity of the underlying vector field may necessitate a higher complexity of the underlying NN. This is reinforced by the simpler YAGO3-10 results, where the best results are achieved with only 5 hidden nodes and any further increase leads to notable overfitting. In conclusion, we believe that complex graphs necessitate complex geometries.
While this is not entirely surprising, this hypothesis could be directly investigated empirically as FieldE can vary the complexity using the underlying NNs.

Run-time Evaluation
In terms of runtime per batch, TransE, ComplEx and RotatE completed the tasks within 0.10 s, 0.11 s, and 0.13 s, respectively. FieldE performs learning within 0.14 s per batch, with the overhead mostly due to the time required for learning the underlying vector fields. FieldE is capable of learning complex geometries, whereas TransE, ComplEx and RotatE represent simple geometries.

Conclusion
We presented FieldE, the first representation learning model for knowledge graphs based on Ordinary Differential Equations. In contrast to previous models, FieldE models relations as vector fields on a Riemannian manifold and thereby overcomes the drawback of previous works in which relations are entity-independent. Furthermore, we developed a neural network based approach that allows learning a suitable geometry from the training graph. We have both empirically and analytically shown that FieldE can preserve subgraph structures in the embedding space better than state-of-the-art models. Formally, FieldE is a generalisation of several state-of-the-art KGE models, and we could formally show that it subsumes TransE, RotatE, ComplEx and QuatE. Evaluation on standard benchmark datasets shows competitive or superior performance of FieldE across all datasets and metrics.

A Problem Statement
In this part, we use an example to explore the problem of KGE models in the preservation of heterogeneous structures of the underlying KGs. To do so, let us focus on a scenario where heterogeneous structures such as loops and paths are created by an individual relation. The scenario is explored with RotatE, a recent state-of-the-art KGE model (Sun et al., 2019). Despite the high performance of RotatE in comparison to other models, this exploration shows that it returns wrong inferences for link prediction when loops and paths appear in a subgraph with the same relation. To show this, let us represent the loop and path structures as s_L (L = loop) and s_P (P = path), each with a set of 10 connected nodes (e_1, . . . , e_10) and one relation (r). Therefore, (e_1^{s_L}, . . . , e_10^{s_L}) represents the nodes of the structure s_L where the entities form a loop, and (e_1^{s_P}, . . . , e_10^{s_P}) corresponds to the nodes of another structure with the same relation but forming a path. This is illustrated by an example in Figure 5, where the influences relation creates such loops and paths. For each triple in this graph, e.g. (e_i^{s_k}, r, e_j^{s_k}), k ∈ {L, P}, the vector representation using RotatE is e_i^{s_k} • r = e_j^{s_k} (• is the multiplication between complex numbers). The complete representations of the loop and path are then the following:

s_L: (e_1^{s_L}, r, e_2^{s_L}) → e_1^{s_L} • r = e_2^{s_L}, . . . , (e_10^{s_L}, r, e_1^{s_L}) → e_10^{s_L} • r = e_1^{s_L};

s_P: (e_1^{s_P}, r, e_2^{s_P}) → e_1^{s_P} • r = e_2^{s_P}, . . . , (e_9^{s_P}, r, e_10^{s_P}) → e_9^{s_P} • r = e_10^{s_P}.
In order to compute the embedding of the relation r, let us first take the loop structure. Starting from e_2^{s_L} on the right side of the first triple, the equivalent vector e_1^{s_L} • r is substituted into the subsequent equation (e.g., by replacing the left side of e_2^{s_L} in the second equation of s_L, we get e_1^{s_L} • r • r = e_3^{s_L}). Continuing through the loop yields e_1^{s_L} • r • · · · • r = e_10^{s_L} and e_10^{s_L} • r = e_1^{s_L}.
By doing this to the end, we conclude that e_1^{s_L} • r • · · · • r = e_1^{s_L}, which means r • · · · • r = 1, where r is a complex number (r = e^{iθ_r}); therefore θ_r = 2π/10. This value of r is static for the whole graph, and can therefore be used to check whether the second structure (the path) is preserved.
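The conclusion θ_r = 2π/10 can be checked numerically (a small sketch, not from the paper): applying the resulting rotation ten times returns any starting embedding to itself, which is exactly why a ten-node path with the same relation is forced closed.

```python
import cmath
import math

# one-dimensional RotatE relation learned from the 10-node loop
r = cmath.exp(2j * math.pi / 10)

e1 = 1 + 0j               # arbitrary start embedding
e_after_10 = e1 * r**10   # ten hops with the same relation
# e_after_10 == e1: the model cannot avoid predicting (e10, r, e1) as positive
```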
Here, we replace the vectors in the same way as above and additionally include the value of r derived in the first calculation. For s_P we obtain e_1^{s_P} • r • · · · • r = e_10^{s_P}. After some derivations, we have e_1^{s_P} • e^{20πi/10} = e_10^{s_P} • r. Since e^{2πi} = 1, this simplifies to e_1^{s_P} = e_10^{s_P} • r, from which the model infers that the triple (e_10^{s_P}, r, e_1^{s_P}) is positive. However, this is a path structure, and the wrong inference closes it into a loop. This shows how such heterogeneous structures are challenging for the RotatE model. The problem is not limited to RotatE: other rotation-based KGE models such as QuatE and ComplEx also have it, and, more generally, every KGE model which uses a constant relation-based transformation, such as TransE, suffers from similar limitations. The heat map in Figure 5 also indicates the wrong inferences by the state-of-the-art models, namely TransE, RotatE, ComplEx, and QuatE. These wrong inferences lead to difficulties in the preservation of loop and path structures. However, FieldE is capable of correct inferences for the different sub-structures; thus, heterogeneous structure preservation is also satisfied. Below, we compare the TransE model with FieldE from the flexibility point of view for relation transformation (constant vector vs. varied vector field).

B Flexible Relation Embedding
Here, we analyse TransE for modeling the mentioned subgraphs, focusing specifically on the loop structure. The model considers a relation as a constant vector to perform translations:

e_h + r = e_t. (9)

Therefore, a relation-specific transformation (here a translation) is performed in the same direction and with the same length, regardless of the entities involved. This causes an issue in learning complex structures and patterns. To show this, without loss of generality, let us consider a loop in a graph with a relation r which connects three entities:

e_1^{s_L} + r = e_2^{s_L}, e_2^{s_L} + r = e_3^{s_L}, e_3^{s_L} + r = e_1^{s_L}. (10)

After substituting the first equation of (10) into the second one and comparing the result with the third equation, we conclude that r = 0. This is indeed problematic because the embeddings of all the entities will be the same, i.e. different entities are not distinguishable in the geometric space. Now we show that our model can encode loops, overcoming these issues. In order to learn the loop mentioned above, FieldE should fulfill the following equations:

e_1^{s_L} + f_{θ_r}(e_1^{s_L}) = e_2^{s_L}, e_2^{s_L} + f_{θ_r}(e_2^{s_L}) = e_3^{s_L}, e_3^{s_L} + f_{θ_r}(e_3^{s_L}) = e_1^{s_L}. (11)

After substituting the first equation into the second one, and again substituting the result into the third, we obtain

f_{θ_r}(e_1^{s_L}) + f_{θ_r}(e_2^{s_L}) + f_{θ_r}(e_3^{s_L}) = 0. (12)

This equation can be satisfied by FieldE because neural networks with bounded continuous activation functions (here the hyperbolic tangent) are universal approximators and universal classifiers (Hornik et al., 1989; Hornik, 1991; Nayyeri et al., 2017). Therefore, a well-specified neural network for the vector field f_{θ_r} can ensure that Equation (12) is satisfied.
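The two derivations above can be verified numerically; the lookup-table field below is a toy stand-in for the neural network (our illustration, not the paper's construction), only meant to show that an entity-dependent field can close a loop while keeping the embeddings distinct:

```python
import numpy as np

# TransE: summing the three loop constraints e1 + r = e2, e2 + r = e3,
# e3 + r = e1 gives 3r = 0, so the only exact solution is r = 0,
# which collapses all three entity embeddings onto one point.
e1 = np.array([0.3, -1.2])
r = np.zeros(2)          # the only r that closes the loop exactly
e2 = e1 + r
e3 = e2 + r
# e3 + r == e1 holds, but only because e1 == e2 == e3

# An entity-dependent field (toy stand-in for the learned NN) closes the
# same loop with three distinct embeddings:
pts = [np.array([1.0, 0.0]), np.array([-0.5, 0.9]), np.array([-0.5, -0.9])]

def f(e):
    # tangent vector pointing from each loop node to the next one
    for i, p in enumerate(pts):
        if np.allclose(e, p):
            return pts[(i + 1) % 3] - p
```

Since the per-node tangent vectors sum to zero around the cycle, the loop constraint of Equation (12) holds with distinct points, which is impossible for a constant relation vector.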
We additionally show that our model can embed a path structure over three further entities $e^{s_P}_1, e^{s_P}_2, e^{s_P}_3$ while preserving the loop structure over $e^{s_L}_1, e^{s_L}_2, e^{s_L}_3$:
$$e^{s_P}_1 + f_{\theta_r}(e^{s_P}_1) = e^{s_P}_2, \quad e^{s_P}_2 + f_{\theta_r}(e^{s_P}_2) = e^{s_P}_3, \quad e^{s_P}_3 + f_{\theta_r}(e^{s_P}_3) = e^{s_P}_1. \quad (13)$$
Substituting the first equation of (13) into the second, and the result into the third, we obtain
$$f_{\theta_r}(e^{s_P}_1) + f_{\theta_r}(e^{s_P}_2) + f_{\theta_r}(e^{s_P}_3) = 0. \quad (14)$$
Because the embeddings of $e^{s_L}_1, e^{s_L}_2, e^{s_L}_3$ and $e^{s_P}_1, e^{s_P}_2, e^{s_P}_3$ are distinct points, there is a neural network that approximates the vector field $f_{\theta_r}$ such that both Equations (12) and (14) are satisfied, due to the universal approximation ability of the underlying network. Therefore, FieldE can learn two different sub-graph structures with the same relation.
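The argument above can be checked numerically. The sketch below assumes the Euclidean DFieldE setting and uses a simple lookup table as a stand-in for the neural network $f_{\theta_r}$: it shows that the only constant translation consistent with a three-entity loop is the zero vector, while a non-constant field encodes a closed loop and an open path with the same relation.

```python
import numpy as np

# Three loop entities and three path entities (distinct 2-D points, assumed).
loop_pts = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.5, 1.0])]
path_pts = [np.array([3.0, 0.0]), np.array([4.0, 0.5]), np.array([5.0, 0.0])]

# A constant relation vector (TransE-style) fit by least squares to the loop
# constraints e_i + r = e_{i+1}: the successive differences telescope to zero.
diffs = [loop_pts[(i + 1) % 3] - loop_pts[i] for i in range(3)]
r = np.mean(diffs, axis=0)
assert np.allclose(r, 0.0)  # only the useless zero translation fits a loop

# A non-constant vector field (a lookup table standing in for f_theta_r) can
# satisfy the closed loop and an open path with the same relation.
field = {tuple(loop_pts[i]): loop_pts[(i + 1) % 3] - loop_pts[i] for i in range(3)}
field.update({tuple(path_pts[i]): path_pts[i + 1] - path_pts[i] for i in range(2)})

def f(e):
    return field[tuple(e)]

for i in range(3):  # loop constraints as in Equation (11) hold
    assert np.allclose(loop_pts[i] + f(loop_pts[i]), loop_pts[(i + 1) % 3])
for i in range(2):  # path constraints hold with the same relation field
    assert np.allclose(path_pts[i] + f(path_pts[i]), path_pts[i + 1])
```

A trained tanh network plays the role of the lookup table in FieldE; the point here is only that a position-dependent field escapes the $r = 0$ collapse.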

C Subsumption
Here we show that variants of FieldE subsume other KGE models.

Proposition 2. DFieldE subsumes TransE and RotatE. SFieldE subsumes ComplEx and QuatE.
Proof. We first prove that DFieldE subsumes TransE. Note that TransE and RotatE use a distance for score calculation. Choosing the Euclidean space $\mathbb{R}^d$ as the manifold $\mathcal{M}$, we have $T_x\mathcal{M} = \mathcal{M}$ for any $x \in \mathbb{R}^d$, and the exponential map is given by $\exp_x(v) = x + v$. The FieldE assumption then becomes $e_{t+1} = e_t + f_{\theta_r}(e_t)$.
If we set $f_{\theta_r} = r$ (a constant vector field), we obtain $e_{t+1} = e_t + r$, which is the assumption of the TransE model for triple learning.
Proof. We now prove that DFieldE subsumes RotatE. The assumption of RotatE is
$$e_{t+1} = e_t \circ r, \quad (15)$$
where entities and relations are complex vectors and the modulus of each complex coefficient of a relation is 1, i.e. $|r| = 1$. In vector form, Equation (15) can be written as a real (rotation) matrix-vector multiplication,
$$e^v_{t+1} = R_r e^v_t,$$
where $R_r$ is a rotation matrix and $e^v_t$ denotes the vector representation of the complex numbers (with real and imaginary components). Given the assumption of DFieldE in Euclidean space, i.e. $e^v_{t+1} = e^v_t + f_{\theta_r}(e^v_t)$, and setting $f_{\theta_r}(e^v_t) = (R_r - I)e^v_t$, where $I$ is the identity matrix, the assumption of RotatE is obtained. We conclude that the RotatE model is a special case of DFieldE.
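This can be checked numerically; the sketch below assumes a single 2-D rotation block with an arbitrary angle $\varphi$ and verifies that the residual field $f_{\theta_r}(e) = (R_r - I)e$ reproduces the RotatE update.

```python
import numpy as np

# 2-D rotation matrix R_r for a rotation relation with angle phi (|r| = 1).
phi = 0.7
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

e = np.array([0.3, -1.2])

# DFieldE update with the vector field f(e) = (R_r - I) e ...
f = (R - np.eye(2)) @ e
dfielde_tail = e + f

# ... reproduces the RotatE transformation e_{t+1} = R_r e_t.
rotate_tail = R @ e
assert np.allclose(dfielde_tail, rotate_tail)
```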
Proof. Here we present the proof of subsumption of the ComplEx model. With the manifold given by Euclidean space, SFieldE uses the score function
$$S_r(e_t, e_{t+1}) = \langle \bar{e}_{t+1},\, e_t + f_{\theta_r}(e_t) \rangle, \quad (16)$$
where $\bar{e}$ denotes the complex conjugate of $e$. ComplEx computes the score
$$\mathrm{Re}(\langle e_t, r, \bar{e}_{t+1} \rangle), \quad (17)$$
which, in the vectorized representation of the complex numbers, can be written as
$$S_r(e_t, e_{t+1}) = \langle e^v_{t+1}, \alpha_r R_r e^v_t \rangle. \quad (18)$$
Setting $f_{\theta_r}(e^v_t) = \alpha_r (R_r - \frac{1}{\alpha_r} I) e^v_t$ in the vectorized form of Equation (16) gives $e^v_t + f_{\theta_r}(e^v_t) = \alpha_r R_r e^v_t$, i.e. the score of the ComplEx model in the vectorized form shown in Equation (18). Therefore, ComplEx is also a special case of SFieldE.
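The vectorization step can be verified numerically. The sketch below, with an assumed relation modulus $\alpha_r$ and angle $\varphi$ for one complex dimension, checks that complex multiplication equals the scaled rotation and that the stated choice of $f_{\theta_r}$ recovers it.

```python
import numpy as np

# Complex relation r = alpha * e^{i phi} and its vectorized (matrix) form.
alpha, phi = 1.8, 0.4
r = alpha * np.exp(1j * phi)
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])  # pure rotation part

e = 0.5 - 0.9j
ev = np.array([e.real, e.imag])

# Complex multiplication equals the scaled rotation acting on the vectorized e.
assert np.allclose(alpha * R @ ev, np.array([(e * r).real, (e * r).imag]))

# The field f(e) = alpha * (R - (1/alpha) I) e turns the update e + f(e)
# into alpha * R e, i.e. multiplication by r.
f = alpha * (R - np.eye(2) / alpha) @ ev
assert np.allclose(ev + f, alpha * R @ ev)
```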
Proof. Here, we show that SFieldE subsumes QuatE. QuatE uses the score function
$$S_r(e_t, e_{t+1}) = (e_t \otimes r) \cdot e_{t+1}, \quad (19)$$
where $\otimes$ and $\cdot$ denote the Hamilton product and the element-wise dot product of quaternion vectors, respectively. Similar to RotatE, Equation (19) can be written as a matrix-vector multiplication,
$$S_r(e_t, e_{t+1}) = \langle e^v_{t+1}, R_r e^v_t \rangle, \quad (20)$$
where $R_r$ is a $4d \times 4d$ matrix and $e^v_t$ is the vectorized version of the quaternion numbers. Indeed, the above equation can be constructed from the score function of SFieldE given in Equation (16): setting the vector field to $f_{\theta_r}(e^v_t) = (R_r - I)e^v_t$ yields the score function of QuatE in vectorized form. Therefore, SFieldE subsumes the QuatE model as well.

Figure 6: Handling one-to-many relations. Movement on the tangent space is determined with the head as the source of movement and the tail as the target.
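The matrix form of the Hamilton product can also be checked directly. The sketch below constructs the $4 \times 4$ right-multiplication matrix $R_r$ for one quaternion dimension (an assumption for illustration; QuatE stacks $d$ such blocks) and verifies the claimed identity.

```python
import numpy as np

def hamilton(q1, q2):
    """Hamilton product of quaternions given as (w, x, y, z) arrays."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def right_mult_matrix(r):
    """4x4 matrix R_r with  e (x) r = R_r @ e  for the Hamilton product."""
    w, x, y, z = r
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])

rng = np.random.default_rng(0)
e, r = rng.normal(size=4), rng.normal(size=4)

R = right_mult_matrix(r)
assert np.allclose(hamilton(e, r), R @ e)

# With f(e) = (R_r - I) e, the update e + f(e) equals e (x) r.
assert np.allclose(e + (R - np.eye(4)) @ e, hamilton(e, r))
```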

D Modeling Studied Relational Patterns by FieldE
We have theoretically shown that FieldE subsumes other state-of-the-art KGE models. Therefore, it inherits their capability to encode well-studied relational patterns (such as symmetry, anti-symmetry, inversion, etc.). Here, we show how FieldE encodes one-to-many relations. A relation is one-to-many when an entity $e_t$ is connected to several entities $e^1_{t+1}, e^2_{t+1}, \ldots, e^N_{t+1}$. The score of a triple is computed by $\mathrm{dist}(e^n_{t+1}, \exp_{e_t}(v^r_{e_t}))$, which is upper-bounded through the loss function (Sun et al., 2019), i.e. $\mathrm{dist}(e^n_{t+1}, \exp_{e_t}(v^r_{e_t})) \le \eta_1$. Because smooth Riemannian manifolds are locally Euclidean, there is an area on the manifold (around the point $\exp_{e_t}(v^r_{e_t})$) in which the tails are embedded, so that all the corresponding triples $(e_t, r, e^1_{t+1}), (e_t, r, e^2_{t+1}), \ldots, (e_t, r, e^N_{t+1})$ are measured as positive. This is how FieldE handles one-to-many relations.
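A minimal Euclidean sketch of this argument, with assumed values for the head, the field vector, and the margin $\eta$: all tails embedded in the $\eta$-ball around the transported head score as positive.

```python
import numpy as np

# Euclidean case: exp_e(v) = e + v.  One head, one relation field vector,
# and several tails clustered around the transported point.
head = np.array([0.0, 0.0])
v_r = np.array([2.0, 1.0])            # tangent vector f_theta_r(head), assumed
target = head + v_r                   # exp_head(v_r)

eta = 0.5                             # margin hyper-parameter, assumed
rng = np.random.default_rng(1)
tails = target + 0.1 * rng.normal(size=(5, 2))   # N tails inside the eta-ball

# All N triples (head, r, tail_n) score as positive under the distance bound.
dists = np.linalg.norm(tails - target, axis=1)
assert np.all(dists <= eta)
```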
Another way to handle one-to-many relations is to obtain the tangent vector by taking the tail into account when encoding a triple in the vector space. In this way, as shown in Figure 6, the direction of the movement on the tangent space is determined with the head as the source of the movement and the tail as the target, which consequently enables the model to handle one-to-many relations.

E Training and the Algorithm of FieldE
In order to optimize the parameters of the FieldE model ($\theta_r$ and the embedding vectors), we employ the loss function used in RotatE (Sun et al., 2019), defined as
$$\mathcal{L} = -\log \sigma(\eta - S_r(e_t, e_{t+1})) - \sum_{(e'_t, r, e'_{t+1}) \in T'} p(e'_t, r, e'_{t+1}) \log \sigma(S_r(e'_t, e'_{t+1}) - \eta),$$
where $\sigma(\cdot)$ is the sigmoid function, $T$ and $T'$ are the sets of positive and negative samples respectively, and $\eta$ is a hyper-parameter of the loss, adjusted through the validation process. Further,
$$p(e'_t, r, e'_{t+1}) = \frac{\exp(\alpha S_r(e'_t, e'_{t+1}))}{\sum_i \exp(\alpha S_r(e'_{t,i}, e'_{t+1,i}))}$$
denotes the self-adversarial weight of the negative sample $(e'_t, r, e'_{t+1})$, and the constant $\alpha$ is the temperature of the sampling. Note that a negative sample $(e'_t, r, e'_{t+1})$ is created from a positive sample $(e_t, r, e_{t+1})$ by randomly corrupting either $e_t$ or $e_{t+1}$.
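A minimal sketch of this loss in the distance-based setting, following the self-adversarial scheme of Sun et al. (2019); the score, margin $\eta$, and temperature $\alpha$ values are assumptions, and the softmax is weighted so that harder (lower-distance) negatives receive more mass.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_adversarial_loss(pos_dist, neg_dists, eta=6.0, alpha=1.0):
    """Negative-sampling loss for a distance-like score S_r
    (lower distance = more plausible triple)."""
    # Softmax weights over negatives: closer (harder) negatives weigh more.
    w = np.exp(-alpha * neg_dists)
    p = w / w.sum()
    pos_term = -np.log(sigmoid(eta - pos_dist))
    neg_term = -np.sum(p * np.log(sigmoid(neg_dists - eta)))
    return pos_term + neg_term

loss = self_adversarial_loss(pos_dist=1.0, neg_dists=np.array([4.0, 7.0, 9.0]))
assert loss > 0.0 and np.isfinite(loss)
```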
Datasets We run our experiments on several public datasets with diversity in the covered content and graph structure, namely FB15k-237 (Toutanova and Chen, 2015), WikiMovie-300k, YAGO3-10, YouTube, and WN18RR; their statistics are given in Table 9. • FB15k-237 is a standard KG containing a subset of the FreeBase dataset (Bollacker et al., 2008). It was created for experimental purposes and covers general world knowledge, for example science, politics, and sport (Dettmers et al., 2018). FB15k-237 (Toutanova and Chen, 2015) is a refined subset of FB15k in which most of the triples involved in inverse relational patterns are removed from the training set. • WikiMovie-300k contains knowledge about films, such as directors, actors, and genres, extracted from Wikidata (Vrandečić and Krötzsch, 2014). This dataset contains 300K triples and only entities that appear in at least two triples.
• YAGO3-10 is a subset of knowledge collected from the Wikipedias of multiple languages, including English, German, French, Dutch, Italian, Spanish, Romanian, Polish, Arabic, and Farsi. The knowledge is general, covering for example people influencing each other, cities and their airports connected to each other, and organizations.
• YouTube is a social network dataset which contains information about interactions between 15k users. This information is captured by five relations namely contact, shared friends, shared subscription, shared subscriber, and shared favorite videos between users.
• WN18RR is a subset of WordNet dataset where the inverse relations are deleted, and the main relation patterns are symmetry/antisymmetry and composition.
Specific Test Sets from FB15k-237 In order to further analyse the model, we generated four test sets based on the characteristics of the relations. In addition, we examined the relations to choose the most suitable metric for each analysis, since, depending on the semantics of a relation, the metrics commonly used to compare KGE models, such as Hits@1, may not be the best criterion for judging a model's performance.
• Dataset1: This test set contains the relation /people/person/profession, which appears 1311 times in the main data and is chosen for evaluation with right-rank Hits@k. This relation is one-to-many: based on the statistics, a person has at least one and at most 13 recorded professions. Therefore, the results on this dataset are reported as Hits@1,3,10.
• Dataset2: This test set corresponds to relations whose triples have only one possible tail. An example of such a relation in FB15k-237 is /common/.../category, which appears 402 times in the training data. For such relations, evaluation with the Hits@k metric is not appropriate, so we provide results using the F-Measure metric.
• Dataset3: This test set contains relations that allow a limited number of tail options, but more than one. An example is /people/person/gender, which has two possible tails per head and covers 436 triples in the dataset. Including such relations in a Hits@k evaluation reduces the fairness of the comparison, since ranking several valid tails highly is not a mistake of the model. This test set contains 8 more relations of a similar kind, and we report the F-Measure for these as well.
• Dataset4: This test set includes two relations with multiple tail options, always more than 10. Therefore, we only consider the right-rank Hits@10 for evaluation; averaging the left and right ranks in the computation of Hits@k does not properly measure performance for such relations. An example is /film/.../film_crew_role, which appears in 606 triples and strongly affects the measured performance of the models if evaluated on Hits@1,3 or with the left rank taken into account.
The score function of SFieldE is given by
$$S_r(e_t, e_{t+1}) = \langle e_{t+1}, \exp_{e_t}(v^r_{e_t}) \rangle, \quad (24)$$
with $\langle \cdot, \cdot \rangle$ denoting the Euclidean inner product in the ambient space.
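In the Euclidean case, where $\exp_e(v) = e + v$, this score reduces to a plain inner product with the transported head; a minimal sketch with assumed toy vectors:

```python
import numpy as np

# Euclidean SFieldE score (Equation 24): exp_e(v) = e + v, and the score is
# the inner product between the tail and the transported head.
def score(head, tail, v):
    return np.dot(tail, head + v)

head = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])          # tangent vector v_r at the head (assumed)

# A tail aligned with the transported head scores higher than an opposed one.
assert score(head, np.array([1.0, 1.0]), v) > score(head, np.array([-1.0, -1.0]), v)
```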
Each of the above versions of FieldE can either use a neural network to approximate the vector field or use an explicit linear function as the vector field. The linear version of FieldE uses the same parameters as the neural version except for the hidden layers, and is trained with full negative sampling and N3 regularization (Lacroix et al., 2018).
For the neural version of FieldE, we used a neural network with two hidden layers (details in Table 10), with (500, 100) hidden nodes for YAGO3-10, (100, 100) for FB15k-237 as well as YouTube, and (5, 5) for WikiMovie-300k. We fixed the parameter $\eta$ to 0.5 in Equations 5 and 6.
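A forward-pass sketch of such a vector-field network (numpy, random untrained weights; the (100, 100) hidden sizes follow the FB15k-237/YouTube setting above, while the class name and dimensions are illustrative assumptions):

```python
import numpy as np

class VectorFieldMLP:
    """Two-hidden-layer tanh network parameterizing a relation vector field
    f_theta_r: R^d -> R^d."""

    def __init__(self, d, h1=100, h2=100, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(h1, d))
        self.b1 = np.zeros(h1)
        self.W2 = rng.normal(scale=0.1, size=(h2, h1))
        self.b2 = np.zeros(h2)
        self.W3 = rng.normal(scale=0.1, size=(d, h2))
        self.b3 = np.zeros(d)

    def __call__(self, e):
        h = np.tanh(self.W1 @ e + self.b1)   # bounded activations (tanh)
        h = np.tanh(self.W2 @ h + self.b2)
        return self.W3 @ h + self.b3         # tangent vector at e

d = 32
f = VectorFieldMLP(d)
e_t = np.zeros(d)
e_next = e_t + f(e_t)   # Euclidean exp map: one step along the field
assert e_next.shape == (d,)
```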
For SFieldE, all of the following parameters apply except for the hidden layers and the use of full negative sampling. For all other models, the same parameters were used. We also performed evaluations in low dimensions, where we set the same hyperparameters reported in (Chami et al., 2020) for all models.
Dataset Complexity and Size of Neural Network. An interesting insight from the performance results and the characteristics of the datasets is the connection between the complexity of a knowledge graph and the required complexity of FieldE's neural network (i.e. the network that parameterizes the vector field). As shown in Table 9, the statistics of the datasets also reveal their sparsity and density. For example, YouTube, with 5 relations, 2K entities, and 1M triples, is a denser and more complex knowledge graph than WikiMovie-300k, with 588 relations, 36K entities, and 240K triples; in other words, WikiMovie-300k is much sparser than YouTube. Considering these characteristics, we explored the performance of FieldE in Hits@1,3,10 and MRR while increasing the number of hidden nodes in the neural network characterizing the vector field $f_{\theta_r}$. Figure 7 compares YouTube and WikiMovie-300k: on YouTube, increasing the number of hidden nodes gradually improves the results, but not on WikiMovie-300k. This means that a complex underlying vector field requires a correspondingly complex neural network. On WikiMovie-300k, the best results are always achieved with only 5 nodes in each hidden layer, and further increasing the number of hidden nodes decreases performance on all metrics, which clearly indicates overfitting of the model on such sparse datasets.

The results presented in Table 8 cover Dataset1 to Dataset4, each of which is analysed with a suitable metric (either right-rank Hits@10 or F-Measure). These evaluations were done with different embedding dimensions, a high dimension of 500 and a low dimension of 32, and the results are shown in Figure 8. As can be seen, the difference between the various manifold choices in high dimensions is small, with non-Euclidean manifolds performing slightly better on three out of four datasets.
The results in low dimensions show a significant effect of the choice of manifold. Generally, the sphere dominates performance in all of the datasets. Additionally, FieldE on a Poincaré Ball performs better than FieldE using Euclidean space on Dataset3 considering the F-Measure metric.
Visualization of Vector Fields. Tracing the vector fields created by different relations gives an intuition about the underlying structures preserved by the model. Due to their high dimensionality, the vector fields are usually beyond human perception; to provide a presentable illustration, we therefore plot each vector field in pairs of dimensions. For FieldE with $d = 100$, this yields 99 consecutive pairs, from which we selected six: $(7, 8), (24, 25), (37, 38), (41, 42), (87, 88), (93, 94)$.
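A sketch of how such a slice can be produced; the 2-D field below is a toy stand-in for the learned $f_{\theta_r}$ restricted to one dimension pair, and the grid range is an assumption.

```python
import numpy as np

# Evaluate a field on a grid restricted to one dimension pair.
def f(e):
    # Toy rotational field standing in for the learned f_theta_r.
    return np.tanh(e[::-1] * np.array([1.0, -1.0]))

dims = (7, 8)                      # one of the selected dimension pairs
xs = np.linspace(-2, 2, 15)
X, Y = np.meshgrid(xs, xs)
U = np.zeros_like(X)
V = np.zeros_like(Y)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        u, v = f(np.array([X[i, j], Y[i, j]]))
        U[i, j], V[i, j] = u, v

# Rendering the slice, e.g. with matplotlib:
#   plt.quiver(X, Y, U, V); plt.title(f"relation field, dims {dims}")
assert U.shape == X.shape and V.shape == Y.shape
```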
In Figure 10, we illustrate the corresponding vector fields for the influences relation in the YAGO3-10 knowledge graph. The plotted vector fields correspond exactly to the case shown in Figure 5, where some people influence others in a loop structure and some in a path structure (without a return link). These results show full structure preservation from the graph representation to the vector representation; as discussed before, this capability also avoids incorrect inferences. Subfigure 10(a) shows some people influencing many others in both path and loop structures. Subfigure 10(b) shows the trajectories of people who are a source influencer for many others; for those source points, the divergence is positive. Subfigure 10(c) is another, denser occurrence of loops and paths.
A source vector field is illustrated in Subfigure 10(d); its interpretation is that one person influences many others. Subfigure 10(e) shows a set of sink nodes (with negative divergence) that have been influenced by many others. Finally, Subfigure 10(f) shows some denser source entities. Overall, these illustrations for the influences relation demonstrate the capability that FieldE inherits from ODEs, facilitated by the concepts of vector fields and trajectories.
Four other relations (isConnectedTo, hasGender, owns, and livesIn) have been selected to provide broader visualizations of the vector fields. In Figure 11, we present the vector fields of these relations for subgraphs with different structures, including paths and loops. Each row corresponds to one relation, for which three different learned structures are shown. For example, Subfigures 11(a), 11(b), and 11(c) correspond to the "isConnectedTo" relation and show which airports are connected to each other in different structures (loops and paths); the visualizations capture both airports connected in loop form and airports that are not. Subfigures 11(d), 11(e), and 11(f) capture different structures of the "hasGender" relation, Subfigures 11(g), 11(h), and 11(i) of the "owns" relation, and Subfigures 11(j), 11(k), and 11(l) of the "livesIn" relation. With these visualizations, we aim to clarify the effect of ODEs on learning vector fields. All shown curves are trajectories lying on relation-specific vector fields learned by the neural network of our model; the arrows show the direction of structure evolution in the vector space.
Further Evaluations. A comparison of our model to six state-of-the-art models is provided on the four test sets described above. The purpose of this evaluation is to deepen the analysis by taking more specific metrics into account and by designing test sets based on the characteristics of the relations. The results are presented in Figure 9, where the ComplEx, DistMult, QuatE, RotatE, TransE, and pRotatE models are compared to FieldE. We performed these evaluations in low (32) and high (500) dimensions. As can be seen, our model outperforms all the other models in low dimension on all the datasets, except for QuatE, with which there is a very close competition on Dataset3. In high dimension, our model outperforms the other models on Dataset1 and Dataset3; on the other two datasets, QuatE and pRotatE perform close to our model. Additionally, we investigate the effect of a coordinate transformation implemented by a neural network, i.e.
$$\frac{d e_L(t)}{dt} = f_{\theta_r}(e_L(t)), \quad e_L(t) = \varphi(e(t)),$$
where $\varphi$ is represented by a neural network. This corresponds to posing the manifold dynamics in a (possibly lower-dimensional) space, similar to the encoder step in auto-encoders. We observed that on Dataset4, this variant of our model outperforms the original Euclidean version without coordinate transformation in the low dimension of 32 (98% vs. 88%) using right-rank Hits@10 (tail ranking). However, on the other test sets, the original version of FieldE obtained slightly better or comparable performance relative to FieldE with a neural coordinate transformation. We conjecture that this is due to the advantage of the vector field used in the original FieldE, which can capture the data complexity without a neural coordinate transformation.
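A minimal sketch of this variant: a linear map standing in for the encoder $\varphi$, a toy latent field in place of $f_{\theta_r}$, and one explicit Euler step of the latent ODE; all sizes and functions here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_latent = 32, 8

# Encoder phi: R^d -> R^{d_latent} (a linear map here; the variant above
# represents phi by a neural network).
Phi = rng.normal(scale=0.1, size=(d_latent, d))

def f_latent(z):
    return np.tanh(z)              # toy stand-in for the latent field f_theta_r

e = rng.normal(size=d)
z = Phi @ e                        # e_L(t) = phi(e(t))

# One explicit Euler step of  d e_L / dt = f_theta_r(e_L)  in the latent space.
dt = 0.1
z_next = z + dt * f_latent(z)
assert z_next.shape == (d_latent,)
```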