TranS: Transition-based Knowledge Graph Embedding with Synthetic Relation Representation

Knowledge graph embedding (KGE) aims to learn continuous vector representations of relations and entities in knowledge graphs. Recently, transition-based KGE methods have achieved promising performance, in which a single relation vector learns to translate the head entity to the tail entity. However, this scoring pattern is not suitable for complex scenarios where the same entity pair has different relations. Previous models usually focus on improving entity representations for 1-to-N, N-to-1 and N-to-N relations, but ignore the single relation vector. In this paper, we propose a novel transition-based method, TranS, for knowledge graph embedding. The single relation vector in traditional scoring patterns is replaced with a synthetic relation representation, which can solve these issues effectively and efficiently. Experiments on a large knowledge graph dataset, ogbl-wikikg2, show that our model achieves state-of-the-art results.


Introduction
Knowledge graphs (KGs), such as Freebase (Bollacker et al., 2008), Wikidata (Vrandečić and Krötzsch, 2014), DBpedia (Lehmann et al., 2015) and Yago (Rebele et al., 2016), play a very important role in many fields, including question answering (Huang et al., 2019), semantic parsing (Yih et al., 2015), information retrieval (Xiong et al., 2017) and so on. A KG, as a multi-relational graph, is composed of entities as nodes and relations as different types of edges. It is usually represented in the form of triplets (h, r, t), i.e., (head entity, relation, tail entity), where the relation indicates the relationship between the two entities. Knowledge graph embedding (KGE) is an important and fundamental research topic for KGs. It aims to learn dense semantic representations of entities and relations for downstream tasks such as KG completion and link prediction. Generally speaking, KGE methods can be roughly divided into the following directions: translational distance (Bordes et al., 2013; Wang et al., 2014; Fan et al., 2014; Lin et al., 2015; Ji et al., 2015, 2016; Feng et al., 2016), semantic matching (Nickel et al., 2011; Bordes et al., 2011, 2014; García-Durán et al., 2014; Yang et al., 2015; Nickel et al., 2016; Balazevic et al., 2019) and neural networks (Socher et al., 2013; Dong et al., 2014; Liu et al., 2016; Dettmers et al., 2018; Nguyen et al., 2018). Because transition-based KGE methods like TransE (Bordes et al., 2013) are simple and effective, this series of models is becoming more and more popular in both academia and industry. Specifically, TransE makes the difference between two entity vectors (h and t) approximate the relation vector (r), i.e., t − h ≈ r. That is to say, the relation r is characterized by the translating vector r. However, TransE is not suitable for complex relations like one-to-many, many-to-one and many-to-many. For example, in Figure 1, after graduating from Erasmus University Rotterdam, Pauline Meurs became a professor at the same university. And the composer, producer, screenwriter, editor and director of the film Indramalati can be the same person, Jyoti Prasad Agarwala. Although previous models (Wang et al., 2014; Lin et al., 2015; Qian et al., 2018; Chao et al., 2021; Yu et al., 2021) such as TransH/R/D have considered relevant issues, they still focus on entity-relation projection or interaction in the entity part and continue the TransE pattern, $R_t - R_h \approx r$, where $R_t$ and $R_h$ are transformations of t and h, $R_t - R_h$ is the entity part, and r is the relation part. Actually, recent research on InterHT (Wang et al., 2022) shows that the entity part only needs to consider the head and tail entities and their interaction information to achieve remarkable performance and outperform previous TransX series models. Unfortunately, it again ignores the problem of complex relation representation. Therefore, from the perspective of interaction, how to solve the problem in Figure 1 by introducing entity-relation interactions in the relation part, under the condition that only entity-entity interactions are retained in the entity part, needs to be further considered.
To this end, we propose a novel transition-based knowledge graph embedding model, TranS, which replaces the traditional scoring pattern with a synthetic relation pattern, i.e., $R_t - R_h \approx \hat{r} + r + \tilde{r}$. The final relation representation is the sum of multiple relation vectors. Two of them ($\hat{r}$, $\tilde{r}$) are also related to the head entity h and the tail entity t in addition to the relation r (orange solid lines denote r, and blue dotted lines denote $\hat{r}$, $\tilde{r}$ in Figure 1). For one thing, in the entity part, instead of using entity-relation interaction and projection, TranS focuses only on entities and their own interactions to guarantee their independence and effectiveness. For another, different from other methods that utilize entity-relation interactions in the entity part, our method migrates these interactions to the relation part and forms a synthetic relation representation, which can effectively solve the problem that a single relation vector cannot represent different relations between the same entity pair. Experiments on a large knowledge graph dataset, ogbl-wikikg2, show that our proposed model achieves the best results with fewer parameters.

TranS
Our proposed TranS model first breaks the traditional scoring pattern $R_t - R_h \approx r$ used in previous models (Bordes et al., 2013; Wang et al., 2014; Fan et al., 2014; Lin et al., 2015; Chao et al., 2021; Yu et al., 2021; Wang et al., 2022). It replaces the single relation vector $r$ with synthetic relation vectors $\hat{r} + r + \tilde{r}$, i.e., $R_t - R_h \approx \hat{r} + r + \tilde{r}$, where $\hat{r}$ is an adjoint relation vector related to the head entity and $\tilde{r}$ is another adjoint relation vector related to the tail entity. The illustration of TranS is shown in Figure 2 (f). Two entity representations and three relation representations together make up our proposed scoring function $f_r(h, t)$. That is to say, the synthetic relation representation in the relation part consists of the sum of three different relation vectors. To make full use of context information, we use adjoint vectors and the Hadamard product $\circ$ to interact with $h$, $t$, $\hat{r}$ and $\tilde{r}$ separately:

$$R_h = h \circ \tilde{t}, \quad R_t = t \circ \tilde{h}, \quad R_r = \hat{r} \circ \tilde{h} + r + \tilde{r} \circ \tilde{t}, \quad (1)$$

where $h$, $t$ and $r$ denote main vectors similar to those in traditional scoring patterns, $\tilde{h}$ represents the adjoint head entity vector, and $\tilde{t}$ represents the adjoint tail entity vector. Accordingly, $R_h$ is the representation of the head entity that combines information of the tail entity, and $R_t$ is the representation of the tail entity integrating information of the head entity. $\hat{r} \circ \tilde{h}$ is the representation of the adjoint relation with the head entity information, and $\tilde{r} \circ \tilde{t}$ is the representation of another adjoint relation with the tail entity information. Thus, the final scoring function can be represented as:

$$f_r(h, t) = -\| R_h - R_t + R_r \| = -\| h \circ \tilde{t} - t \circ \tilde{h} + \hat{r} \circ \tilde{h} + r + \tilde{r} \circ \tilde{t} \|. \quad (2)$$

Following previous works (Yu et al., 2021; Wang et al., 2022), we add a unit vector $e$ to the adjoint entity vectors in $R_h$ and $R_t$, i.e., $h \circ (\tilde{t} + e)$ and $t \circ (\tilde{h} + e)$. Considering the out-of-vocabulary problem, we also use NodePiece (Galkin et al., 2022) to learn a fixed-size entity vocabulary.
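To make the scoring function concrete, here is a minimal NumPy sketch of the TranS score, with the unit vector e folded into the adjoint entity vectors as described above. The function and variable names are ours, and this is an illustration rather than the released implementation:

```python
import numpy as np

def trans_score(h, t, r, h_adj, t_adj, r_hat, r_tilde, p=1):
    """TranS score f_r(h, t) = -|| R_h - R_t + R_r ||.

    h, t, r        : main head/tail/relation vectors
    h_adj, t_adj   : adjoint entity vectors
    r_hat, r_tilde : adjoint relation vectors
    """
    e = np.ones_like(h)                # unit vector added to the adjoint entity vectors
    R_h = h * (t_adj + e)              # head entity fused with tail information
    R_t = t * (h_adj + e)              # tail entity fused with head information
    R_r = r_hat * h_adj + r + r_tilde * t_adj   # synthetic relation representation
    return -np.linalg.norm(R_h - R_t + R_r, ord=p)
```

Note that with all adjoint vectors at zero, the score reduces to the TransE-style pattern −||h − t + r||, which is the purpose of the unit vector e.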

Training
Inspired by previous works (Chao et al., 2021; Zhang and Yang, 2021; Wang et al., 2022), we use the self-adversarial negative sampling loss (Sun et al., 2019) as our loss function, which is defined as follows:

$$L = -\log \sigma(\gamma + f_r(h, t)) - \sum_{i=1}^{n} p(h'_i, r, t'_i) \log \sigma(-f_r(h'_i, t'_i) - \gamma), \quad (3)$$

where $\gamma$ is a fixed margin, $\sigma$ is the sigmoid function, and $(h'_i, r, t'_i)$ is the $i$-th of $n$ randomly sampled negative triplets. The weight $p(h'_i, r, t'_i)$ of each negative sample can be calculated as follows:

$$p(h'_j, r, t'_j) = \frac{\exp(\alpha f_r(h'_j, t'_j))}{\sum_{i=1}^{n} \exp(\alpha f_r(h'_i, t'_i))}, \quad (4)$$

where $\alpha$ is the temperature of sampling.
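As a sketch, the self-adversarial loss of Sun et al. (2019) can be computed as follows for one positive triplet and its n negatives. Names are ours, and α denotes the sampling temperature:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_adversarial_loss(pos_score, neg_scores, gamma=6.0, alpha=1.0):
    """Self-adversarial negative sampling loss (Sun et al., 2019).

    pos_score  : scalar f_r(h, t) of the true triplet (higher = more plausible)
    neg_scores : array of f_r(h'_i, t'_i) for the n negative triplets
    gamma      : fixed margin; alpha : sampling temperature
    """
    # softmax over negative scores gives the weights p(h'_i, r, t'_i)
    w = np.exp(alpha * neg_scores)
    p = w / w.sum()
    pos_term = -np.log(sigmoid(gamma + pos_score))
    neg_term = -(p * np.log(sigmoid(-neg_scores - gamma))).sum()
    return pos_term + neg_term
```

Harder negatives (higher-scoring ones) receive larger weights, so the model focuses on the negative samples it currently finds most plausible.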

Comparison
As shown in Figure 2, TranS differs from previous transition-based KGE methods mainly in its synthetic relation representation.


Experiments

Dataset and Metric
Ogbl-wikikg2 (Hu et al., 2020) is a large-scale knowledge graph dataset extracted from Wikidata. In evaluation, each true triplet is ranked against negative triplets obtained by corrupting its head or tail entity with randomly sampled entities, such that the corrupted triplets do not appear in the KG. The goal is to rank the true head or tail entities higher than the negative entities, which is measured by Mean Reciprocal Rank (MRR). We follow the original dataset partition: the triplets are split according to time to simulate a realistic KG completion scenario in which triplets missing at a specific timestamp need to be filled. The training set contains 16,109,182 triplets, the validation set contains 429,456 triplets, and the test set contains 598,543 triplets.
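For clarity, MRR can be sketched as follows for a set of ranking queries. This is a simplified illustration; for actual leaderboard submissions the official OGB evaluator should be used:

```python
import numpy as np

def mean_reciprocal_rank(score_lists, true_indices):
    """MRR over link-prediction queries.

    score_lists  : list of 1-D arrays with scores of the candidate entities
    true_indices : index of the true entity inside each score array
    """
    rr = []
    for scores, true_idx in zip(score_lists, true_indices):
        # rank = 1 + number of candidates scored strictly higher than the true entity
        rank = 1 + int((scores > scores[true_idx]).sum())
        rr.append(1.0 / rank)
    return float(np.mean(rr))
```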

Implementation Details
In our experiments, Adam (Kingma and Ba, 2014) is used as our optimizer with a learning rate of 0.0005. The batch size of the model is set to 512. To prevent overfitting, we use the dropout technique and set the rate to 0.05. The negative sampling size is set to 128, and the dimension of each embedding vector in Eq. 2 is set to 200. The maximum number of training steps is 800 thousand, and we validate the model every 20 thousand steps. The number of anchors for NodePiece is 20 thousand, and γ in the loss function is set to 6. The final model is evaluated with 10 different random seeds. Our code is publicly available at: https://github.com/xyznlp/TranS.
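For reproduction, the hyperparameters above can be collected into a single configuration. The key names below are ours, not those of the released code:

```python
# Hyperparameters reported in the paper (key names are illustrative).
CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 5e-4,
    "batch_size": 512,
    "dropout": 0.05,
    "negative_sample_size": 128,
    "embedding_dim": 200,
    "max_steps": 800_000,
    "valid_every_steps": 20_000,
    "nodepiece_anchors": 20_000,
    "margin_gamma": 6.0,
    "num_random_seeds": 10,
}
```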

Results
The results are shown in Table 2. Our model achieves 0.6988 (validation set) and 0.6882 (test set) on MRR, which outperforms the previous best model, TripleREv3, on the ogbl-wikikg2 dataset. Notably, the parameter count of our model (19.2M) is about half that of TripleREv3 (36.4M). The experimental results thus show that our proposed method can improve model performance effectively with fewer parameters. Besides, we also construct a 38.4M TranS (large) model, whose best score reaches 0.7101 (validation set) and 0.6992 (test set) on MRR. Comparing the two groups with similar numbers of parameters, i.e., TranS versus InterHT and TranS (large) versus TripleREv3, we can observe more significant improvements.

Related Work
Recently, graph structures have been widely used in natural language processing, recommendation and other areas (Zhang, 2020; Zhang et al., 2021). A KG, as one kind of graph structure, uses triplets consisting of head nodes, tail nodes and relation edges to represent structured knowledge. To further compare different transition-based knowledge graph embeddings, we summarize related methods in Table 3 with reference to recent research (Ji et al., 2021). Transition-based methods measure the plausibility of fact triplets (h, r, t) as the distance between entities. TransE (Bordes et al., 2013), as a representative method, models relationships by interpreting them as translations operating on the low-dimensional embeddings of the entities, i.e., t − h ≈ r. Although it is simple and efficient, it cannot handle complex relations. Thus, several TransX models (TransH (Wang et al., 2014), TransR (Lin et al., 2015), TransD (Ji et al., 2015)) based on hyperplanes or multiple embedding spaces have been proposed for these issues. For example, TransR (Lin et al., 2015) projects entities from the entity space to the corresponding relation space and builds translations between the projected entities. Recent works also begin to utilize multiple vectors to represent entities and relations and conduct their interactions. For example, PairRE (Chao et al., 2021) and TripleRE (Yu et al., 2021) employ two and three relation vectors to represent relation information, respectively. Especially, InterHT (Wang et al., 2022) outperforms previous models with only two head and tail vectors and their interactions in the entity part, but it again ignores the problem of complex relation representation. Different from previous models, from the perspective of interaction (Zhang et al., 2022; Zhang and Wang, 2020; Zhang, 2019), our TranS migrates entity-relation interactions to the relation part and forms a synthetic relation representation.

Limitations
Although our model has achieved the best performance on relevant datasets, it still focuses on current or local KG triples to learn entity and relation representations. Actually, in large-scale knowledge graphs, neighborhoods can provide extra information for entity representation or initialization, as in NodePiece. Thus, the performance of our model could be further improved by exploring additional neighbor information and encoding methods.

Figure 1: Examples from ogbl-wikikg2. It is difficult for a single relation vector to represent different relations between the same entity pair.
Concretely, the main difference between our model (Figure 2 (f)) and previous transition-based KGE methods (Figure 2 (a)–(e)) is the synthetic relation representation. That is to say, TranS changes the single relation representation $r$ in the traditional scoring pattern $R_t - R_h \approx r$ to the synthetic relation representation $\hat{r} + r + \tilde{r}$ in our proposed new pattern $R_t - R_h \approx \hat{r} + r + \tilde{r}$. Specifically, different from InterHT (Wang et al., 2022), the relation part of our scoring function is the sum of multiple relation vectors, $R_r = \hat{r} \circ \tilde{h} + r + \tilde{r} \circ \tilde{t}$, rather than the single vector $r$. Compared with TripleRE (Yu et al., 2021), where three relation vectors are applied to the three parts ($R_h = h \circ r_h$, $R_t = t \circ r_t$, $R_r = r_m$) of the traditional scoring pattern with addition and subtraction operations, our proposed TranS applies synthetic relation vectors only to the relation part $R_r = \hat{r} \circ \tilde{h} + r + \tilde{r} \circ \tilde{t}$ of the scoring function, using addition operations.
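Under our notation, the patterns being compared can be sketched side by side. This is a simplified illustration assuming InterHT-style unit vectors, and the variable names are ours:

```python
import numpy as np

def l1(x):
    return np.linalg.norm(x, 1)

def score_interht(h, t, r, h_adj, t_adj):
    """InterHT: entity-entity interaction, single relation vector r."""
    e = np.ones_like(h)
    return -l1(h * (t_adj + e) - t * (h_adj + e) + r)

def score_triplere(h, t, r_h, r_m, r_t):
    """TripleRE: relation vectors applied to all three parts of the pattern."""
    return -l1(h * r_h - t * r_t + r_m)

def score_trans(h, t, r, r_hat, r_tilde, h_adj, t_adj):
    """TranS: synthetic relation representation in the relation part only."""
    e = np.ones_like(h)
    R_r = r_hat * h_adj + r + r_tilde * t_adj
    return -l1(h * (t_adj + e) - t * (h_adj + e) + R_r)
```

With the adjoint relation vectors set to zero, the TranS score coincides with the InterHT score, which makes the relation-part extension explicit.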
Moreover, our proposed TranS introduces entity-entity interaction in the entity part like InterHT and migrates entity-relation interaction from the entity part to the relation part. It not only preserves the independence of entity representation, but also utilizes entity-relation interaction in the relation part to solve the above problem.


Conclusion

In this paper, we propose a novel transition-based knowledge graph embedding model, TranS, to solve the representation problem of complex scenarios where the same entity pair has different relations. TranS replaces the single relation vector of the relation part in traditional scoring patterns with a synthetic relation representation. It not only retains the independence of entity interaction in the entity part, but also introduces entity-relation interaction in the relation part. Experiments on a large KG dataset, ogbl-wikikg2, show that our model achieves the best results with fewer parameters.

Table 3: Summary of transition-based knowledge graph embedding models. T represents the traditional scoring pattern $-\|R_h - R_t + r\|$, and S represents our proposed new scoring pattern $-\|R_h - R_t + \hat{r} + r + \tilde{r}\|$.