KGRefiner: Knowledge Graph Refinement for Improving Accuracy of Translational Link Prediction Methods

Link prediction is the task of predicting missing relations between knowledge graph (KG) entities. Recent work on link prediction has mainly attempted to increase accuracy by adding more layers to neural network architectures, which relies heavily on computational resources. This paper proposes refining knowledge graphs so that link prediction can be performed more accurately by relatively fast translational models. Translational link prediction models have significantly lower complexity than deep learning approaches; this motivated us to improve their accuracy. Our method uses the ontologies of knowledge graphs to add information as auxiliary nodes to the graph. These auxiliary nodes are then connected to the ordinary KG nodes that contain the corresponding information in their hierarchies. Our experiments show that our method can significantly improve the performance of translational link prediction methods in Hit@10, Mean Rank, and Mean Reciprocal Rank.


INTRODUCTION
Knowledge graphs represent a set of interconnected descriptions of entities, including objects, events, or concepts. These graphs are structures in which knowledge is stored as triples. Each triple consists of three parts: head, relation, and tail, where the relation determines the type of relationship between head and tail. Knowledge graphs are becoming a popular approach to representing and modeling information about the world. Additionally, knowledge graphs have several applications, for example, question answering systems (Bordes et al., 2014a;b), recommendation systems (Zhang et al., 2016), search engines (Xiong et al., 2017), relation extraction (Mintz et al., 2009), etc. Despite many efforts to build knowledge graphs, they are not yet complete. For example, in Freebase (Bollacker et al., 2008), over 70% of people lack a place of birth in the graph. This incompleteness of knowledge graphs has motivated researchers to add information to them.
One of the developing fields in knowledge graph completion is knowledge graph embedding (KGE). The task of KGE is to embed entities and relations in a low-dimensional continuous vector space. One application of these embeddings is to predict missing links in the knowledge graph.
Translational link prediction models use the sum of the head and relation vectors to predict the tail. These models started with TransE (Bordes et al., 2013); in the following years, TransH (Wang et al., 2014), TransR (Lin et al., 2015), TransD (Ji et al., 2015), RotatE (Sun et al., 2019), etc., tried to improve it. The advantages of translational methods over deep learning techniques are that they are robust and their score functions are considerably faster. Therefore, in this work, we aim to improve these translational methods.
Knowledge graphs contain a great deal of information; the hierarchy of entities and relations is part of it. Paris, for example, has the hierarchy "entity → physical entity → object → location → region → area → center → seat → capital → national capital". This hierarchy has not received enough attention in link prediction methods, and we intend to exploit it in this paper. SACN (Shang et al., 2019) added some nodes and relations to the graph to use structural information, but it did not justify adding these nodes and edges, so it does not generalize to other graphs. In addition, SACN added this information only to FB15K237 and did not provide a method for WN18RR. In this paper, we add a much smaller number of relations and nodes to the training section of the graph, and they are interpretable. HRS (Zhang et al., 2018) used relation clusters and sub-relations to exploit this information; nevertheless, like SACN, it does not generalize well.
(Moon et al., 2017) assumed that if two entities are embedded closely in the embedding space, they are similar, and assigned entity classes based on this closeness. In contrast, we assume that two entities are related if they use the same relation in the graph or share elements in their hierarchies.
When previous link prediction models learned the relation between Paris and France, they did not notice that Paris is a city and France is a country. To use this information, we add auxiliary nodes, representing the classes of entities, to the graph and connect them to the related entities. For example, we add an extra node for countries to the knowledge graph and connect it to all the countries in the graph. Our contributions are as follows:
• We present a method for refining the knowledge graph that is independent of the structure of the link prediction model and adds triples to the knowledge graph. These triples increase the accuracy of link prediction while keeping the time and space complexity of translational models unchanged.
• We evaluated our proposed method on two datasets, FB15K237 and WN18RR, with successful translational models. The results show that link prediction accuracy increases significantly in H@10, MRR, and MR.

RELATED WORK
Knowledge graph embedding is an active and developing field that embeds the entities and relations of a knowledge graph. These embeddings are used in link prediction, question answering systems, relation extraction, etc. Knowledge graph embedding started with TransE (Bordes et al., 2013), the first translational link prediction method. It interprets a relation as a transition from the head entity to the tail entity in the graph. A drawback of the TransE model is its inability to model N-1, 1-N, and N-N relationships. In the following years, other translational approaches, such as TransH (Wang et al., 2014), TransD (Ji et al., 2015), and TransR (Lin et al., 2015), were inspired by the initial idea of TransE and tried to improve it. These translational models are much faster than deep learning models such as ConvE (Dettmers et al., 2018), ConvKB (Nguyen et al., 2018), SACN (Shang et al., 2019), and HAKE (Zhang et al., 2020), but their accuracy is slightly lower. Therefore, we propose a method to increase the accuracy of these translational models. Knowledge graph refinement is the field of correcting or improving knowledge graphs. BioKG (Zhao et al., 2020), which worked on medical graphs, tried to provide a method for removing incorrect information from these graphs. Other works on the refinement of knowledge graphs try to add information. SACN (Shang et al., 2019) also added attributes to the knowledge graph, as we do. SACN proposed FB15k237-Attr; the method for constructing this dataset has three major issues. First, it only works for FB15k237, whereas our proposed method can be applied to WN18RR as well. Second, it raises the number of FB15k237 relations from 237 to 484 and therefore has higher time complexity than ours; we propose only two new relations for FB15k237 and a single relation for WN18RR. Third, the new relations and entities in SACN are not interpretable; it does not provide a reason for adding these attributes, so it cannot be generalized to other graphs. HRS (Zhang et al., 2018) tried to use sub-relations and relation clusters to make better predictions. It used the hierarchy of relations as sub-relations and created relation clusters, using these as two additional parts of the transition in translational models.
TransD (Ji et al., 2015): It creates a dynamic mapping matrix for each entity-relation pair and projects the head and tail with the matrices M_rh and M_rt, respectively. The transition from head to tail is M_rh h + r ≈ M_rt t. TransR (Lin et al., 2015): It considers that entities may have multiple aspects and that various relations focus on different aspects of entities. It projects entities into the relation space by a projection matrix M_r, so that M_r h + r ≈ M_r t. RotatE (Sun et al., 2019): RotatE treats a relation as a rotation in complex space. This rotation maps the source entity to the target entity: the relation is applied to the head entity by the Hadamard product, and the score function uses the L1 norm to measure the distance to the tail entity.
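To make the RotatE scoring concrete, here is a minimal NumPy sketch of the rotation-based score described above. The function name and the toy embeddings are illustrative, not part of the original paper's code; the relation is parameterized by phases, so each element of the rotation vector has unit modulus.

```python
import numpy as np

def rotate_score(head, relation_phase, tail):
    """RotatE-style score: rotate the head embedding by the relation in
    complex space, then take the negative L1 distance to the tail.
    Less negative scores indicate more plausible triples."""
    # A relation is a unit-modulus complex vector e^{i*theta}.
    rotation = np.exp(1j * relation_phase)
    # The Hadamard (element-wise) product applies the rotation to the head.
    rotated = head * rotation
    return -np.sum(np.abs(rotated - tail))

rng = np.random.default_rng(0)
dim = 8
h = rng.normal(size=dim) + 1j * rng.normal(size=dim)
theta = rng.uniform(0, 2 * np.pi, size=dim)
# A tail that exactly equals the rotated head gets the maximal score of 0.
t = h * np.exp(1j * theta)
print(rotate_score(h, theta, t))   # ≈ 0.0
print(rotate_score(h, theta, -t))  # clearly negative (corrupted tail)
```

Because the rotation preserves the modulus of each component, RotatE can model symmetric, inverse, and composed relations, which is one reason it improves on plain translations.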

KNOWLEDGE GRAPH REFINEMENT
Knowledge graph refinement follows two main objectives: (A) adding information to the knowledge graph, which is a subcategory of knowledge graph completion; (B) detecting incorrect information and removing those triples from the knowledge graph to increase its correctness.

KGREFINER
In this work, we propose a method that adds information to the graph, refining the knowledge graph and increasing link prediction accuracy. In FB15k237, we perform this refinement using relation hierarchies, and in WN18RR, we use the hierarchies of entities. We add this information to the graph as new nodes; these nodes are auxiliary nodes. We introduce several new relations to connect these new nodes to the graph's nodes, and we add the resulting triples to the graph. Translational link prediction methods such as TransE (Bordes et al., 2013), TransH (Wang et al., 2014), TransD (Ji et al., 2015), etc., create a transition property in their embeddings. For example, in TransE, embeddings are learned such that h + r ≈ t; this means that in the embedding space, the tail entity should be close to the sum of the head and relation. For example, consider the triples (Paris, CapitalOf, France) and (Tehran, CapitalOf, Iran). The link prediction model is not aware that both tail entities are countries. If we add a new node "country" to the graph and connect it to all the graph's countries with a new relation "RelatedTo", then triples such as (France, RelatedTo, country) and (Iran, RelatedTo, country) are added to the graph. These similar triples bring closer the embeddings of France and Iran, which are semantically similar. Figure 1 illustrates the changes KGRefiner brings to the embedding space. This closeness causes the model to search among countries when asked for the capital of France.
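The augmentation step above can be sketched in a few lines. This is a toy illustration of the idea, not the paper's released code; the function name, the entity-to-class mapping, and the example triples are assumptions chosen to mirror the France/Iran example.

```python
def refine(triples, entity_class, class_relation="RelatedTo"):
    """Return the training triples augmented with one auxiliary triple
    per (entity, class) pair, e.g. (France, RelatedTo, country)."""
    auxiliary = [(e, class_relation, c) for e, c in entity_class.items()]
    return list(triples) + auxiliary

train = [
    ("Paris", "CapitalOf", "France"),
    ("Tehran", "CapitalOf", "Iran"),
]
# Hypothetical class assignments extracted from entity/relation hierarchies.
classes = {"France": "country", "Iran": "country",
           "Paris": "city", "Tehran": "city"}
refined = refine(train, classes)
print(("France", "RelatedTo", "country") in refined)  # True
```

Because the model is otherwise unchanged, the translational score function and its time and space complexity stay the same; only the training set grows by the auxiliary triples.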

REFINEMENT OF FB15K237
In FB15k237, graph relations contain information about entities. For example, the relation hierarchy "entity → physical entity → object → location → region → area → center → seat → capital → national capital" links countries and cities, and nodes on one side of such a relation can be considered similar. In a hierarchy, higher levels usually carry more general information about objects and lower levels more specific information, so we extracted the last three levels of the hierarchy from each relation in this graph. Then, for each sub-relation, we counted its number of repetitions in the training section of the graph. To reduce the number of these sub-relations, we removed those with fewer than 100 repetitions (the threshold of 100 is arbitrary). Finally, 285 sub-relations remained, which we added to the set of entities of this graph as new nodes. We call these auxiliary nodes relation-nodes. We defined two new relations, "RelatedTo" and "HasAttribute", to connect these relation-nodes to the graph. For each triple, if the entity is the triple's head, we link it to the relation-node with "RelatedTo", and if it is the tail, we use "HasAttribute". For example, to refine the relation between Paris and France, (Paris, "entity → physical entity → object → location → region → area → center → seat → capital → national capital", France), the sub-relation "capital" has more than 100 repetitions, so the triples (Paris, RelatedTo, capital) and (France, HasAttribute, capital) were added to the graph.

REFINEMENT OF WN18RR
To refine this graph, we use the hierarchy of entities. In Freebase we used relations, but in WordNet relations do not give us information about entities. France, for example, has the hierarchy "existence → place → region → region → administrative region → country → France", which gives us good information about France. Excluding the last level (the entity itself), we extract the last three levels of each entity's hierarchy. Among these levels, we keep those with more than 50 repetitions among entities (again, an arbitrary threshold) to reduce their number. As a result, 207 levels remained. We add these levels as new nodes to the training section of the graph and connect them to the entities that have these levels in their hierarchies. For this graph, we define a single new relation, "HasAttribute". For example, France and Iran both have "country" in their hierarchical structures, so the triples (France, HasAttribute, country) and (Iran, HasAttribute, country) were added to the training section of the graph.
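The level-extraction and thresholding procedure can be sketched as follows. This is a simplified reconstruction under stated assumptions: the hierarchy dictionary, the `extract_levels` name, and the tiny threshold used in the demo are illustrative (the paper uses 50 for WN18RR and 100 for FB15k237).

```python
from collections import Counter

def extract_levels(hierarchies, depth=3, min_count=50):
    """Keep the last `depth` hierarchy levels of each entity (excluding
    the final level, which is the entity itself) and drop levels that
    occur fewer than `min_count` times across all entities."""
    counts = Counter()
    for entity, levels in hierarchies.items():
        counts.update(levels[:-1][-depth:])  # skip the entity's own level
    return {level for level, n in counts.items() if n >= min_count}

hierarchies = {
    "France": ["existence", "place", "region", "administrative region",
               "country", "France"],
    "Iran": ["existence", "place", "region", "administrative region",
             "country", "Iran"],
    "Paris": ["existence", "place", "region", "city", "Paris"],
}
# Toy threshold of 2 so the example is visible at this scale.
kept = extract_levels(hierarchies, depth=3, min_count=2)
print(sorted(kept))  # ['administrative region', 'country', 'region']
```

Each surviving level then becomes an auxiliary node, connected by "HasAttribute" to every entity whose hierarchy contains it.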

DATASETS
We evaluated our work on two popular benchmarks, FB15K237 and WN18RR; these datasets are refined from the real knowledge graphs Freebase (Bollacker et al., 2008) and WordNet (Miller, 1995), respectively. In addition, we built two other datasets with KGRefiner: FB15K237-Refined and WN18RR-Refined, derived from FB15K237 and WN18RR, respectively. The details of the datasets are shown in Table 1.

EXPERIMENTAL SETTINGS
We used the implementation of baselines from OpenKE (Han et al., 2018). We used an embedding dimension of 200 for all models. Also, we removed self-adversarial negative sampling from TransE and RotatE to allow a fair comparison. We tried {200, 500, 1000, 2000} epochs and picked the best epoch according to MRR on the validation set. Other hyperparameters of the models are those mentioned in OpenKE. Hyperparameters for FB15K237 and FB15K237-Refined, and likewise for WN18RR and WN18RR-Refined, are the same. Some baseline results are taken from (Zhang et al., 2018); for RotatE and the remaining results, we used (Han et al., 2018) to produce scores.

SPEED OF MODELS
The training time of translational models is much lower than that of deep learning approaches such as ConvE, SACN, ConvKB, etc. The complexity of the scoring function and the neural network layers in their architectures reduce the training speed of deep learning methods. Table 4 compares the time each model needs to train for one epoch on FB15k237. We ran the models on an Nvidia K80. For a fair comparison, the embedding dimension for all models is 200. These models usually need 1000 epochs, so the runtime difference between TransE and RotatE is around 35000s on FB15k237.

CONCLUSION
In this paper, we proposed KGRefiner, a novel knowledge graph refinement method that alleviates the limitations of translational models by capturing additional information from knowledge graph hierarchies. We used hierarchy components as new nodes, and by connecting these nodes to the proper entities in the knowledge graph, we obtained a more informative graph. Our experimental results show that KGRefiner improves state-of-the-art translational models on the two benchmark datasets WN18RR and FB15k237. Furthermore, it is the first augmentation method that works with both WordNet and Freebase, while previous methods perform on only one dataset. In future work, we will extend our approach to other datasets that can be formulated in the triple structure. For example, recommender system datasets can be formed on a graph schema, and KGRefiner can then be applied.

TransE:
For a factual triple (e_s, r, e_o), the sum of the head and relation embeddings should be close to the tail embedding; on the other hand, for corrupted triples (e_s', r, e_o'), e_s' + r should be distant from e_o'. The score function of TransE is:

ψ(e_o, r, e_s) = −||e_s + r − e_o||²₂

TransH (Wang et al., 2014): To improve the modelling of N-1, 1-N, and N-N relations, TransH defines a hyperplane for each relation, and the translation property should hold on that hyperplane: entities are first projected onto the hyperplane with normal vector w_r (h⊥ = h − w_r⊤h w_r) before the translation is applied.
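The TransE score above is simple enough to verify numerically. The sketch below is illustrative (the function name and toy vectors are ours); it shows that a tail lying exactly at h + r gets the maximal score of 0, while any displacement lowers the score.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE score: negative squared L2 distance between h + r and t.
    Correct triples score near 0; corrupted ones score much lower."""
    return -np.sum((h + r - t) ** 2)

h = np.array([0.1, 0.2, 0.3])
r = np.array([0.4, 0.1, -0.2])
t = h + r  # a "perfect" tail: score is exactly 0
print(transe_score(h, r, t))        # 0.0
print(transe_score(h, r, t + 1.0))  # negative (corrupted tail)
```

Training pushes correct triples toward the first case and corrupted (negatively sampled) triples toward the second, typically via a margin-based ranking loss.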

Figure 1:
Figure 1: Simple illustration of the changes in the embedding space. The right-hand graph shows the effect of adding auxiliary nodes to the graph: translational models bring all countries together, and all cities together, in the vector space.
Because links in WordNet do not carry information about entities, HRS sub-relations and relation clusters are meaningless on WordNet.

BACKGROUND
Suppose E is the set of all entities of a knowledge graph and R is the set of all its relationships. (e_s, r, e_o) is called a triple: e_s ∈ E is the head and e_o ∈ E is the tail of the triple. Finally, r ∈ R represents the relation between e_s and e_o.

LINK PREDICTION
Link prediction is the task of predicting the missing links of a knowledge graph by inferring from its existing facts. The score function of link prediction methods is ψ(e_o, r, e_s), which evaluates a triple's plausibility. Our goal is to train a model that gives the highest scores to the missing triples of the graph and the lowest scores to false triples.

TRANSLATIONAL LINK PREDICTION MODELS
Translational link prediction methods consider the relation as a transition from head to tail. For example, in (Paris, CapitalOf, France), the relation "CapitalOf" is a transition from Paris to France. TransE (Bordes et al., 2013) is the first translational link prediction model. In TransE, embeddings for correct triples are learned such that e_s + r ≈ e_o. This means that the sum of the head's embedding and the relation's embedding must be close to the tail; the distance measure is primarily the L2 norm. Here are some translational link prediction models:

Tables 2 and 3 compare the experimental results of our KGRefiner combined with translational models against previously published results. Results in bold font are the best in the group, and underlined results denote the best in the column. KGRefiner with TransH obtains the highest H@10 and MRR on FB15k237, and KGRefiner with RotatE reaches the best MR and H@10 on WN18RR.

Table 1:
Statistics of the experimental datasets. The refined versions include the auxiliary nodes added by KGRefiner.

Table 3:
Link prediction results on WN18RR and its refined version. Results of TransE are taken from (Nguyen et al., 2018); TransH and TransD from

Table 4:
Comparison of training time between translational techniques and deep learning methods.