A Semantic Filter Based on Relations for Knowledge Graph Completion

Knowledge graph embedding, which represents the entities and relations of knowledge graphs with low-dimensional vectors, has made significant progress in link prediction. In recent years, researchers have mainly explored the representational capabilities of models, that is, better representational models that fit relation patterns such as symmetry/antisymmetry and composition. Current embedding models tend to use an identical vector for the same entity across different triples when measuring plausibility. However, measuring the plausibility of a specific triple amounts to comparing the matching degree of the specific attributes associated with its relation. Inspired by this observation, this paper designs a Semantic Filter Based on Relations (SFBR) to extract the required attributes of the entities; the plausibility of triples is then compared under these extracted attributes through traditional embedding models. The semantic filter module can be added to most geometric and tensor decomposition models with minimal additional memory. Experiments on benchmark datasets show that the relation-based semantic filter suppresses the impact of irrelevant attribute dimensions and improves link prediction performance. The tensor decomposition models with SFBR achieve state-of-the-art results.


Introduction
Knowledge Graphs (KGs) are collections of large-scale triples, such as Freebase (Bordes et al., 2013), YAGO (Suchanek et al., 2008), and DBpedia (Auer et al., 2007). KGs play a crucial role in applications such as question answering services, search engines, and medical care. Although there are billions of triples in KGs, they are still incomplete. These incomplete knowledge bases limit practical applications. Therefore, knowledge graph completion, also known as link prediction, which automatically predicts missing links between entities based on given links, has recently attracted growing attention.
Inspired by word embedding (Mikolov et al., 2013), researchers try to solve link prediction through knowledge graph embedding. Knowledge graph embedding models map entities and relations into low-dimensional vectors (or matrices, tensors), measure the plausibility of triples through specific functions of entities and relations, and rank the triples by their scores. Since TransE (Bordes et al., 2013) proposed using relation vectors to represent the geometric distance between entities, many variants have emerged. For example, TransH (Wang et al., 2014) first explores different representations of entities under different relations. TransR (Lin et al., 2015) maps entities to a relation-specific space through a projection matrix. TransD (Ji et al., 2015) constructs dynamic mapping matrices determined by both the entities and the relation. These variants perform complex transformations based on relations or triples to achieve different representations of entities in different semantic spaces.
Recently, scholars have been more inclined to solve link prediction by designing models with more powerful representations, such as ComplEx (Trouillon et al., 2016), TuckER (Balazevic et al., 2019), RotatE (Sun et al., 2019), a method based on rotations in vector space, and HAKE (Zhang et al., 2020a). Contrary to the actual semantics, these recent models apply an identical representation to the same entity in different triples.
Since the invention of TransE (Bordes et al., 2013), early scholars realized that different attributes of entities should be compared in different triples and tried to improve models in this direction. However, most recent studies, such as AutoETER (Niu et al., 2020) and RotatE (Sun et al., 2019), focus only on more robust representations of entities. Surprisingly, the attempt to find various representations of entities in different semantic spaces has gradually been discarded.

Figure 1: Comparison of boxes with the same shape (cube vs. cube) and different colors (blue vs. yellow).
In practice, entities are collections of attributes, and each entity can contain various semantic attributes. Figure 1 shows a comparison of boxes with the same shape and different colors. When comparing different attributes such as colors or shapes, entities should have different expressions rather than identical representations. This paper believes that each relation describes the links between the head and tail entities in particular attributes. Measuring the plausibility of a given triple means comparing how well the entities match on the attributes associated with the relation. Therefore, this paper proposes a semantic filter module to select different attributes of entities in different triples.
This paper designs a semantic filter based on relations. By employing the semantic filter, only the semantics associated with the relations are extracted, and the information of other unneeded dimensions is suppressed. As a result, the head and tail entities are compared under a limited semantic space.
We take the MLP-based semantic filter as the point of departure. Following the regularization strategy of diagonalization, this paper then designs two SFBRs: Linear-2 and Diag. Note that the MLP-based SFBR is a general model that can be transformed into most geometric and tensor decomposition models through special regularizations; we analyze several models in Appendix A to show this generality.
Overall, this paper proposes the Semantic Filter Based on Relations (SFBR), which can be added to geometric and tensor decomposition models. SFBR suppresses the interference of useless dimensions and improves reasoning performance while occupying minimal additional resources. Experiments on benchmark datasets show that tensor decomposition models with SFBR achieve state-of-the-art performance.

Related work
In this section, we describe related works and the critical differences between them. We divide knowledge graph embedding models into three leading families (Akrami et al., 2020), including Tensor Decomposition Models, Geometric Models, and Deep Learning Models.
Tensor Decomposition Models. These models implicitly treat triples as a tensor decomposition. RESCAL (Nickel et al., 2011) represents each relation with a full-rank matrix. DistMult (Yang et al., 2015) constrains all relation matrices to be diagonal, which reduces the parameter space and makes the model easier to train. ComplEx (Trouillon et al., 2016) extends KG embeddings to the complex space to better model asymmetric and inverse relations. Analogy (Liu et al., 2017) employs the general bilinear scoring function but adds two main constraints inspired by analogical structures. Based on the Tucker decomposition, TuckER (Balazevic et al., 2019) factorizes a tensor into a set of vectors and a smaller shared core matrix.
Geometric Models. Geometric Models interpret relations as geometric transformations in the latent space. TransE (Bordes et al., 2013) is the first translation-based method, which treats relations as translation operations from the head entities to the tail entities. Along with TransE (Bordes et al., 2013), multiple variants, including TransH (Wang et al., 2014), TransR (Lin et al., 2015) and TransD (Ji et al., 2015), are proposed to improve the embedding performance of KGs. Recently, RotatE (Sun et al., 2019) defines each relation as a rotation from head entities to tail entities.
Deep Learning Models. Deep Learning Models use deep neural networks to perform knowledge graph completion. ConvE (Dettmers et al., 2018) and ConvKB (Nguyen et al., 2018) employ convolutional neural networks to define score functions. CapsE (Nguyen et al., 2019) embeds entities and relations into one-dimensional vectors under the basic assumption that different embeddings encode homologous aspects in the same positions. CompGCN (Vashishth et al., 2020) utilizes graph convolutional networks to update knowledge graph embeddings.
There are also other models, such as DURA (Zhang et al., 2020b), which is proposed to address overfitting. Together, most of the above studies aim to find a more robust representation approach. However, measuring the plausibility of a triple amounts to comparing the matching degree of specific attributes determined by the relation. Only a few models, such as TransH (Wang et al., 2014), TransR (Lin et al., 2015), and TransD (Ji et al., 2015), consider that entities in different triples should have different representations; moreover, these variants require substantial resources and are limited to particular models.

Background
In this section, we introduce KG embedding and KG completion tasks. Next, we briefly introduce several models involved in this paper.
KG Completion.

Knowledge graphs are collections of factual triples G = {(h, r, t)} ⊆ E × R × E, where h, t ∈ E are the head and tail entities and r ∈ R is the relation. Knowledge graph embedding associates the entities h, t and relation r with vectors h, t, r. We then design an appropriate scoring function d_r(h, t): E × R × E → R that maps the embedding of a triple to a score. For a given query (h, r, ?), the task of KG completion is to rank all candidate entities and obtain a preference order over the predictions.
Geometric Models. These models treat relations as transformations of entities in latent spaces. TransE (Bordes et al., 2013) is the first model that uses vectors to represent entities and relations. TransE supposes that entities and relations satisfy h + r = t, where h, r, t ∈ R^n. The scoring function can be expressed as:

d_r(h, t) = ‖h + r − t‖,

where h, r, t ∈ R^n. RotatE (Sun et al., 2019) defines the relation as a rotation from head entities to tail entities in complex spaces. Given a triple (h, r, t), we expect that t = h ∘ r, where h, r, t ∈ C^k are the embeddings, the modulus of each dimension of the relation satisfies |r_i| = 1, and ∘ denotes the Hadamard product. The score function is:

d_r(h, t) = ‖h ∘ r − t‖,

where h, r, t ∈ C^k and |r_i| = 1.

Tensor Factorization Models. Models in this family interpret link prediction as a task of tensor decomposition, where triples are decomposed into a combination (e.g., a multi-linear product) of low-dimensional vectors for entities and relations. CP (Lacroix et al., 2018) represents triples with canonical decomposition; note that the same entity has different representations at the head and tail of a triple. The score function can be expressed as:

d_r(h, t) = ⟨h, r, t⟩ = Σ_i h_i r_i t_i,

where h, r, t ∈ R^k. RESCAL (Nickel et al., 2011) represents a relation as a matrix M_r ∈ R^{d×d} that describes the interactions between latent representations of entities. The score function is defined as:

d_r(h, t) = h^⊤ M_r t.

ComplEx (Trouillon et al., 2016) extends the real space to complex spaces and constrains the embeddings of relations to be diagonal matrices. The bilinear product becomes a Hermitian product in complex spaces. The score function can be expressed as:

d_r(h, t) = Re(⟨h, r, t̄⟩),

where h, r, t ∈ C^k and t̄ denotes the complex conjugate of t.
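As a concrete reference, the score functions above can be sketched in a few lines of NumPy. This is a minimal illustration with single vectors; real systems use batched tensors and trained embeddings:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE distance ||h + r - t||; lower means more plausible."""
    return np.linalg.norm(h + r - t)

def rotate_score(h, r, t):
    """RotatE distance ||h o r - t|| in complex space, with |r_i| = 1."""
    return np.linalg.norm(h * r - t)

def cp_score(h, r, t):
    """CP multi-linear product <h, r, t>; higher means more plausible."""
    return np.sum(h * r * t)

def rescal_score(h, M_r, t):
    """RESCAL bilinear form h^T M_r t; higher means more plausible."""
    return h @ M_r @ t

def complex_score(h, r, t):
    """ComplEx score Re(<h, r, conj(t)>); higher means more plausible."""
    return np.real(np.sum(h * r * np.conj(t)))
```

For TransE, a triple that exactly satisfies h + r = t scores zero; for RotatE, multiplying by a unit-modulus relation rotates each complex coordinate without changing its magnitude.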

SFBR model
This section introduces a novel module, a Semantic Filter Based on Relations, for knowledge graph completion. We first introduce the basic framework of SFBR in Section 4.1 and the specific filter designs in Section 4.2. Finally, we present special cases on several models in Section 4.3.

Framework of SFBR
As shown on the left of Figure 2, mainstream KG embedding models depend on a unique representation of entities and relations. The plausibility of possible triples is compared through rankings calculated by the score function. It is widely accepted that an entity may contain various attributes. This paper believes that each relation describes the relationship between entities in specific attributes. In different triples with different relations, the attributes compared should also be different, which requires choosing the needed attributes. For a given triple, this paper filters out the needed attributes through special functions and ranks the triples with scores calculated on the filtered attributes. As shown on the right of Figure 2, based on the traditional embedding method, this paper designs a relation-based function for the entities. This function reinforces the dimensions associated with the relations and suppresses the information of other unrelated dimensions. The operation is similar to filters used in signal processing, so the module is named the relation-based semantic filter. The score function can be expressed as:

d^f_r(h, t) = d_r(f_r(h), f_r(t)),

where d_r(h, t) is the traditional scoring function, d^f_r(h, t) is the modified scoring function, and f_r(·) is the semantic filter.
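The framework is model-agnostic: any base score function can be composed with a relation-specific filter. A minimal sketch follows; the identity and element-wise scaling filters below are illustrative placeholders, not the paper's learned parameters:

```python
import numpy as np

def sfbr_score(base_score, f_r, h, r, t):
    """SFBR framework: d^f_r(h, t) = d_r(f_r(h), f_r(t)).

    base_score: the traditional scoring function d_r
    f_r:        the relation-specific semantic filter
    """
    return base_score(f_r(h), r, f_r(t))

def transe_score(h, r, t):
    """Base TransE score used here as the wrapped model."""
    return np.linalg.norm(h + r - t)

def make_filter(w, b):
    """A hypothetical element-wise filter f_r(e) = w * e + b."""
    return lambda e: w * e + b
```

With the identity filter the wrapped score reduces to the base score; a filter that zeros a dimension removes that dimension of the entities from the comparison.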

Semantic Filter Module
We first design the filter for SFBR based on a multi-layer perceptron (MLP):

f_r(e) = e × W_r^{MLP} + b_r,

where W_r^{MLP} ∈ R^{n×n} and b_r ∈ R^n. To guarantee that each relation filters out different semantics, each relation uses a separate f_r(·). However, MLP-based semantic filters introduce an enormous number of parameters, and the matrix multiplication requires substantial resources. As shown in Figure 3, the paper therefore regularizes the MLP through diagonalization.
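The cost of the unregularized MLP filter is easy to make concrete: each relation owns a full n × n matrix. A hedged sketch (one linear layer per relation, no activation shown; the paper's exact MLP architecture may differ):

```python
import numpy as np

def mlp_filter(e, W_r, b_r):
    """Per-relation MLP filter f_r(e) = e @ W_r + b_r with a full matrix W_r."""
    return e @ W_r + b_r

def mlp_filter_params(num_relations, dim):
    """Each relation owns an n x n weight matrix plus an n-dim bias."""
    return num_relations * (dim * dim + dim)
```

With 237 relations (the commonly reported count for FB15k-237) and an illustrative dimension of 500, the full-matrix filters alone would cost 237 * (500*500 + 500) ≈ 59.4M parameters, which is why diagonalization is needed.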
Note that this paper introduces SFBR as a module on top of existing models. Still, the theoretical status of the MLP-based SFBR should be made clear: it is a general model that can be transformed into most geometric and tensor decomposition models through different regularizations. This paper selects TransE, RotatE, RESCAL, and ComplEx as examples and conducts these regularization analyses in Appendix A.
Too many parameters make the MLP hard to train, which motivates regularization. First, we ignore the bias. We then decompose the semantic filter matrix of the MLP into four equal-sized square blocks and diagonalize each block to reduce the parameter count of the relational filter:

W_r^{Linear-2} = [[diag(w_1), diag(w_2)], [diag(w_3), diag(w_4)]],

where W_r^{Linear-2} ∈ R^{n×n} and w_1, w_2, w_3, w_4 ∈ R^{n/2}. Since this diagonalization is equivalent to a linear combination of the two halves of the entity embedding, we call this SFBR Linear-2.
To further lessen the number of parameters, the paper directly diagonalizes the filter matrix, taking a one-dimensional vector as the semantic filter:

W_r^{Diag} = [[diag(w_1), O], [O, diag(w_2)]] = diag(w),  f_r(e) = e × W_r^{Diag} + b = w ∘ e + b,

where W_r^{Diag} ∈ R^{n×n}, all elements in O equal zero, w, b ∈ R^n, ∘ denotes the Hadamard (element-wise) product, and × denotes matrix multiplication. The paper names this SFBR Diag.
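Under these definitions, Linear-2 reduces to a per-dimension linear combination of the two halves of the embedding, while Diag is a plain element-wise affine map. A small NumPy sketch (bias omitted for Linear-2, as in the regularization above):

```python
import numpy as np

def linear2_filter(e, w1, w2, w3, w4):
    """Linear-2 SFBR: f_r(e) = e @ W_r, with W_r built from four diagonal
    blocks; equivalent to linearly combining the two halves of e."""
    half = e.shape[0] // 2
    a, b = e[:half], e[half:]
    return np.concatenate([w1 * a + w3 * b, w2 * a + w4 * b])

def diag_filter(e, w, b):
    """Diag SFBR: f_r(e) = w o e + b (Hadamard product plus bias)."""
    return w * e + b

def linear2_matrix(w1, w2, w3, w4):
    """Materialize W_r^{Linear-2} explicitly, only to check equivalence."""
    return np.block([[np.diag(w1), np.diag(w2)],
                     [np.diag(w3), np.diag(w4)]])
```

Per relation, Linear-2 stores 2n weights (four vectors of length n/2) versus n for Diag, consistent with the later observation that Linear-2 roughly doubles the extra parameters.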

Special Cases with SFBR
This section introduces examples of SFBR for different models, including TransE, RotatE, and RESCAL.
The corresponding score function of SFBR based on TransE can be expressed as:

d^f_r(h, t) = ‖f_r(h) + r − f_r(t)‖,

where f_r(e) = W_r ∘ e + b with W_r, b ∈ R^n, and e represents the entity vectors h and t.
The corresponding score function of SFBR based on RotatE can be expressed as:

d^f_r(h, t) = ‖f_r(h) ∘ r − f_r(t)‖,

where f_r(e) = W_r ∘ e + b with W_r, b ∈ C^n, and e represents the entity vectors h and t.
The corresponding score function of SFBR based on RESCAL can be expressed as:

d^f_r(h, t) = f_r(h)^⊤ M_r t.

Notice that also using f_r(t) = W_r ∘ t + b for the tails would be more in line with our design. However, prediction requires ranking the scores of all entities, of which there are hundreds of thousands; applying the Hadamard operation to every candidate tail would consume enormous resources. The paper therefore simplifies SFBR for the tails. This simplification effectively reduces resource occupation; although some performance is sacrificed, there is still a clear improvement over the basic models.
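The motivation for simplifying the tail side becomes visible in how ranking is computed: for a query (h, r, ?) the filtered head is computed once and reused against every candidate tail. A toy sketch with random embeddings (all sizes and parameters are illustrative, not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_entities = 8, 1000        # toy sizes; real KGs have 10^4-10^6 entities

E = rng.normal(size=(num_entities, n))  # all entity embeddings (candidate tails)
M_r = rng.normal(size=(n, n))           # RESCAL relation matrix
w, b = rng.normal(size=n), rng.normal(size=n)  # Diag filter parameters for r

h = E[42]                        # head entity of the query (h, r, ?)
f_h = w * h + b                  # filter applied once, to the head only

# Simplified RESCAL-SFBR ranking: score every candidate tail with
# f_r(h)^T M_r t; one matrix-vector product is shared across all |E| tails.
scores = (f_h @ M_r) @ E.T       # shape: (num_entities,)
ranking = np.argsort(-scores)    # higher score = more plausible
```

Filtering every candidate tail as well would add |E| extra Hadamard products and bias additions per query, which is exactly the cost the simplification avoids.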

Experiment
This section is organized as follows. First, we introduce the experimental settings in Section 5.1.
Then, we show the effectiveness of SFBR on three benchmark datasets in Section 5.2. Finally, we visualize and analyze the embeddings generated by SFBR in Section 5.3.
Datasets. FB15k-237 is obtained by eliminating the inverse and equal relations in FB15k, making it more difficult for simple models to perform well. WN18RR is obtained by excluding inverse and equal relations in WN18; its main relation patterns are symmetry/antisymmetry and composition. YAGO3-10 is a subset of YAGO3, produced to alleviate the test set leakage problem.
Evaluation Settings. We use the evaluation metrics standard across the link prediction literature: mean reciprocal rank (MRR) and Hits@k, k = 1, 3, 10. MRR is the mean over all test triples of the reciprocal rank assigned to the true triple among all candidate triples. Hits@k measures the percentage of test triples for which the true triple is ranked within the top k candidates. We evaluate link prediction performance in the filtered setting (Bordes et al., 2013), i.e., all known true triples except the current test triple are removed from the candidate set. Higher MRR or higher Hits@1/3/10 indicates better performance.
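These metrics can be sketched directly as a minimal reference implementation of the filtered setting (higher score means a more plausible candidate):

```python
import numpy as np

def filtered_rank(scores, true_idx, known_true):
    """Rank of the true entity in the filtered setting: other known-true
    answers are removed from the candidate set before ranking."""
    remove = np.array(sorted(known_true - {true_idx}), dtype=int)
    mask = np.ones(scores.shape[0], dtype=bool)
    mask[remove] = False
    candidates = scores[mask]
    # rank = 1 + number of remaining candidates scoring above the true triple
    return 1 + int(np.sum(candidates > scores[true_idx]))

def mrr_and_hits(ranks, k=10):
    """Mean reciprocal rank and Hits@k over a list of test-triple ranks."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks)), float(np.mean(ranks <= k))
```

Removing the other known-true answers means a model is not penalized for ranking a different correct answer above the current test triple.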

Main Results
In this section, we compare SFBR with other state-of-the-art models on three benchmark datasets. Table 2 shows the comparison between the two SFBRs and geometric models. Compared with TransE, TransE-SFBR shows significant improvements: on WN18RR, Hits@10 increases by 3.8%; on FB15k-237, Hits@10 increases by 7%. Compared with RotatE, RotatE-SFBR also makes significant progress: on WN18RR, Hits@10 increases by 2.2%; on FB15k-237, Hits@10 increases by 2%.
The matrix multiplication performed by the MLP-based SFBR requires a great deal of GPU memory. Limited by GPU resources, we experiment with it only on WN18RR, and the embedding dimension of entities in TransE-SFBR (MLP) is only 100, 1/5 of the original, so the results of the MLP-based SFBR cannot be contrasted directly with the other two SFBRs. Through comparative experiments on the two datasets, we find that the performance of Linear-2-based SFBR is slightly better than that of Diag-based SFBR. Nevertheless, the extra parameters and resource occupancy of Linear-2 are twice those of Diag. Considering resource utilization, the paper chooses Diag-based SFBR by default in subsequent experiments. Table 3 shows the comparison between SFBR and the tensor decomposition models. SFBR improves the performance of the base models on almost all datasets. On WN18RR, RESCAL-SFBR obtains the best result (the best Hits@10 is achieved by ComplEx-SFBR). On FB15k-237, ComplEx-SFBR obtains the best result, and MRR is increased by 0.13. On YAGO3-10, although the performance of CP-SFBR and RESCAL-SFBR improves, they do not exceed ComplEx-DURA.
Overall, experiments on the standard datasets show that SFBR improves the link prediction performance of the base models.

Visualization and Analysis
In this part, we analyze the performance of SFBR from three aspects. First, we visualize the embeddings through t-SNE; second, we randomly select pairs of samples to analyze the function of SFBR; finally, we show the additional resources occupied by SFBR.
Visualization. We use t-SNE to visualize tail entity embeddings. Suppose the link prediction task is (h, r, ?), where h and r are the head entity and relation, respectively. We randomly select ten queries in FB15k-237 that each have more than 50 answers. Then, we use t-SNE to visualize the embeddings generated by RotatE and RotatE-SFBR. For each question, t-SNE converts the answers into 2-dimensional points and displays them on the graph in the same color. Figures 4 and 5 visualize the distribution of answers to the 10 questions. SFBR makes the answers to the same question more similar, indicating that SFBR effectively extracts the needed semantics of each entity and suppresses the attributes of other dimensions, which verifies the claim in Section 4.1.
Case study. Two pairs of triples are randomly selected from the test set for analysis. Each pair of triples shares the same query (h, r, ?). For each query, a correct answer and an incorrect answer are randomly selected. The first pair of triples, which cannot be predicted by TransE, can be distinguished by TransE-SFBR; for the other pair, both models predict effectively. We plot the distance ‖h + r − t‖_1 of the two triples, where h, r, t are the embeddings of the entities and relation. Figures 6 and 7 show the distances; the blue curve is the deviation of the correct triple, and the red curve is the deviation of the incorrect triple. The top of each figure shows the distance under TransE, and the bottom the distance under TransE-SFBR. From Figure 6, we find that for the tail entity that TransE cannot predict, TransE-SFBR suppresses the influence of irrelevant dimensions so that the tail entity can be predicted. For the tail entities that TransE can predict in Figure 7, SFBR further suppresses the noise of other dimensions, and the distance between the correct and wrong tails is further enlarged, which strengthens the model.

Resource occupation. As shown in Table 4, we compare the parameters of SFBR and the basic models on the three datasets. The comparison shows that SFBR increases the parameters by only 0.01 ∼ 0.5M for the models based on geometric distance, and by only 0.01 ∼ 1.9M for the models based on tensor decomposition. Especially in the geometric models, a small growth in parameters yields a significant performance improvement. In all cases, SFBR brings minimal growth in resource occupation to the basic models.
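The parameter overhead reported in Table 4 can be reproduced with simple accounting. Per relation, Diag stores one weight vector (plus a bias), and Linear-2 stores four half-length vectors, i.e. twice the weights. The relation counts below (11 for WN18RR, 237 for FB15k-237) are the commonly reported dataset statistics; the dimension is illustrative:

```python
def diag_extra_weights(num_relations, dim):
    """Diag SFBR: one length-dim weight vector per relation (a bias adds dim more)."""
    return num_relations * dim

def linear2_extra_weights(num_relations, dim):
    """Linear-2 SFBR: four length-(dim/2) vectors per relation = 2*dim weights."""
    return num_relations * 4 * (dim // 2)

# WN18RR has 11 relations; with an illustrative dimension of 500,
# Diag adds 11 * 500 = 5,500 weights, on the order of 0.01M with biases.
wn18rr_diag = diag_extra_weights(11, 500)
fb237_diag = diag_extra_weights(237, 500)
```

This accounting is consistent with both the 0.01M lower end of the reported overhead and the observation that Linear-2 costs twice as much as Diag.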

Conclusion
This paper designs a relation-based semantic filter, SFBR, for geometric and tensor decomposition models for knowledge graph completion. SFBR is based on the observation that judging the plausibility of a particular triple amounts to comparing specific attributes of the entities while ignoring other unrelated dimensions. Therefore, this paper provides a relation-based semantic filter to extract the attributes that need to be compared and suppress the irrelevant attributes of entities.
Experiments show that SFBR can effectively improve the performance of traditional models, especially geometric models. The visualization shows that SFBR effectively extracts the relevant dimensions and distinguishes comparisons among different attributes. Compared with the base models, SFBR brings only a slight growth in resource occupation.