Transformer-based Entity Typing in Knowledge Graphs

We investigate the knowledge graph entity typing task, which aims at inferring plausible entity types. In this paper, we propose a novel Transformer-based Entity Typing (TET) approach that effectively encodes the content of the neighbours of an entity by means of a transformer mechanism. More precisely, TET is composed of three different mechanisms: a local transformer allowing us to infer missing entity types by independently encoding the information provided by each of its neighbours; a global transformer aggregating the information of all neighbours of an entity into a single long sequence to reason about more complex entity types; and a context transformer integrating neighbours' content in a differentiated way through information exchange between neighbour pairs, while preserving the graph structure. Furthermore, TET uses information about the class membership of types to semantically strengthen the representation of an entity. Experiments on two real-world datasets demonstrate the superior performance of TET compared to the state-of-the-art.


Introduction
A knowledge graph (KG) (Pan et al., 2016) is a multi-relational graph encoding factual knowledge in the form of triples (h, r, t), where h and t are the head and tail entities connected via the relation r. In this paper, we consider KGs with minimal schema information, i.e., those containing entity type assertions of the form (e, has_type, c), stating that the entity e has type c, as the only schema information; e.g., to capture that Barack Obama has type President. Entity type knowledge is widely used in NLP tasks, e.g., in relation extraction (Liu et al., 2014), entity and relation linking (Gupta et al., 2017; Pan et al., 2019), question answering (ElSahar et al., 2018; Hu et al., 2022), and fine-grained entity typing on text (Onoe et al., 2021; Qian et al., 2021; Liu et al., 2021). However, entity types are far from complete, since new types continuously emerge in real-world applications. For example, about 10% of entities in FB15k (Bordes et al., 2013) have the type /music/artist but do not have the type /people/person (Moon et al., 2017).
In light of this, the Knowledge Graph Entity Typing (KGET) task, which aims at inferring missing entity types in a KG, has recently been investigated. Most existing approaches to KGET are based on either embeddings or graph convolutional networks (GCNs). Despite the huge progress these methods have made, some important challenges remain. On the one hand, most embedding-based models (Moon et al., 2017; Zhao et al., 2020; Ge et al., 2021; Zhuo et al., 2022) encode all neighbors of a target entity into a single vector, but in many cases only some neighbors are necessary to infer the correct types. For example, as shown in Figure 1, to predict that the entity Barack Obama has type President, only the neighbor (is_leader_of, U.S) is needed. Indeed, using too many neighbors, such as (graduate_from, Columbia University), will introduce noise. The CET model (Pan et al., 2021) overcomes this problem by encoding each neighbor independently. However, since entities and relations are represented by TransE (Bordes et al., 2013), the direction of the representation is fixed, either from entity to relation or vice versa. As a consequence, certain interactions between neighbor entities and relations are ignored. Also, to predict more complex types, CET directly adds and averages the neighbor representations, weakening the contribution of different neighbors, since it ignores that the contribution of different neighbors to different types might not be the same. For example, as shown in Figure 1, the inference of the type 20th-century American writer involves multiple semantic aspects of Barack Obama and requires jointly considering several of its neighbors. On the other hand, GCN frameworks for KGET use expressive representations for entities and relations based on their neighbor entities and relations (Jin et al., 2019; Zhao et al., 2022; Zou et al., 2022; Vashishth et al., 2020; Pan et al., 2021). However, a common problem of GCN-based models is that they aggregate information only along the paths starting from neighbors of the target entity, limiting the representation of interdependence between neighbors that are not directly connected. For example, in Figure 1 the entities Juris Doctor and U.S are not connected, but combining their information could help to infer that American Legal Scholars is a type of Barack Obama. This could be fixed by increasing the number of layers, but at an additional computational cost.
The main objective of this paper is to introduce a transformer-based approach to KGET that addresses the highlighted challenges. The transformer architecture (Vaswani et al., 2017) has been essential for NLP, e.g., in pre-trained language models (Devlin et al., 2019; Reimers and Gurevych, 2019; Lan et al., 2020; Wu et al., 2021a), document modeling (Wu et al., 2021b), and link prediction (Wang et al., 2019; Chen et al., 2021). Transformers are well-suited for KGET, as entities and relations in a KG can be regarded as tokens; using the transformer as an encoder, one can thus achieve bidirectional deep interaction between entities and relations. Specifically, we propose TET, a Transformer-based Entity Typing model for KGET, composed of the following three inference modules. A local transformer that independently encodes the relational and type neighbors of an entity into a sequence, facilitating bidirectional interaction between elements within the sequence, addressing the first problem. A global transformer that aggregates all neighbors of an entity into a single long sequence to simultaneously consider multiple attributes of an entity, allowing the model to infer more 'complex' types, thus addressing the third problem. A context transformer that aggregates the neighbors of an entity in a differentiated manner according to their contribution, while preserving the graph structure, thus addressing the second problem. Furthermore, we use semantic knowledge about the known types in a KG. In particular, we observe that types are normally clustered in classes. For example, the types medicine/disease, medicine/symptom, and medicine/drug belong to the class medicine.
We use this class membership information to replace the 'generic' relation has_type with a more fine-grained relation that captures to which class a type belongs, enriching the semantic content of connections between entities and types. To sum up, our contributions are:
• We propose a novel transformer-based framework for inferring missing entity types in KGs, encoding knowledge about entity neighbors from three different perspectives.
• We use class membership of types to replace the single has_type relation with class-membership relations providing fine-grained semantic information.
• We conduct empirical and ablation experiments on two real-world datasets, demonstrating the superiority of TET over existing SoTA models.
Data, code, and an extended version with appendix are available at https://github.com/zhiweihu1103/ET-TET.

Related Work
The knowledge graph completion (KGC) task is usually concerned with predicting the missing head or tail entity of a triple. KGET can thus be seen as a specialization of KGC. Existing KGET methods can be classified into embedding-based and GCN-based ones.
Embedding-based Methods. ETE (Moon et al., 2017) learns entity embeddings for KGs by a standard representation learning method (Bordes et al., 2013), and further builds a mechanism for information exchange between entities and their types.
ConnectE (Zhao et al., 2020) jointly embeds entities and types into two different spaces and learns a mapping from the entity space to the type space. CORE (Ge et al., 2021) utilizes the models RotatE (Sun et al., 2019) and ComplEx (Trouillon et al., 2016) to embed entities and types into two different complex spaces, and develops a regression model to link them. However, the above methods do not fully consider the known types of entities while training the entity embedding representation, which seriously affects the prediction performance for missing types. Also, the representation of types in these methods is such that they cannot be semantically differentiated. CET (Pan et al., 2021) jointly utilizes information about existing type assertions in a KG and about the neighborhood of entities by respectively employing an independent-based mechanism and an aggregated-based one. It also utilizes a pooling method to aggregate their inference results. AttEt (Zhuo et al., 2022) designs an attention mechanism to aggregate the neighborhood knowledge of an entity using type-specific weights, which are beneficial for capturing specific characteristics of different types. A shortcoming of these two methods is that, unlike our TET model, they are not able to cluster types into classes, and are thus not able to semantically differentiate them in a fine-grained way.
GCN-based Methods. Graph Convolutional Networks (GCNs) have proven effective at modeling graph structures (Kipf and Welling, 2017; Hamilton et al., 2017; Dettmers et al., 2018). However, directly using GCNs on KGs usually leads to poor performance, since KGs have different kinds of entities and relations. To address this problem, RGCN (Schlichtkrull et al., 2018) proposes to apply relation-specific transformations in the GCN's aggregation. HMGCN (Jin et al., 2019) proposes a hierarchical multi-graph convolutional network to embed multiple kinds of semantic correlations between entities. CompGCN (Vashishth et al., 2020) uses composition operators from KG-embedding methods by jointly embedding both entities and relations in a relational graph. ConnectE-MRGAT (Zhao et al., 2022) proposes a multiplex relational graph attention network to learn on heterogeneous relational graphs, and then utilizes the ConnectE method for inferring entity types. RACE2T (Zou et al., 2022) introduces a relational graph attention network method, utilizing the neighborhood and relation information of an entity for type inference.
A common problem with these methods is that they follow a simple single-layer attention formulation, restricting the information transfer between unconnected neighbors of an entity.
Transformer-based Methods. To the best of our knowledge, there are no transformer-based approaches to KGET. However, two transformer-based frameworks for the KGC task have already been proposed: CoKE (Wang et al., 2019) and HittER (Chen et al., 2021). Our experiments show that they are not suitable for KGET.

Method
In this section, we describe the architecture of our TET model (cf. Figure 2). We start by introducing the necessary background (Sec. 3.1), then present the architecture of TET in detail (Sec. 3.2). Finally, we describe the pooling and optimization strategies (Sec. 3.3 and 3.4).

Background
In this paper, a knowledge graph (Pan et al., 2016) is represented in a standard format for graph-structured data such as RDF (Pan, 2009) as a tuple (E, C, R, T), where E is a set of entities, C is a set of entity types, R is a set of relation types, and T is a set of triples. Triples in T are either relation assertions (h, r, t), where h, t ∈ E are respectively the head and tail entities of the triple and r ∈ R is the edge of the triple connecting head and tail; or entity type assertions (e, has_type, c), where e ∈ E, c ∈ C, and has_type is the instance-of relation. For e ∈ E, the relational neighbors of e are the set {(r, f) | (e, r, f) ∈ T}. The type neighbors of e are defined as {(has_type, c) | (e, has_type, c) ∈ T}. We will simply say neighbors of e when we refer to the relational and type neighbors of e. The goal of this paper is to address the KGET task, which aims at inferring missing types from C in entity type assertions.
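To make the neighbor definitions concrete, the following sketch extracts the relational and type neighbors of an entity from a triple set. The toy triples loosely follow Figure 1; the entity and relation names are illustrative only.

```python
def neighbors(entity, triples):
    # Relational neighbors of e: pairs (r, f) with (e, r, f) in T,
    # excluding entity type assertions.
    relational = [(r, f) for (h, r, f) in triples
                  if h == entity and r != "has_type"]
    # Type neighbors of e: pairs (has_type, c) with (e, has_type, c) in T.
    types = [(r, c) for (h, r, c) in triples
             if h == entity and r == "has_type"]
    return relational, types

# Toy triple set loosely based on Figure 1 (illustrative only).
T = [
    ("Barack Obama", "is_leader_of", "U.S"),
    ("Barack Obama", "graduate_from", "Columbia University"),
    ("Barack Obama", "has_type", "President"),
]
rel, typ = neighbors("Barack Obama", T)
```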

Model Architecture
In this section, we introduce the local, global and context transformer-based modeling components of our TET model.Before defining these components, we start by discussing an important observation.

Class Membership
A key observation is that in a KG all type assertions are uniformly defined using the relation has_type.

As a consequence, we do not have a way to fully differentiate the contribution of the different types of an entity during inference, as we cannot capture the relationship between them and their relevance, thus weakening the contribution of type supervision on entities. However, in practice types are clustered together in classes (i.e., root types in a domain); e.g., the types medicine/disease, medicine/symptom, and medicine/drug belong to the class medicine. This allows us to identify that these types are related, as all of them talk about something related to medicine, providing us with fine-grained semantic information. With this insight in mind, for each class we create a relation that will be used to model that a type is an element of that class. For instance, for the class medicine, we introduce the relation belongs_class_medicine. We then replace a type neighbor (has_type, c) of an entity e with (r_class, c), where r_class is the relation modeling class membership, i.e., belonging to the class medicine. We define the type-class neighbors of an entity as expected. We will use this semantically-enriched representation in our local and global transformers below.
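This rewriting can be sketched as follows, assuming the class of a type is its root segment (as in the medicine example above); the helper names are our own, not the paper's:

```python
def class_relation(type_name):
    # Derive the class (root type) from a type name; e.g. the type
    # "medicine/disease" belongs to the class "medicine".
    root = type_name.strip("/").split("/")[0]
    return "belongs_class_" + root

def to_type_class_neighbors(type_neighbors):
    # Replace each generic (has_type, c) neighbor with (r_class, c),
    # where r_class encodes the class membership of the type c.
    return [(class_relation(c), c) for (_, c) in type_neighbors]

pairs = [("has_type", "medicine/disease"), ("has_type", "medicine/drug")]
enriched = to_type_class_neighbors(pairs)
```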

Local Transformer
The main intuition behind the local component is that the neighbors of an entity might help to determine its types, and that the contribution of each neighbor is different. For instance, if the entity Liverpool has the relational neighbor (places_lived, Daniel Craig), it is plausible to infer that Liverpool has type /location/citytown. On the other hand, the neighbor (sports_team, Liverpool F.C.) may help to infer that it has type /sports/sports_team_location. To encode type-class neighbors (r_class, c), similar to the input representations of BERT (Devlin et al., 2019), we build the input sequence H = ([CLS], r_class, c), where [CLS] is a special token, and for each element h_i in H we construct its input vector representation h_i by summing its word and position embeddings. Type-class neighbors are not capable of fully capturing the structural information within the KG. To alleviate this problem, we also consider relational neighbors. As for type-class neighbors, to encode a relational neighbor (r, f) of an entity, we build a sequence Q = ([CLS], r, f), aggregate the word and position embeddings, and further apply a local transformer. The output embedding of [CLS] is denoted as Q^{cls} ∈ R^{d×1}, and for an entity with m relational neighbors, they are represented as [Q^{cls}_1, Q^{cls}_2, ..., Q^{cls}_m] after the local transformer representation.
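The input construction can be illustrated with a minimal sketch; the vocabulary, dimension, and random initialization below are illustrative stand-ins for the learned embeddings of the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

# Randomly initialised word embeddings for a toy vocabulary; in TET these
# are learned parameters and [CLS] is a special token.
vocab = ["[CLS]", "belongs_class_location", "/location/citytown",
         "places_lived", "Daniel Craig"]
word_emb = {tok: rng.normal(size=d) for tok in vocab}
pos_emb = rng.normal(size=(3, d))  # position embeddings for length-3 inputs

def input_vectors(sequence):
    # Each input vector h_i is the sum of the token's word embedding and
    # its position embedding, as in BERT-style input representations.
    return np.stack([word_emb[tok] + pos_emb[i]
                     for i, tok in enumerate(sequence)])

# One type-class neighbor sequence H and one relational neighbor sequence Q.
H = input_vectors(["[CLS]", "belongs_class_location", "/location/citytown"])
Q = input_vectors(["[CLS]", "places_lived", "Daniel Craig"])
```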
The local transformer mainly pays attention to a single existing neighbor at a time in the inference process, reducing the interference between unrelated types. We perform a non-linear activation on the neighbor representations and then apply a linear layer to unify the dimension to the number of types; the final local transformer score S^{loc} ∈ R^{L×(m+n)} is defined as:

S^{loc} = W Relu([H^{cls}_1, ..., H^{cls}_n, Q^{cls}_1, ..., Q^{cls}_m]) + b,    (1)

where W ∈ R^{L×d} and b ∈ R^L are learnable parameters and L is the number of types. An important observation is that the number of available relations varies from one KG to another. For instance, the YAGO43kET KG has substantially fewer relations than the FB15kET KG (cf. the dataset statistics in the Experiments section), making the discrimination among relations in relational triples harder. To tackle this problem, for the YAGO43kET KG we semantically enrich the representation of relations by using the type-class membership information. Specifically, for a relational neighbor (r, f) of an entity, we use the types of f belonging to a certain class to enhance the relation r in the sequence ([CLS], r, f) using the following steps:
1. Let Γ = {(has_type, c_1), (has_type, c_2), ..., (has_type, c_ℓ)} be the set of all type neighbors of f. We replace Γ with the set Γ′ of corresponding type-class neighbors {(r_class_1, c_1), (r_class_2, c_2), ..., (r_class_ℓ, c_ℓ)}, i.e., representing that each c_i is a member of class_i.
2. We form a sequence P from the relation r and the elements of Γ′. For each element p_i of P, we assign randomly initialized word and position embeddings to capture sequence order. We then apply a transformer to capture the interaction between tokens. The output token embeddings are denoted as [p_0, p_1, ..., p_ℓ].
3. For the output token embeddings, we use three different operations to obtain the final representation of the relation r: average, maximum, and minimum. For the YAGO43kET KG, we replace the word embedding of r in the sequence Q with P_avg = (1/(ℓ+1)) Σ_{i=0}^{ℓ} p_i, P_max = Max_i(p_i), or P_min = Min_i(p_i).
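The three aggregation operations can be sketched on toy output token embeddings (the values are illustrative; in the model these would be transformer outputs):

```python
import numpy as np

# Toy output token embeddings [p_0, ..., p_l] produced by the transformer
# over the relation and the type-class neighbors of the tail entity.
P = np.array([[1.0, 4.0],
              [3.0, 2.0],
              [2.0, 6.0]])

# Three alternative aggregations for the enriched relation representation;
# the ablation in Table 5 compares them (Avg performs best).
P_avg = P.mean(axis=0)  # element-wise average over tokens
P_max = P.max(axis=0)   # element-wise maximum over tokens
P_min = P.min(axis=0)   # element-wise minimum over tokens
```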

Global Transformer
The local transformer mechanism is suitable for types that can be inferred by looking at simple structures, for which independently considering neighbors is thus enough. However, inferring 'complex' types requires capturing the interaction between different neighbors of an entity. For instance, to infer that the entity Birmingham_City_L.F.C. has type Women's_football_clubs_in_England, we need to simultaneously consider different sources of information, such as the type neighbor (has_type, Association_football_clubs) and the relational neighbor (isLocatedIn, England) of Birmingham_City_L.F.C., and that (playsFor, Birmingham_City_L.F.C.) and (hasGender, female) are relational neighbors of the entity Darla_Hood. To this aim, we introduce a global transformer module capturing the interaction between type-class and relational neighbors by comprehensively representing them as the input of a transformer as follows:
1. For a target entity e, we define the set Γ′ as done in Section 3.2.2. Further, let Ξ = {(r_1, f_1), ..., (r_m, f_m)} denote the set of all relational neighbors of e.
2. We uniformly represent Γ′ and Ξ as a single sequence G starting with the special token [CLS].
3. For each element in the sequence G, we assign randomly initialized word and position embeddings, and input it into a transformer. The output embedding of [CLS] is denoted G^{cls} ∈ R^{d×1}. Similar to Equation (1), we define the prediction score S^{glo} ∈ R^{L×1} as S^{glo} = W Relu(G^{cls}) + b.
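A minimal sketch of the sequence construction in step 2; the exact ordering of neighbor pairs within the sequence is our assumption, not stated in the text:

```python
def global_sequence(type_class_neighbors, relational_neighbors):
    # Flatten all neighbors of the target entity into one long token
    # sequence headed by [CLS]; type-class pairs first, then relational
    # pairs (the ordering is an illustrative assumption).
    seq = ["[CLS]"]
    for r_class, c in type_class_neighbors:
        seq += [r_class, c]
    for r, f in relational_neighbors:
        seq += [r, f]
    return seq

G = global_sequence(
    [("belongs_class_football", "Association_football_clubs")],
    [("isLocatedIn", "England")])
```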

Context Transformer
For complex types, the global transformer uniformly serializes the information about the neighbors of the target entity. However, the neighbors of the target entity are pairs, and this structural information might be useful for inference. For instance, to infer that the entity Barack Obama has type 20th-century American writers, we need to consider different aspects of its relational neighbors; e.g., the neighbor (bornIn, Chicago) focuses on the birthplace, while the neighbor (write, A Promised Land) is concerned with possible careers. The global transformer's serialization of pairs as a sequence may lead to two problems. First, serializing neighbors disregards the structure of the graph. Second, the importance of each element in the sequence is the same, and even elements that are not relevant for the inference will exchange information, e.g., bornIn and A Promised Land in the example above. To realize a differentiated aggregation between different neighbor pairs while preserving the graph structure, we use a context transformer module as in (Chen et al., 2021). Intuitively, given the output of the local transformer and the [CLS] embedding, the context transformer contextualizes the target entity with type-class and relational neighbor knowledge from its neighborhood graph; details of the context transformer can be found in (Chen et al., 2021). The output embedding of [CLS], denoted as C^{cls} ∈ R^{d×1}, is used for the final entity type prediction, which is defined as S^{ctx} = W Relu(C^{cls}) + b, where S^{ctx} ∈ R^{L×1}.

Pooling
For an entity e, the local, global, and context transformers may generate multiple entity typing inference results. To address this, we adopt an exponentially weighted pooling method to aggregate the prediction results (Pan et al., 2021; Stergiou et al., 2021), formulated as follows:

S_e = pool({S^{loc}_0, S^{loc}_1, ..., S^{loc}_{m+n-1}, S^{glo}, S^{ctx}}),

where S_e ∈ R^L represents the relevance score between e and its types, and n (m) is the number of type-class (relational) neighbors of e. For simplicity, we will omit the identifiers (loc, glo, ctx) and unify the numerical order of the output results of the local, global, and context transformers as follows:

S_e = pool({S_0, S_1, ..., S_{m+n-1}, S_{m+n}, S_{m+n+1}}).

We further apply a sigmoid function to S_e, denoted as s_e = σ(S_e), to map the scores between 0 and 1, where the higher the value s_{e,k} of s_e, the more likely e is to have type k.
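The exact pooling formula follows (Pan et al., 2021); as an illustration only, the sketch below uses a softmax-weighted (exponentially weighted) average per type, which is one common instantiation and not necessarily the authors' exact formulation:

```python
import numpy as np

def exp_weighted_pool(scores):
    # scores: array of shape (num_predictions, L) stacking the local,
    # global, and context scores for one entity. For each type, take a
    # softmax-weighted average so that high-scoring predictions dominate
    # without discarding the others.
    w = np.exp(scores - scores.max(axis=0, keepdims=True))  # stable softmax
    w = w / w.sum(axis=0, keepdims=True)
    return (w * scores).sum(axis=0)

# Toy scores: 3 predictions over L = 2 types.
S = np.array([[0.2, 3.0],
              [1.0, -1.0],
              [0.4, 0.5]])
S_e = exp_weighted_pool(S)
s_e = 1.0 / (1.0 + np.exp(-S_e))  # sigmoid maps scores into (0, 1)
```

Note that the softmax-weighted average always lies between the minimum and maximum of the pooled scores, so no single module's prediction is discarded.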

Optimization Strategy
To train a model with positive sample scores s_{e,k} (representing that (e, has_type, k) exists in the KG) and negative sample scores s′_{e,k} (representing that (e, has_type, k) does not exist in the KG), binary cross-entropy (BCE) is usually used as the loss function. However, there may be a serious false-negative problem: some assertions (e, has_type, k) are valid but missing from existing KGs. To overcome this problem, false-negative aware (FNA) loss functions have been proposed (Pan et al., 2021); basically, they assign lower weight to negative samples with too high or too low relevance scores. We introduce a steeper false-negative aware (SFNA) loss function which gives more penalties to negative samples with too high or too low relevance scores. The SFNA loss over the positive score s_{e,k} and the adjusted negative score s′_{e,k} is defined as follows:

Experiments

In this section, we discuss the evaluation of TET relative to twelve baselines on a wide array of entity typing benchmarks. We first describe the datasets and baseline models (Sec. 4.1). Then we discuss the experimental results (Sec. 4.2). Finally, we present ablation study experiments (Sec. 4.3).

Datasets and Baselines
Datasets. We evaluate our proposed TET model on two real-world knowledge graphs: FB15k (Bordes et al., 2013) and YAGO43k (Moon et al., 2017), which are subgraphs of Freebase (Bollacker et al., 2008) and YAGO (Suchanek et al., 2007), respectively. FB15kET and YAGO43kET provide entity type instances which map entities from FB15k and YAGO43k to corresponding entity types. For fairness of the experimental comparison, we follow the standard train/test split used by the baselines. The basic statistics of all datasets are shown in Table 2.

Experimental Results
Table 1 presents the evaluation results of entity type prediction on FB15kET and YAGO43kET. We can observe that our model TET outperforms all baselines in terms of basically all metrics. These results demonstrate that transformers encode the neighbor information of an entity more effectively. Specifically, when using the BCE and FNA loss functions, TET meets or exceeds the CET model (the best-performing baseline). By using the SFNA loss function, we obtain a further performance improvement, especially in the MRR and Hit@1 metrics on FB15kET. Furthermore, TET shows different gains compared to CET with respect to the Hit metrics. The improvement on Hit@1 is higher than on Hit@3 and Hit@10 because, by using three different transformer modules, TET can encode the neighborhood information of an entity at three different levels of granularity. Further, if we do not use type-class neighbors and, for the YAGO43kET dataset, the type-class enrichment on relations is not present (TET-SFNA-no-class), the performance of TET on the YAGO43kET dataset decreases considerably. Intuitively, the decrease on YAGO43kET is larger than on FB15kET because the graph structure of YAGO43k is sparser, has fewer relations, and has a large number of types, making the semantic type-class knowledge crucial.

Ablation Studies
To verify the impact of each TET model component on performance, we conduct ablation studies on FB15kET and YAGO43kET. In particular, we look at the effect of: a) different transformer modules (Table 3); b) different neighbor content (Table 4); c) different integration methods on YAGO43kET (Table 5); d) different dropping rates (Table 6); e) the number of hops (Table 7).
Effect of Transformer. The local transformer by itself performs better than the global one by itself. This indicates that considering the neighbors of an entity independently can reduce the interference between unrelated types. The combination of local and global transformers achieves almost the same result as when the context one is also incorporated. We believe that in datasets with a more complex structure the context transformer could play a more prominent role; we leave this line of research as future work.
Effect of Neighbor Content. We observe that the impact of relational neighbors is greater than that of type-class neighbors. Indeed, removing relational neighbors leads to a substantial performance degradation in YAGO43kET. When both of them are available, type-class neighbors might help relational ones to distinguish between relevant and irrelevant types for an inference.
Effect of Dropping Rates. In real-life KGs, many entities have sparse relations with other entities. In particular, they have few relational neighbors but a large number of types, so for their inference we lack structural relational information. Indeed, in YAGO43kET about 4.73% of the entities have five times more types than relational neighbors (Zhuo et al., 2022). To further test the robustness of TET under relation sparsity, we also conduct ablation experiments on FB15kET by randomly removing 25%, 50%, 75%, and 90% of the relational neighbors of entities. We find that with the continuous increase of the sparsity ratio, the performance of the baselines decreases to varying degrees, but TET still achieves the best results under all sparsity conditions. We also consider TET with semantic enhancement on relations, since by randomly dropping neighbors the number of relations might also be reduced. However, not enough relations are removed to have a positive effect. Another reason for the lack of a positive effect is that the number of types in FB15kET is substantially smaller than in YAGO43kET. Table 6 shows results for 75% and 90%; for the missing results see the appendix.
Effect of Number of Hops. For relational neighbors, TET only considers one-hop information, i.e., only the information around their direct neighbors.
We also conduct an ablation study on the effect of using different numbers of hops. In principle, multi-hop information could provide richer structural knowledge, increasing the discrimination of relational neighbors. Indeed, a positive effect of multi-hop information has been witnessed in several approaches to KGC. However, our experimental results show that the noise introduced by intermediate entities is more dominant than the additional knowledge n-hop entities and relations provide. Intuitively, for KGC multi-hop information makes a difference as it exploits the topological structure of the KG (i.e., how entities are related). However, in the input KG, types are not related to each other and, as our experiments show, one cannot lift the topological structure at the entity level to the type level, which explains why there is no gain from considering multi-hop information. It would be interesting to confirm this observation by using GCNs, which more naturally capture multi-hop information.

Conclusions
In this paper, we propose a novel transformer-based model for KGET which utilizes contextual information about entities to infer missing types for KGs with minimal schema information. TET has three modules that encode local and global neighborhood information from different perspectives. We also enhance the representation of entities by using class membership knowledge of types. We experimentally showed the benefits of our model.

Limitations
Our TET model currently suffers from two limitations. From the methodological viewpoint, a transformer mechanism introduces more parameters than embedding-based methods, bringing some computational burden and memory overhead, though these are tolerable. Also, there exist other important tasks related to types, e.g., fine-grained entity typing, which aims at classifying entity mentions into fine-grained semantic labels. TET is currently not appropriate for this kind of task.
We also conduct an ablation study in which, on FB15kET, we randomly remove 25%, 50%, 75%, and 90% of the relation types. The results in Table 10 show that in this case enhancing relation types with semantic knowledge does not have a positive effect, unlike for YAGO43kET. We believe that the main reason behind this is that YAGO43kET not only has very few relations but also a very large number of types. To have a precise understanding, a dedicated deep analysis of the interplay between the number of types, the number of relation types, and other structural characteristics of KGs is required; it is out of the scope of this paper, but it is an interesting question for future work.

Figure 1 :
Figure 1: A KG with its entity type information.

Figure 2 :
Figure 2: An overview of the TET model. The red dotted box is only used for the YAGO43kET dataset. Note that r_{c-i} is an abbreviation of r_{class_i}. The box with the text Local indicates the output of the local transformer module.
word and position embeddings of r_class or c. We apply a local transformer to each type-class neighbor sequence to model the interaction between the class relations and types of an entity. The output embedding corresponding to [CLS], denoted as H^{cls} ∈ R^{d×1}, is then used to infer missing types of the target entity, where d represents the dimension of the embedding. For an entity with n type-class neighbors, they are denoted as [H^{cls}_1, H^{cls}_2, ..., H^{cls}_n] after the local transformer representation.
W ∈ R^{L×d} and b ∈ R^L are the learnable parameters, where L is the number of types. [·, ·] denotes the concatenation function; H^{cls}_i ∈ R^{d×1} and Q^{cls}_j ∈ R^{d×1} respectively represent the i-th and j-th embeddings of the type-class and relational neighbors after the transformer representation.

Table 1 :
Evaluation of different models on FB15kET and YAGO43kET. ♢ results are from the original papers. ♦ results are from our implementation of the corresponding models. TET-SFNA-no-class means that type-class neighbors were not used, and that for YAGO43kET, in addition, no semantic enhancement on relations is used.

Table 2 :
Statistics of Datasets.

Table 3 :
Evaluation of the ablation study with different combinations of transformer modules. By combining the global and context transformers, more complex types can be inferred at the token and graph-structure level, achieving state-of-the-art results. Note that both the global and context transformers deal with complex types, but the context one further takes into account the relevance of different neighbors while preserving the structure of the KG. As one can see from the results, for the used datasets the global transformer is already doing most of the work.

Table 5 :
Evaluation of the ablation study with different integration methods. Note that "No" means not performing type-class semantic enhancement on the relations. To address this problem, in Section 3.2.2 we enriched the representations of relations in relational neighbors with type-class knowledge. One can observe that the Avg operation outperforms Min and Max because the latter tend to discard useful content.

Table 6 :
Evaluation with different dropping rates on FB15kET. TET_RSE represents TET with semantic enhancement on relations.

Table 7 :
Evaluation of ablation study with different number of hops on FB15kET.