Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs

Knowledge graph entity typing (KGET) aims at inferring plausible types of entities in knowledge graphs. Existing approaches to KGET focus on how to better encode the knowledge provided by the neighbors and types of an entity into its representation. However, they ignore the semantic knowledge provided by the way in which types can be clustered together. In this paper, we propose a novel method called Multi-view Contrastive Learning for knowledge graph Entity Typing (MCLET), which effectively encodes the coarse-grained knowledge provided by clusters into entity and type embeddings. MCLET is composed of three modules: i) a Multi-view Generation and Encoder module, which encodes structured information from entity-type, entity-cluster and cluster-type views; ii) a Cross-view Contrastive Learning module, which encourages different views to collaboratively improve view-specific representations of entities and types; iii) an Entity Typing Prediction module, which integrates multi-head attention and a Mixture-of-Experts strategy to infer missing entity types. Extensive experiments show the strong performance of MCLET compared to the state-of-the-art.


Introduction
Knowledge graphs (KGs) (Pan et al., 2017a,b) store graph-like knowledge using triples of the form (s, r, o), indicating that entities s and o are related to each other through a relation type r. KGs also contain entity type knowledge described as (e, has_type, t), denoting that entity e has type t; e.g., we can express that Joe Biden has type American_politician, cf. Figure 1. Entity type knowledge plays a key role in various natural language processing related tasks, such as entity and relation linking (Gupta et al., 2017; Pan et al., 2019), knowledge graph completion (Peng et al., 2022; Niu et al., 2022; Wiharja et al., 2020), question answering (Hu et al., 2022b; Chen et al., 2019; Hu et al., 2023), and relation extraction (Li et al., 2019). However, one cannot always have access to this kind of knowledge as KGs are inevitably incomplete (Zhu et al., 2015). For example, in Figure 1 the entity Joe Biden has types American_politician and American_lawyer, but it should also have types male_politician and male_lawyer. This phenomenon is common in real-world datasets; for instance, 10% of entities in the FB15k KG have the type /music/artist, but are missing the type /people/person (Moon et al., 2017). Motivated by this, we concentrate on the Knowledge Graph Entity Typing (KGET) task, which aims at inferring missing entity types in a KG.

A wide variety of approaches to KGET have already been proposed, including embedding-based (Moon et al., 2017; Zhao et al., 2020), transformer-based (Wang et al., 2019; Hu et al., 2022a) and graph neural network (GNN) based methods (Pan et al., 2021; Jin et al., 2022). Each of these approaches has its own disadvantages: embedding-based methods ignore the existing neighbor information, transformer-based methods are computationally costly due to the use of multiple transformers to encode different neighbors, and GNN-based methods ignore important higher-level semantic content beyond what is readily available in the graph structure. Indeed, in addition to relational and type neighbors, entity types can often be clustered together to provide coarse-grained information. Importantly, this coarse-grained information is available in many KGs, such as YAGO (Suchanek et al., 2007), which provides an alignment between types and WordNet (Miller, 1995) concepts. The example in Figure 1 shows that between the entity Joe Biden and the type American_politician, there is a layer with cluster-level information: American and politician. On the one hand, compared with the type content, the cluster information has a coarser granularity, which can roughly indicate the possible attributes of an entity and reduce the decision space in the entity type prediction task. On the other hand, the introduction of cluster information can enhance the semantic richness of the input KG. For example, from the type assertion (Joe Biden, has_type, American_politician) and the clusters American and politician corresponding to the type American_politician, we can obtain new semantic connection edges between entities and clusters, i.e., (Joe Biden, has_cluster, American) and (Joe Biden, has_cluster, politician). Note that as a type might belong to multiple clusters and a cluster may contain multiple types, a similar phenomenon occurs at the entity-type and entity-cluster levels, e.g., an entity might be linked to many clusters. Through the interconnection among entities, coarse-grained clusters and fine-grained types, a dense entity-cluster-type heterogeneous graph with multi-level semantic relations can be formed.
To effectively leverage the flow of knowledge between entities, clusters and types, we propose a novel method, MCLET, a Multi-view Contrastive Learning model for knowledge graph Entity Typing, comprising three modules: a Multi-view Generation and Encoder module, a Cross-view Contrastive Learning module and an Entity Typing Prediction module. The Multi-view Generation and Encoder module converts a heterogeneous graph into three homogeneous graphs (entity-type, entity-cluster and cluster-type) to encode structured knowledge at different levels of granularity (cf. right side of Figure 1). To collaboratively supervise the three graph views, the Cross-view Contrastive Learning module captures their interaction through a cross-view contrastive learning mechanism and mutually enhances the view-specific representations. After obtaining the embedding representations of entities and types, the Entity Typing Prediction module makes full use of the relational and known type neighbor information for entity type prediction. We also introduce multi-head attention with a Mixture-of-Experts (MoE) mechanism to obtain the final prediction score. Our main contributions are the following:
• We propose MCLET, a method which effectively uses entity-type, entity-cluster and cluster-type structured information. We design a cross-view contrastive learning module to capture the interaction between different views.
• We devise multi-head attention with a Mixture-of-Experts mechanism to distinguish the contributions from different entity neighbors.
• We conduct empirical and ablation experiments on two widely used datasets, showing the superiority of MCLET over existing state-of-the-art models.

Related Work
Embedding-based Methods. These methods have been introduced based on the observation that the KGET task can be seen as a sub-task of the knowledge graph completion (KGC) task. ETE (Moon et al., 2017) unifies the KGET and KGC tasks by treating the entity-type pair (entity, type) as a triple of the form (entity, has_type, type). ConnectE (Zhao et al., 2020) builds two distinct type inference mechanisms with local typing information and triple knowledge.
Graph Neural Network-based Methods. Given that GNNs inherently capture structural knowledge from graphs (e.g. the neighborhood of an entity), they have been previously used for the KGET task (Jin et al., 2019; Vashishth et al., 2020; Pan et al., 2021; Zhuo et al., 2022; Zhao et al., 2022; Zou et al., 2022a). For example, CET (Pan et al., 2021) introduces two mechanisms to fully utilize neighborhood information in an independent and an aggregated manner. MiNer (Jin et al., 2022) proposes a neighborhood information aggregation module to aggregate both one-hop and multi-hop neighbors. However, these types of methods ignore other kinds of semantic information, e.g., how types cluster together.
Transformer-based Methods. Many studies use transformers (Vaswani et al., 2017) for KG-related tasks (Liu et al., 2022; Xie et al., 2022; Chen et al., 2022), including KGC. Transformer-based methods for KGC, such as CoKE (Wang et al., 2019) and HittER (Chen et al., 2021), can thus be directly applied to the KGET task. TET (Hu et al., 2022a) presents a dedicated method for KGET. However, the introduction of multiple transformer structures brings a large computational overhead, which limits its application to large datasets.
Background

Task Definition. Let E, R and T respectively be finite sets of entities, relation types and entity types. A knowledge graph (KG) G is the union of G triples and G types, where G triples denotes a set of triples of the form (s, r, o), with s, o ∈ E and r ∈ R, and G types denotes a set of pairs of the form (e, t), with e ∈ E and t ∈ T. To work with a uniform representation, we convert the pair (e, t) to the triple (e, has_type, t), where has_type is a special relation type not occurring in R. Key to our approach is the information provided by relational and type neighbors. For an entity e, its relational neighbors form the set N r = {(r, o) | (e, r, o) ∈ G triples} and its type neighbors form the set N t = {(has_type, t) | (e, has_type, t) ∈ G types}. In this paper, we consider the knowledge graph entity typing (KGET) task, which aims at inferring missing types from T in triples from G types.
Type Knowledge Clustering. Before we introduce our approach to KGET, we start by noting that it is challenging to infer types whose prediction requires integrating various pieces of information together. For example, to predict that the entity Barack Obama has type 20th-century_American_lawyer, we need to know his birth year (Barack Obama, was_born_in, 1961), place of birth (Barack Obama, place_of_birth, Hawaii), and occupation (Barack Obama, occupation, lawyer). Clearly, this problem is exacerbated by the fact that the KG itself is incomplete, which might more easily lead to prediction errors. However, in practice, type knowledge is often semantically clustered together; e.g., the types male_lawyer, American_lawyer, and 19th-century_lawyer belong to the cluster lawyer. Naturally, this coarse-grained cluster information could help taming the decision-making process by paying more attention to types within a relevant cluster, without considering 'irrelevant' types from other clusters. With this in mind, we explore the introduction of cluster information into the type prediction
process. Therefore, a natural question is how to determine the clusters to which a type belongs. In fact, the Freebase (Bollacker et al., 2008) and YAGO (Suchanek et al., 2007) datasets themselves provide cluster information. In the Freebase dataset, types are annotated in a hierarchical manner, so we can directly obtain cluster information using a rule-like approach based on the type annotations. For instance, the type /location/uk_overseas_territory belongs to the cluster location and the type /education/educational_degree belongs to the cluster education. The YAGO dataset provides an alignment between types and WordNet concepts. So, we can directly obtain the words in WordNet (Miller, 1995) describing the cluster to which a type belongs. For example, for the type wikicategory_People_from_Dungannon, its cluster is wordnet_person_100007846, and for the type wikicategory_Male_actors_from_Arizona, its cluster is wordnet_actor_109765278.
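For Freebase-style hierarchical type labels, the rule-like extraction described above can be sketched as follows. This is an illustrative sketch; the function name and the exact rules are ours, not taken from the MCLET implementation:

```python
def freebase_cluster(type_label: str) -> str:
    """Derive the coarse-grained cluster of a Freebase-style hierarchical
    type label by taking its top-level path segment (a sketch of the
    paper's rule-like extraction; real rules may be more elaborate)."""
    return type_label.strip("/").split("/")[0]

print(freebase_cluster("/location/uk_overseas_territory"))  # location
print(freebase_cluster("/education/educational_degree"))    # education
```

For YAGO, the analogous step is a lookup in the provided type-to-WordNet alignment rather than string splitting.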

Multi-view Generation and Encoder
For the KGET task, the two parts of the input KG, G triples and G types, can be used for inference. The main question is how to make better use of the type graph G types, as this might affect the performance of the model to a large extent. The main motivation behind this component of MCLET is thus to effectively integrate the existing structured knowledge into the type graph. After introducing coarse-grained cluster information into the type graph, a three-level structure is generated: entity, coarse-grained cluster, and fine-grained type, such that the corresponding graph will have three types of edges: entity-type, cluster-type, and entity-cluster. Note that different subgraphs focus on different perspectives of knowledge. For example, the entity-cluster subgraph pays attention to more abstract content than the entity-type subgraph. Therefore, to fully utilize the knowledge at each level, we convert the heterogeneous type graph into homogeneous graphs, constructing an entity-type graph G e2t, a cluster-type graph G c2t, and an entity-cluster graph G e2c separately.
Entity Type Graph. The entity type graph, denoted G e2t, is the original type graph G types from the input KG. Recall that different types of an entity can describe knowledge from different perspectives, which might help infer missing types. For example, given the type assertion (Barack Obama, has_type, 20th-century_American_lawyer), we could deduce the missing type assertion (Barack Obama, has_type, American_lawyer), since the type 20th-century_American_lawyer entails American_lawyer.
Cluster Type Graph. The cluster type graph, denoted G c2t, is a newly generated graph based on how types are clustered. Type knowledge available in existing KGs inherently contains semantic information about clusters of types. For instance, the type /people/appointer in FB15kET clearly entails the cluster people. A similar phenomenon occurs in the YAGO43kET KG. Following this insight, for a type t and its cluster c, we use a new relation type is_cluster_of to connect t and c. For instance, from the type /people/appointer and its cluster people we can obtain (people, is_cluster_of, /people/appointer).
Note that a type may belong to multiple clusters. For example, the type American_lawyer belongs to the clusters American and lawyer.
Entity Cluster Graph. The entity cluster graph, denoted G e2c, is generated based on G e2t and G c2t. Unlike the entity type graph, the entity cluster graph captures knowledge at a higher level of abstraction. Therefore, its content has coarser granularity and wider coverage. So, given an entity e and a type t, for a triple (e, has_type, t) from G e2t and a triple (c, is_cluster_of, t) from G c2t, we construct a triple (e, has_cluster, c), where has_cluster is a new relation type. Note that because a type may belong to multiple clusters, an entity with this type will also be closely related to multiple clusters. Consider, for example, the entity Barack Obama with type American_lawyer. Since American_lawyer belongs to the American and lawyer clusters, both (Barack Obama, has_cluster, American) and (Barack Obama, has_cluster, lawyer) will be in G e2c.
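The construction of G e2c from G e2t and G c2t amounts to a join on the shared type. A minimal sketch (function and variable names are ours, not from the MCLET code):

```python
def build_entity_cluster_graph(e2t, c2t):
    """Join entity-type edges (e, t) with cluster-type edges (c, t)
    to obtain the entity-cluster edges (e, c) of G_e2c."""
    type_to_clusters = {}
    for c, t in c2t:
        type_to_clusters.setdefault(t, set()).add(c)
    e2c = set()
    for e, t in e2t:
        for c in type_to_clusters.get(t, ()):
            e2c.add((e, c))  # corresponds to (e, has_cluster, c)
    return e2c

e2t = {("Barack Obama", "American_lawyer")}
c2t = {("American", "American_lawyer"), ("lawyer", "American_lawyer")}
print(sorted(build_entity_cluster_graph(e2t, c2t)))
# [('Barack Obama', 'American'), ('Barack Obama', 'lawyer')]
```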
Multi-view Encoder. We encode the different views provided by G e2t, G c2t, and G e2c into the representations of entities, types and clusters using graph convolutional networks (GCNs) (Kipf and Welling, 2017). More precisely, we adopt LightGCN's (He et al., 2020) message propagation strategy to encode the information propagation from the entity-type, cluster-type, and entity-cluster views.
Our choice is supported by the following observation. The three graphs, G e2t, G c2t, and G e2c, are uni-relational, i.e., only one relation type is used, so there is no need for a heavy multi-relational GCN model like RGCN (Schlichtkrull et al., 2018). Indeed, LightGCN is more efficient because it removes the self-connections from the graph and the nonlinear transformation from the information propagation function. To encode the three views, we use the same LightGCN structure, but no parameter sharing is performed between the corresponding structures. Taking the encoding of G e2t as an example, to learn the representations of entities and types, the ℓ-th layer's information propagation is defined as:

x^(ℓ+1) e←e2t = Σ_{t ∈ M e} (1 / √(|M e||M t|)) x^(ℓ) t←e2t,  x^(ℓ+1) t←e2t = Σ_{e ∈ M t} (1 / √(|M t||M e|)) x^(ℓ) e←e2t,

where x e←e2t, x t←e2t ∈ R^d represent the embeddings of entity e and type t in the graph G e2t, and d is the dimension of the embeddings. The layer-0 embeddings are randomly initialized at the beginning of training. M e and M t respectively denote the set of all types connected with entity e and the set of all entities connected with type t. By stacking multiple graph propagation layers, high-order signal content can be properly captured. We further sum up the embeddings of the different layers to get the final entity and type representations, defined as:

x* e←e2t = Σ_{ℓ=0}^{L} x^(ℓ) e←e2t,  x* t←e2t = Σ_{ℓ=0}^{L} x^(ℓ) t←e2t,

where L indicates the number of layers of the LightGCN. In the same way, we can get the type representation of cluster interaction x* t←c2t and the cluster representation of type interaction x* c←c2t from G c2t, and the entity representation of cluster interaction x* e←e2c and the cluster representation of entity interaction x* c←e2c from G e2c.
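As a sketch of this encoder, LightGCN-style propagation on a bipartite entity-type graph can be written with a symmetrically normalized incidence matrix. This is an illustrative NumPy implementation under our own naming; the actual MCLET code may differ:

```python
import numpy as np

def lightgcn_bipartite(A, x_e, x_t, num_layers):
    """LightGCN-style propagation on a bipartite entity-type graph:
    no self-loops, no nonlinear transformation, embeddings of all
    layers summed to produce the final representations.
    A: |E| x |T| binary incidence matrix; x_e, x_t: layer-0 embeddings."""
    deg_e = np.maximum(A.sum(axis=1, keepdims=True), 1)  # |M_e|
    deg_t = np.maximum(A.sum(axis=0, keepdims=True), 1)  # |M_t|
    norm = A / np.sqrt(deg_e * deg_t)                    # 1/sqrt(|M_e||M_t|)
    out_e, out_t = x_e.copy(), x_t.copy()                # layer-0 terms
    for _ in range(num_layers):
        # cross propagation: entities aggregate types and vice versa
        x_e, x_t = norm @ x_t, norm.T @ x_e
        out_e, out_t = out_e + x_e, out_t + x_t          # sum over layers
    return out_e, out_t
```

The same routine would be reused (with separate parameters) for the cluster-type and entity-cluster views, since all three graphs are uni-relational and bipartite.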

Cross-view Contrastive Learning
Different views can capture content at different levels of granularity. For example, the semantic content of x* e←e2c in G e2c is more coarse-grained than that of x* e←e2t in G e2t. To capture multi-grained information, we use cross-view contrastive learning (Zhu et al., 2021; Zou et al., 2022b; Ma et al., 2022) to obtain better discriminative embedding representations. Taking the entity embeddings x* e←e2c and x* e←e2t as an example, our cross-view contrastive learning module goes through the following three steps:
Step 1. Unified Representation. We apply a two-layer multilayer perceptron (MLP) to unify the representations from the different views:

z* e←e2t = W 2 f(W 1 x* e←e2t + b 1) + b 2,

where W 1, W 2 ∈ R^{d×d} and b 1, b 2 ∈ R^d are learnable parameters and f(•) is the ELU nonlinear function. The embedding of the i-th entity in G e2t can then be expressed as z* e i←e2t.
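Step 1 can be sketched as a two-layer MLP with ELU. Parameter names follow the text; the standard ELU form is assumed on our part:

```python
import numpy as np

def elu(x, alpha=1.0):
    """Standard ELU nonlinearity f(.) assumed by the sketch."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def project_view(x, W1, b1, W2, b2):
    """Two-layer MLP mapping a view-specific embedding x* (dim d)
    into the shared space used for contrastive learning."""
    return W2 @ elu(W1 @ x + b1) + b2
```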
Step 2. Positive and Negative Samples. Let a node u be an anchor. The embeddings of the corresponding node u in the two different views provide the positive samples, while the embeddings of other nodes in the two views are naturally regarded as negative samples. Negative samples thus come from two sources: intra-view nodes and inter-view nodes. Intra-view means that the negative samples are nodes different from u in the same view where u is located, while inter-view means that the negative samples are nodes (different from u) in the other view.
Step 3. Contrastive Learning Loss. We adopt cosine similarity θ(•, •) to measure the distance between two embeddings (Zhu et al., 2021). For nodes u i and v j in different views, we define the contrastive learning loss of the positive pair of embeddings (u i, v j) as:

L(u i, v j) = −log [ exp(θ(u i, v j)/τ) / (exp(θ(u i, v j)/τ) + H intra + H inter) ],

where τ is a temperature parameter, and H intra and H inter are the sums of exp(θ(•,•)/τ) over the intra-view and inter-view negative samples, respectively. If u i is in G e2t and v j is in G e2c, then the positive pair of embeddings (u i, v j) corresponds to (z* e i←e2t, z* e j←e2c), i.e., the i-th node embedding in G e2t and the j-th node embedding in G e2c represent the same entity; after the contrastive learning operation, the corresponding node pair embedding becomes (z⋄ e i←e2t, z⋄ e j←e2c). Considering that the two views are symmetrical, the loss of the other view can be defined as L(v j, u i). The final loss function to obtain the embeddings is the mean, over all positive pairs, of [L(u i, v j) + L(v j, u i)] / 2.
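The loss of Step 3 is an InfoNCE-style objective. Below is a minimal NumPy sketch with intra-view and inter-view negatives; the function name and the default τ = 0.5 are our assumptions:

```python
import numpy as np

def cross_view_nce(U, V, i, tau=0.5):
    """Contrastive loss L(u_i, v_i) with cosine similarity: the positive
    is the same node i in the other view V; negatives are all other nodes
    in the anchor's own view U (intra) and in the other view V (inter)."""
    def cos(a, B):
        return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-12)
    sim_inter = np.exp(cos(U[i], V) / tau)  # u_i vs all nodes in other view
    sim_intra = np.exp(cos(U[i], U) / tau)  # u_i vs all nodes in own view
    pos = sim_inter[i]                      # same node, other view
    mask = np.arange(U.shape[0]) != i
    h_inter = sim_inter[mask].sum()         # inter-view negatives
    h_intra = sim_intra[mask].sum()         # intra-view negatives
    return -np.log(pos / (pos + h_intra + h_inter))
```

The symmetric term L(v_i, u_i) is obtained by calling `cross_view_nce(V, U, i)`, and the final loss averages both directions over all positive pairs.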

Entity Typing Prediction
After performing the multi-view contrastive learning operation, we obtain two representations each for entities and types. These representations incorporate entity, coarse-grained cluster and fine-grained type knowledge at the same time. In this way, the cluster information is fully integrated into the representations of entities and types. For an entity e and a type t, we obtain their final representations by respectively concatenating z⋄ e←e2t with z⋄ e←e2c, and z⋄ t←e2t with z⋄ t←c2t:

z e = z⋄ e←e2t || z⋄ e←e2c,  z t = z⋄ t←e2t || z⋄ t←c2t.

Neighbor Prediction Mechanism. The entity and type embeddings are concatenated to obtain z = z e || z t as the embedding dictionary to be used for the entity type prediction task. We found that there is a strong relationship between the neighbors of an entity and its types. For a unified representation, we collectively refer to the relational and type neighbors of an entity as neighbors. Therefore, our goal is to find a way to effectively use the neighbors of an entity to predict its types. Since different neighbors have different effects on an entity, we propose a neighbor prediction mechanism so that each neighbor can perform type prediction independently. For an entity, its i-th neighbor can be expressed as (z i, r i), where r i represents the relation embedding of the i-th relation. As previously observed (Pan et al., 2021), the embedding of a neighbor can be obtained using TransE (Bordes et al., 2013); we can then perform a nonlinear operation on it and further send it to a linear layer to get its final embedding:

u i = W f(z i − r i) + b,

where W ∈ R^{N×d} and b ∈ R^N are the learning parameters, and N represents the number of types. We define the embeddings of all neighbors of entity e as U e = [u 1, . . . , u n], where n denotes the number of neighbors of e.
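A sketch of this per-neighbor scoring follows. We assume the common TransE composition z i − r i and a ReLU nonlinearity; both are our assumptions for illustration, since the text leaves f unspecified:

```python
import numpy as np

def neighbor_type_scores(neigh_emb, rel_emb, W, b):
    """Independent per-neighbor type prediction: each neighbor (z_i, r_i)
    is composed TransE-style as z_i - r_i, passed through a nonlinearity,
    and mapped by a linear layer to a score over all N types.
    neigh_emb, rel_emb: n x d; W: N x d; b: N. Returns an n x N matrix."""
    h = np.maximum(neigh_emb - rel_emb, 0.0)  # composition + ReLU
    return h @ W.T + b                         # one score row per neighbor
```

Each row of the result is one neighbor's independent prediction over the N types; the expert selection mechanism described next aggregates these rows into a final score.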
Expert Selection Mechanism. Different neighbors of an entity contribute differently to the prediction of its types. Indeed, sometimes only a few neighbors are helpful for the prediction. We introduce a Multi-Head Attention mechanism (Zhu and Wu, 2021; Jin et al., 2022) with a Mixture-of-Experts (MHAM) to distinguish the information of each head, and compute the final score from the per-neighbor scores, where the attention projection matrices and b 2 ∈ R^H are the learnable parameters. M and H respectively represent the number of experts in the Mixture-of-Experts and the number of heads. ϕ and σ represent the softmax and sigmoid activation functions, respectively. T i > 0 is a temperature controlling the sharpness of the scores.
Prediction and Optimization. We jointly train the multi-view contrastive learning and entity type prediction tasks to obtain an end-to-end model. For entity type prediction, we adopt the false-negative aware (FNA) loss function (Pan et al., 2021; Jin et al., 2022), denoted L ET. We further combine the multi-view contrastive learning loss with the FNA loss to obtain the joint loss function:

L = L ET + λ L CL + γ ||Θ||²,

where β is a hyper-parameter inside L ET used to control the overall weight of negative samples, λ and γ are hyper-parameters used to control the contrastive loss and the L 2 regularization, and Θ is the model parameter set.

Experiments

Datasets. We evaluate our MCLET model on two knowledge graphs, each composed of G triples and G types. For G triples, we use FB15k (Bordes et al., 2013) and YAGO43k (Moon et al., 2017).
For G types, we use the FB15kET and YAGO43kET datasets introduced by Pan et al. (2021), which map entities from FB15k and YAGO43k to corresponding entity types. The statistics of the datasets are shown in Table 1.
Evaluation Protocol. For every pair (e, t) in the test set, we obtain a ranked list over the possible types t. We use five automatic evaluation metrics: mean rank (MR), mean reciprocal rank (MRR), and Hits@k (k ∈ {1, 3, 10}). MR measures the average position of the first correct answer in the ranked list, MRR is the mean of the inverse of the rank of the first correct answer, and Hits@k calculates the percentage of correct types ranked among the top k. For all metrics except MR, larger values indicate better performance. Following the evaluation protocol of most entity typing works (Pan et al., 2021; Hu et al., 2022a; Jin et al., 2022), all metrics are reported under the filtered setting (Bordes et al., 2013).
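These metrics can be computed directly from the filtered ranks of the correct types. A small self-contained sketch (names are ours):

```python
def rank_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and Hits@k from a list of (filtered) ranks of the
    correct types, one rank per test pair (e, t)."""
    n = len(ranks)
    out = {
        "MR": sum(ranks) / n,               # lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,  # higher is better
    }
    for k in ks:
        out[f"Hits@{k}"] = sum(r <= k for r in ranks) / n
    return out
```

For example, ranks [1, 2, 5, 12] yield MR = 5.0, Hits@1 = 0.25, Hits@3 = 0.5 and Hits@10 = 0.75.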

Main Results
The empirical results on entity type prediction are reported in Table 2. We can see that all MCLET variants outperform existing SoTA baselines by a large margin across all metrics. In particular, compared to MiNer (the best performing baseline), our MCLET-MHAM achieves 2.2% and 2.1% improvements on MRR on the FB15kET and YAGO43kET datasets, respectively. We can see a similar improvement, e.g., on the Hits@1 metric, with an increase of 2.3% and 2.4% on FB15kET and YAGO43kET, respectively. Our ablation studies below show the contribution of MCLET's components to the obtained improvements.
We have evaluated three variants of MCLET to explore the effectiveness of the expert selection mechanism: MCLET-Pool, MCLET-MHA, and MCLET-MHAM. MCLET-Pool and MCLET-MHA respectively replace our expert selection mechanism with the pooling approach introduced in CET and the type probability prediction module introduced in MiNer. We observe that the MHAM variant achieves the best results. For instance, on FB15kET, MHAM improves over the Pool and MHA variants by 2.4% and 0.6% on the MRR metric, respectively. This can be intuitively explained by the fact that the neighbors of an entity contribute differently to the prediction of its types. Indeed, by using the expert selection strategy, the information obtained by each head can be better distinguished. As a consequence, a more accurate final score can be obtained based on the prediction scores of each of the neighbors.

Ablation Studies
To understand the effect of each of MCLET's components on the performance, we carry out ablation experiments under various conditions, covering the following three aspects: a) the content of different views, see Table 3; b) different numbers of LightGCN layers, see Table 4; c) different dropping rates, see Table 5. Other ablation results and a complexity analysis can be found in Appendices B and C.
Effect of Different Views. We observe that removing any of the views will most of the time result in a decrease in performance, cf. Table 3. Further, if all three views are removed, there is a substantial performance drop on both datasets. For instance, the removal of all three views brings a decrease of 7.3% on the MRR metric on both datasets. This strongly indicates that the introduction of the three views is necessary. Intuitively, this is explained by the fact that each view focuses on a different level of granularity of information. Using the cross-view contrastive learning, we can then incorporate different levels of knowledge into entity and type embeddings. We can also observe that the performance loss caused by removing the cluster-type view is much lower than that caused by removing the entity-type and entity-cluster views. This is mainly because the cluster-type graph is smaller and denser, so the difference in the discriminative features of nodes is not significant.
Effect of Different LightGCN Layers. We observe that the number of LightGCN layers has a direct effect on performance on the FB15kET and YAGO43kET datasets, cf. Table 4. For FB15kET, the impact of the number of GCN layers on the performance is relatively small. However, for YAGO43kET, the performance sharply declines as the number of layers increases. The main reason for this phenomenon is that in the YAGO43kET dataset most entities have a relatively small number of types. As a consequence, the entity-type and entity-cluster views are sparse. So, when deeper graph convolution operations are applied, multi-hop information is integrated into the embeddings through sparse connections. As a consequence, noise is also introduced, which has a negative impact on the final results. We further constructed a dataset where each entity contains between 1 and 4 type neighbors and performed experiments with LightGCN with layer numbers ranging from 1 to 4. In this case, we can observe that as the number of GCN layers increases, there is a significant decline in performance on FB15kET as well. This is in line with the finding that the performance decreases on YAGO43kET as the number of layers increases.
Effect of Dropping Rates of Relational Neighbors. Relational neighbors of an entity provide supporting facts for its representation. To verify the robustness of MCLET in scenarios where relational neighbors are relatively sparse, we conduct an ablation experiment on FB15kET by randomly removing 25%, 50%, 75%, and 90% of the relational neighbors of entities, as proposed in (Hu et al., 2022a). We note that even after removing different proportions of relational neighbors, MCLET still achieves optimal performance. This can mainly be explained by two reasons. On the one hand, our views are based solely on the entity type neighbors, without involving the entity relational neighbors. Thus, changes in relational neighbors do not significantly affect the performance of MCLET. On the other hand, relational neighbors only serve as auxiliary knowledge for entity type inference, while the existing type neighbors of entities play a decisive role in predicting the missing types of entities.
Effect of Dropping Rates of Relation Types.
Compared with YAGO43kET, FB15kET has many more relations. To verify the robustness of MCLET when the number of relations is small, similarly to (Hu et al., 2022a), we randomly remove 25%, 50%, 75%, and 90% of the relation types in FB15kET. From Table 6, we can observe that even with a smaller number of relation types, MCLET still achieves the best performance. This demonstrates the robustness of our method in the presence of significant variations in the number of relational neighbors. This is mainly due to the introduction of cluster information, which establishes a coarse-grained bridge between entities and types; this information is not affected by drastic changes in the structure of the knowledge graph. Therefore, the incorporation of cluster information is necessary for entity type prediction tasks.

Conclusions
We propose MCLET, a multi-view contrastive learning (CL) framework for KGET. We design three views with different granularities, and use a CL strategy to achieve cooperative cross-view interaction. By introducing multi-head attention with a Mixture-of-Experts mechanism, we can combine the prediction scores of different neighbors.
For future work, we plan to investigate inductive scenarios, dealing with unseen entities and types.

Limitations
In this paper, we introduce coarse-grained cluster content for the knowledge graph entity typing task. Although we achieve good results, there are still limitations in the following aspects: 1) For the standard benchmark datasets, we use the readily available cluster-level annotation information. However, for datasets without cluster information, we would need to use clustering algorithms to construct implicit cluster semantic structures. 2) There is a related task named fine-grained entity typing (FET); the difference lies in predicting the types of entities mentioned in a given sentence, rather than of entities present in a knowledge graph. The corresponding benchmarks also have annotated coarse-grained cluster information. Therefore, it would be worthwhile to explore the transferability of MCLET to the FET task.

Figure 2: An overview of our MCLET model, containing three modules: Multi-view Generation and Encoder, Cross-view Contrastive Learning, and Entity Typing Prediction.

Figure 3: Ablation study results under different experimental conditions.

Table 1: Statistics of the datasets.

Table 2: Main evaluation results. ♢ results are from the original papers. ♦ results are from our implementation of the corresponding models. Best scores are highlighted in bold; second best scores are underlined.

Table 3: Evaluation of ablation experiments with different views on FB15kET and YAGO43kET. Best scores are highlighted in bold.

Table 4: Evaluation of ablation experiments with different numbers of LightGCN layers on FB15kET and YAGO43kET, where {all} indicates the complete dataset and {1~4} indicates that the entities in the dataset contain only 1 to 4 type neighbors. Best scores are highlighted in bold.

Table 6: Evaluation with different relation type dropping rates on FB15kET. H@N is an abbreviation for Hits@N, N ∈ {1, 3}. Best scores are highlighted in bold.