OntoEA: Ontology-guided Entity Alignment via Joint Knowledge Graph Embedding

Semantic embedding has been widely investigated for aligning knowledge graph (KG) entities. Current methods have explored and utilized the graph structure, the entity names and attributes, but ignore the ontology (or ontological schema) which contains critical meta information such as classes and their membership relationships with entities. In this paper, we propose an ontology-guided entity alignment method named OntoEA, where both KGs and their ontologies are jointly embedded, and the class hierarchy and the class disjointness are utilized to avoid false mappings. Extensive experiments on seven public and industrial benchmarks have demonstrated the state-of-the-art performance of OntoEA and the effectiveness of the ontologies.


Introduction
Knowledge graphs (KGs) that are composed of entities and facts in the RDF form (i.e., RDF triples) are of vital importance in various applications, such as search engines and personal assistants (Hogan et al., 2020). Although there are several large-scale KGs, the content of any individual KG is often incomplete, especially for supporting domain-specific applications such as clinical AI assistants. As these KGs are developed separately, they are usually heterogeneous and complementary to each other. Thus it is urgently needed to align multiple KGs (i.e., to match their equivalent elements) so that their content can be fully exploited.
Recently, a few embedding-based methods have been proposed for entity alignment (EA) (Sun et al., 2020b; Zhang et al., 2020), in which entities are embedded into vectors and the equivalent entities are determined by calculating the similarity of their vectors.¹ They extend the embedding of one KG to the embedding of multiple KGs in one vector space via, for instance, a transformation matrix learnt from annotated mappings (a.k.a. seed mappings). They also utilize the entity names and attributes besides the graph structures for better performance. These methods, however, all ignore the ontology (or ontological schema), which many KGs such as DBpedia (Auer et al., 2008), NELL (Carlson et al., 2010) and Wikidata (Vrandečić and Krötzsch, 2014) provide as meta information for higher quality and usability. The ontology usually contains hierarchical classes and properties, and optionally defines some logical constraints such as the class disjointness and the property domain and range (Horrocks, 2008). Meanwhile, the KG usually declares the membership relationships between an entity and some classes. [* Equal contribution. † Corresponding author. ¹ The code and benchmarks have been submitted as the supplementary material and will be made publicly accessible.]
The entity mappings predicted by the above ontology-agnostic methods may induce class conflicts. Consider the example in Fig. 1: Victoria in KG 1 is often incorrectly aligned to VICTORIA in KG 2 by many embedding-based methods, although they belong to two potentially disjoint classes, namely Person and Organization. We find such class conflicted mappings are quite common among the wrongly predicted mappings. On the EN-FR-15K-V1 benchmark by Sun et al. (2020b), 42.2% and 55.7% of the wrongly predicted mappings are class conflicted when running BootEA and RSN4EA, respectively. These false positive mappings could be avoided if the ontologies were considered.
In this study, we propose an ontology-guided entity alignment method named OntoEA, which enriches the embedded semantics and avoids wrong mappings by exploring the class conflicts. OntoEA can work well in two contexts: (i) one ontology is shared across the to-be-aligned KGs; (ii) the ontologies of the to-be-aligned KGs are separated. The first context seems to follow a strict assumption but is actually quite common. For example, DBpedia has multilingual KGs (or versions) that share the same ontology, and in industrial scenarios, it is preferred to first create a common ontology and then extract different KGs from different sources such as text and tables. The second context can be transformed into the first context by pre-aligning the ontologies, using existing ontology alignment systems such as PARIS (Suchanek et al., 2011) and LogMap (Jiménez-Ruiz et al., 2012), and/or cost-sensitive human intervention.
There are two challenges in utilizing the ontology. First, it is difficult to embed two KGs together with the ontology, and as far as we know, there are currently no such solutions. Second, the class conflicts, indicated by the class disjointness in the ontology, are not all explicitly defined. Most of them have to be learned from the KGs, and they actually vary from KG to KG. For example, the classes Human and Animal are often disjoint in artwork KGs, but Human may be a subclass of Animal in biological KGs. To address these challenges, we develop a joint embedding method that includes five modules for embedding the KGs, the ontology, the class conflicts, the membership relationships and the seed mappings, respectively. Specifically, we develop a class conflict matrix (CCM) to represent different kinds of class conflicts, including those explicitly declared as disjoint, those indicated by the class hierarchy (e.g., sibling classes are likely to be disjoint) and those plausible according to the KGs.
To the best of our knowledge, OntoEA is among the first to utilize the ontology together with the embedding for KG alignment. Extensive experiments on seven diverse benchmarks have verified the effectiveness of the ontology guidance with either one shared ontology or two separate ontologies. OntoEA consistently outperforms the state-of-the-art baselines on all the benchmarks; for example, it on average achieves over 35% higher Hits@1, Hits@5 and MRR than the best baseline on MED-BBK-9K, a new and challenging industrial benchmark. Last but not least, we extend the existing benchmarks with ontologies and membership relationships.

Problem Statement
A KG is denoted as G = (E, R, T), where E, R, T are the sets of entities, relations and triples, respectively. Each triple (h, r, t) ∈ T includes a head entity h ∈ E, a relation r ∈ R, and a tail entity t ∈ E. Their embeddings are denoted as h, r, t, respectively. For two to-be-aligned KGs G_i = (E_i, R_i, T_i) and G_j = (E_j, R_j, T_j), an entity mapping (e_i, e_j) with e_i ∈ E_i and e_j ∈ E_j indicates that e_i and e_j refer to the same real-world object. The entity alignment (EA) task aims to find all the mappings M between E_i and E_j, where we assume a small set of known entity mappings (or seed mappings) M_s is given.
Each KG is assumed to be associated with an ontology, which contains hierarchical classes and optionally disjointness constraints between classes. For simplicity, we regard the classes as entities and the subsumption relationships between classes, often known as rdfs:subClassOf, as relations, so that the simplified ontology is itself a graph. For the KGs G_i and G_j, their associated ontologies are represented as O_i = (C_i, H_i) and O_j = (C_j, H_j), respectively. We simply denote both O_i and O_j as O = (C, H) if G_i and G_j share one ontology (i.e., O_i ≡ O_j); otherwise we denote the merged ontology after ontology alignment as O = (C, H). Note that C_i, C_j and C are class sets, while H_i, H_j and H are triple sets with only the subClassOf relation. Furthermore, the membership relationships, which link the entities to the corresponding classes, are denoted as B_i and B_j; e.g., B_i links G_i and C via b_i = (e_i, c), where e_i ∈ E_i and c ∈ C. The class and membership embeddings are denoted as c and b, respectively.

Framework
As shown in Fig. 2, OntoEA includes five modules: (i) entity embedding which embeds each KG into a separate embedding space; (ii) ontology embedding which embeds the class hierarchical structure with a non-linear transformation; (iii) confliction loss which incorporates all potential class conflicts in the embeddings with the CCM; (iv) membership loss which incorporates the membership relationships and enables the joint learning of the entity embedding and the ontology embedding; (v) alignment loss which bridges the embedding spaces of the to-be-aligned KGs via seed mappings. Overall, OntoEA jointly embeds two KGs and injects the class disjointness and membership mappings into the embeddings for mapping prediction.

Entity Embedding
To embed the KGs G_i and G_j, we adopt the translation-based method TransE (Bordes et al., 2013), which interprets a triple (h, r, t) as a translation by the relation r from h to t. The margin-based loss builds on the score f_e(h, t) = ||h + r − t||_2 of (h, r, t), where ||·||_2 denotes the L2 norm. Besides, we extend the loss with a limit-based scoring loss (Zhou et al., 2017) to ensure the discrimination between the positive and negative triples and also lower scores for positive triples:

L_E = Σ_{(h,r,t)∈T} [f_e(h, t) − γ¹_e]_+ + α_e Σ_{(h′,r′,t′)∈T′} [γ²_e − f_e(h′, t′)]_+,    (1)

where [·]_+ denotes the function f(x) = max(0, x), the hyperparameters γ¹_e and γ²_e control the margins, α_e balances the margin-based loss and the limit-based loss, and T′ is the set of negative triples, each sampled using the ε-truncated uniform negative sampling strategy to distinguish two similar triples.
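The combined margin-based and limit-based loss can be sketched in NumPy as follows. This is an illustrative reconstruction, not the authors' implementation; the default `gamma1`, `gamma2` and `alpha` values mirror the hyperparameters reported in the experimental setup.

```python
import numpy as np

def transe_score(h, r, t):
    # f_e(h, t) = ||h + r - t||_2 for a triple (h, r, t).
    return float(np.linalg.norm(h + r - t, ord=2))

def entity_loss(pos_scores, neg_scores, gamma1=0.01, gamma2=2.0, alpha=0.2):
    # Margin-based term pushes positive scores below gamma1;
    # limit-based term pushes negative scores above gamma2.
    pos_term = np.maximum(0.0, pos_scores - gamma1).sum()
    neg_term = np.maximum(0.0, gamma2 - neg_scores).sum()
    return float(pos_term + alpha * neg_term)
```

A positive triple whose translation already holds exactly (h + r = t) contributes zero to the loss, while a negative triple is only penalized when its score drops below γ²_e.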
It is worth noting that this study focuses on the EA task and the joint embedding challenge, and TransE is chosen for its simplicity and efficiency. Our OntoEA framework is open to other advanced KG embedding methods, such as DistMult (Yang et al., 2015), GCN (Wang et al., 2018) and RSN.

Ontology Embedding
The shared or merged ontology O = (C, H) is composed of triples with the subClassOf relation. For simplicity, a triple (c_h, r, c_t) with r := subClassOf is written as a class pair (c_h, c_t) in H. Inspired by Hao et al. (2019) on embedding hierarchical graph structures, we calculate the score of (c_h, c_t) with a non-linear transformation:

f_o(c_h, c_t) = || tanh(W_o · c_h + b_o) − c_t ||_2,

where W_o ∈ R^{d_o×d_o} and b_o ∈ R^{d_o} are the learnable parameters, and d_o denotes the ontology embedding dimension. This tends to encode each class as a sphere and each subclass as a vector in the same semantic space after the non-linear transformation, and the relative positions are employed to model the relation between a class and its subclass.
Similarly, we adopt the margin-based and limit-based loss for training:

L_O = Σ_{(c_h,c_t)∈H} [f_o(c_h, c_t) − γ¹_o]_+ + α_o Σ_{(c_h′,c_t′)∈H′} [γ²_o − f_o(c_h′, c_t′)]_+,    (2)

where H′ denotes the set of negative class pairs, each sampled by replacing c_h or c_t following the uniform negative sampling strategy (Bordes et al., 2013), while γ¹_o, γ²_o and α_o are hyperparameters similar to γ¹_e, γ²_e and α_e in Eq. 1. Note that we do not directly utilize TransE in the ontology embedding, because the subClassOf relation is transitive (e.g., we can infer (Royalty, subClassOf, Agent) from (Royalty, subClassOf, Person) and (Person, subClassOf, Agent) as shown in Fig. 1), which can lead to one-to-many and many-to-one mappings (or triples) in the ontology, and TransE cannot well handle relations with such a transitive property.
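A minimal sketch of the ontology score under these definitions; the exact form of f_o is reconstructed by analogy with the membership score f_m in Section 2.5, so `onto_score` is an assumption rather than the authors' verbatim implementation.

```python
import numpy as np

def onto_score(c_h, c_t, W_o, b_o):
    # f_o(c_h, c_t): the subclass embedding c_h is non-linearly projected
    # by tanh(W_o @ c_h + b_o) and compared with its superclass embedding
    # c_t; a low score means the hierarchy pair is well modeled.
    return float(np.linalg.norm(np.tanh(W_o @ c_h + b_o) - c_t, ord=2))
```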

Confliction Loss
We use a class conflict matrix (CCM) to represent the inter-class conflicts that are either explicitly defined by class disjointness or implicitly discovered from the entities. Within the CCM, the entry in the i-th row and j-th column, denoted as m_{i,j}, represents the conflict degree between class c_i and class c_j. For one ontology, the CCM is a square and symmetric matrix, and we only maintain its upper triangle for higher efficiency. Given two classes c_i and c_j, m_{i,j} ∈ [0, 1] is calculated as follows. First, we set m_{i,j} = 0 if c_i ≡ c_j, which ensures each class does not conflict with itself. Second, c_i and c_j are regarded as fully conflicted, i.e., m_{i,j} = 1, if they are declared as disjoint by the ontology.² Third, c_i and c_j are regarded as not conflicted, i.e., m_{i,j} = 0, if they have at least one common member (entity) or some of their members are given as a seed mapping (i.e., there exists (e_i, e_j) in M_s such that e_i belongs to c_i and e_j belongs to c_j). Note that the above three conditions are matched sequentially, and the computation of m_{i,j} finishes as soon as any condition is met. Finally, if none of the three conditions is met, we follow the principle that the farther two classes are separated in the tree-like class hierarchy, the lower semantic similarity and the higher conflict degree they have (Mumtaz and Giese, 2020). To measure the distance between c_i and c_j, we use the sets of classes passed when routing from c_i and from c_j to the root class, denoted as S(c_i) and S(c_j), respectively, and adopt the ratio of their intersection:

m_{i,j} = 1 − |S(c_i) ∩ S(c_j)| / |S(c_i) ∪ S(c_j)|.

For the implicitly discovered conflicts, we assume a small conflict degree between two classes whose embeddings are similar (i.e., of high cosine similarity). Thus we calculate another, cosine similarity-based class conflict degree:

m̂_{i,j} = (1 − cos(c_i, c_j)) / 2,

where c_i and c_j represent the embeddings of c_i and c_j.
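The sequential CCM conditions can be sketched as follows; the function name, the input structures (a child-to-parent map, a set of declared-disjoint pairs and a set of pairs with shared or seed-aligned members) and the final hierarchy-based ratio (ancestor-path intersection over union) are illustrative assumptions.

```python
def conflict_degree(ci, cj, parents, disjoint_pairs, non_conflicting):
    # Conditions are checked in order; the first match decides m_{i,j}.
    if ci == cj:                                   # a class never conflicts with itself
        return 0.0
    if frozenset((ci, cj)) in disjoint_pairs:      # declared disjoint in the ontology
        return 1.0
    if frozenset((ci, cj)) in non_conflicting:     # shared member or seed mapping
        return 0.0

    def ancestors(c):
        # S(c): classes passed when routing from c to the root.
        path = {c}
        while c in parents:
            c = parents[c]
            path.add(c)
        return path

    si, sj = ancestors(ci), ancestors(cj)
    return 1.0 - len(si & sj) / len(si | sj)
```

With the Fig. 1 hierarchy (Royalty under Person, Person and Organization under Agent), two classes that only meet at the root get a high conflict degree.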
We propose to minimize the following negative log-likelihood loss to incorporate the class conflicts represented by the CCM into the class embeddings:

L_C = − Σ_{c_i, c_j ∈ C} ( m_{i,j} log m̂_{i,j} + (1 − m_{i,j}) log(1 − m̂_{i,j}) ),    (3)

where m̂_{i,j} is the cosine similarity-based conflict degree computed from the class embeddings.
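Assuming the negative log-likelihood takes the form of a binary cross-entropy between the CCM entries and the embedding-based conflict degrees (a reconstruction from the surrounding description, not the authors' stated formula), the loss could be computed as:

```python
import numpy as np

def conflict_loss(M, M_hat, eps=1e-9):
    # Binary cross-entropy between CCM entries m_ij (targets) and the
    # embedding-based degrees (predictions); eps guards log(0).
    M_hat = np.clip(M_hat, eps, 1.0 - eps)
    return float(-(M * np.log(M_hat) + (1.0 - M) * np.log(1.0 - M_hat)).sum())
```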

Membership Loss
We develop the membership embedding module to utilize the membership relationships, B_i and B_j, to associate the KG embedding spaces with the ontology embedding space, which can be regarded as enhancing the KG embeddings with the ontology semantics. Given one membership relationship b = (e, c), we utilize a non-linear transformation to map the entity embedding into the ontology embedding space, and calculate the score as f_m(e, c) = || tanh(W_m e + b_m) − c ||_2, where W_m ∈ R^{d_o×d_e} and b_m ∈ R^{d_o} are learnable parameters, and d_e and d_o denote the dimensions of the KG embedding and the ontology embedding, respectively. Similarly, we minimize the following margin-based and limit-based loss to model all the membership relationships:

L_M = Σ_{(e,c)∈B_i∪B_j} [f_m(e, c) − γ¹_m]_+ + α_m Σ_{(e′,c′)∈B′} [γ²_m − f_m(e′, c′)]_+,    (4)

where B′ denotes the set of negative membership relationships, each created by replacing the class c following the uniform negative sampling strategy (Bordes et al., 2013), and γ¹_m, γ²_m and α_m are hyperparameters similar to γ¹_e, γ²_e and α_e in Eq. 1.
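A sketch of the membership score under the stated dimensions, where `W_m` has shape (d_o, d_e) so that an entity vector of dimension d_e is mapped into the d_o-dimensional ontology space before comparison with its class embedding:

```python
import numpy as np

def membership_score(e, c, W_m, b_m):
    # f_m(e, c) = || tanh(W_m @ e + b_m) - c ||_2
    # e: entity embedding (d_e,); c: class embedding (d_o,);
    # W_m: (d_o, d_e); b_m: (d_o,).
    return float(np.linalg.norm(np.tanh(W_m @ e + b_m) - c, ord=2))
```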

Alignment Loss
For the alignment embedding module, OntoEA utilizes the seed mappings M_s to bridge the embedding spaces of G_i and G_j such that the equivalence of two cross-KG entities can be measured via some distance metric, such as cosine similarity. Given a seed mapping m = (e_i, e_j), its score is calculated as f_a(e_i, e_j) = ||W_a e_i − e_j||_2, where W_a ∈ R^{d_e×d_e} is a learnable translation matrix, and the training loss is defined as

L_A = Σ_{(e_i, e_j) ∈ M_s} f_a(e_i, e_j).    (5)
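The alignment score admits a direct sketch; how negative samples enter the alignment loss is not spelled out here, so only the scoring function is shown.

```python
import numpy as np

def alignment_score(e_i, e_j, W_a):
    # f_a(e_i, e_j) = || W_a @ e_i - e_j ||_2 : the learnable matrix W_a
    # carries e_i from the embedding space of G_i into that of G_j.
    return float(np.linalg.norm(W_a @ e_i - e_j, ord=2))
```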

Iterative Co-Training and Prediction
To incorporate the aforementioned modules and obtain the embeddings of G_i, G_j and O, we could directly minimize the following loss:

L = L_E + L_O + λ₁ L_C + λ₂ L_M + λ₃ L_A,    (6)

where λ₁, λ₂ and λ₃ are hyperparameters that balance the losses of the confliction embedding, the membership embedding and the alignment embedding, respectively. Instead of directly optimizing L, OntoEA uses an iterative co-training strategy to reduce the model complexity and accelerate the convergence. At each iteration, OntoEA first optimizes L_E and L_O independently, then sequentially optimizes L_C and L_M, and finally optimizes L_A. The iteration stops once some stopping criterion on the validation set is met.
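The co-training schedule can be sketched as a plain loop; the `step_*` callables and the `converged` criterion are placeholders for the per-loss optimization passes and the validation-based stopping check.

```python
def co_train(step_E, step_O, step_C, step_M, step_A, converged):
    # One iteration = optimize L_E and L_O, then L_C and L_M
    # sequentially, then L_A; stop when the validation criterion fires.
    while not converged():
        step_E()   # entity embeddings of G_i and G_j (L_E)
        step_O()   # ontology embedding (L_O)
        step_C()   # confliction loss (L_C)
        step_M()   # membership loss (L_M)
        step_A()   # alignment loss over the seed mappings (L_A)
```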
With the embeddings of G_i, G_j and O, we calculate the entity mappings with the cosine similarity. Given two entities e_i ∈ G_i and e_j ∈ G_j, the weighted similarity score is calculated as

sim(e_i, e_j) = β cos(e_i, e_j) + (1 − β) cos(c_i, c_j),    (7)

where the hyperparameter β ∈ [0, 1] balances the similarities of the entity embeddings and the class embeddings. It is possible that one entity has multiple declared classes; for example, Victoria in Fig. 1 can be an entity of Royalty and of another class like Female Leader. In this case we calculate the average of the embeddings of all the declared classes as the class embedding (c_i or c_j).³ In the prediction, for each entity in G_i to be aligned, we rank all candidate entities in G_j by their weighted similarity scores.
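Eq. 7 reduces to a few lines; `cosine` and `weighted_sim` are illustrative helper names, with the default `beta` taken from the reported hyperparameter setting.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def weighted_sim(e_i, e_j, c_i, c_j, beta=0.5):
    # Eq. 7: blend entity-level and class-level cosine similarity; for an
    # entity with several declared classes, c_i / c_j would be the mean
    # of all its class embeddings.
    return beta * cosine(e_i, e_j) + (1.0 - beta) * cosine(c_i, c_j)
```

Two entities with identical embeddings but orthogonal (conflicting) class embeddings are thus scored down relative to a pair whose classes also agree.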
Benchmarks

The original benchmarks do not include ontologies, and we thus extract and append an ontology for each KG, including the class structure (i.e., rdfs:subClassOf relationships) and the membership relationships. Each benchmark therefore contains two KGs, their ontologies, and the associated membership relationships. The benchmark statistics are shown in Table 1, where "-15K" and "-9K" in the benchmark names are omitted. (In Table 1, "share-O" means the to-be-aligned KGs share one ontology, while "not-share-O" means they have different ontologies; "#Cls." denotes the number of classes and "#Trs." the number of rdfs:subClassOf relation triples; "#Roots" denotes the number of entities that have no rdf:type property and are thus linked to the root class; see Table 6 in the Appendix for more benchmark statistics.) For the KGs of DBpedia, we use the DBpedia ontology and obtain the membership relationships from the DBpedia SPARQL endpoint⁴ by querying the classes of each entity with rdf:type. Note that we also utilize the class disjointness constraints defined by owl:disjointWith for initializing the CCM. For the two Wikidata KGs in D-W-V1 and D-W-V2, we extract their ontologies and membership relationships from the Wikidata SPARQL endpoint⁵ using queries with rdfs:subClassOf and rdf:type. For the KGs of the industrial benchmark, we use domain knowledge (with the help of medical experts) to construct a shared, small-scale but high-quality ontology and the corresponding membership relationships.

Ontology Alignment
For the benchmarks whose KGs do not share one ontology, we first align the ontologies. We adopt two ontology alignment methods: manual annotation and an automatic alignment system. In the first method, we employ five annotators to annotate class mappings for each ontology pair, and the class mappings annotated by more than three annotators are adopted. It is worth noting that the manual annotation of class mappings is worthwhile and often adopted in KG construction and curation because of its high quality and relatively small scale in comparison with the entity mappings. In the second method, we apply a state-of-the-art ontology alignment system named PARIS⁶ together with some ad-hoc pre-processing and post-processing for automatic class mapping computation. Please see Appendix C for more details on ontology alignment and merging. For the D-W benchmarks, we considered both methods and compared their performance (see Table 4).

Experimental Setup
We compare OntoEA against state-of-the-art entity alignment models, including four translation-based models, two graph neural network based models, GCNAlign (Wang et al., 2018) and AliNet (Sun et al., 2020a), and one recurrent neural network based model, RSN4EA (Guo et al., 2018). We also compare OntoEA against some state-of-the-art models that additionally utilize entity surface information (SI) (i.e., entity names), including AttrE (Trisedya et al., 2019), MultiKE and RDGCN (Wu et al., 2019). Since using and not using SI are usually regarded as two different evaluation contexts in the literature, we evaluate OntoEA without (w/o) and with (w/) SI separately. Note that OntoEA with SI is implemented by a simple but effective strategy which initializes the embeddings of the translation-based modules in Section 2.3 and Section 2.4 (i.e., h, r, t and c) with the pre-trained word embeddings of the corresponding names. As in Sun et al. (2020b), we use the unified multilingual fastText word embeddings (Bojanowski et al., 2017) for the cross-lingual benchmarks.⁷ All the results of AliNet, and the results of MTransE, JAPE, SEA and GCNAlign on MED-BBK-9K, are reproduced locally, while the other baseline results are taken from Sun et al. (2020b) and Zhang et al. (2020). We follow the same train (20%), validation (10%) and test (70%) splits as Sun et al. (2020b) and Zhang et al. (2020).

We implement OntoEA upon the open source library OpenEA.⁸ We use the AdaGrad optimizer with a learning rate of 0.01. The batch sizes for entity embedding and ontology embedding are set to 4500 and 64, respectively, and their dimensions are both set to 300. The weight hyperparameters are set as λ₁ = 1, λ₂ = 1, λ₃ = 5 and β = 0.5, and the other hyperparameters are set as γ¹_x = 0.01, γ²_x = 2.0 and α_x = 0.2 for all the losses, where x ∈ {e, o, m}. [⁷ The word embeddings are publicly available at https://fasttext.cc/docs/en/crawl-vectors.html. ⁸ https://github.com/nju-websoft/OpenEA]
These hyperparameters are tuned w.r.t. the MRR on the validation set. Please see Appendix B for more implementation details.
With the embeddings, we use nearest neighbor search with cross-domain similarity local scaling (Lample et al., 2018) to compute the matching candidates of each to-be-aligned entity. In the evaluation, we rank the matching candidates of each to-be-aligned entity and report Hits@1 (H@1), Hits@5 (H@5) and mean reciprocal rank (MRR). In the following tables, the best (resp. second best) result is bolded (resp. underlined).

Overall Results

Table 2 reports the overall results of OntoEA and the baselines.⁹ Overall, in both contexts of using and not using SI, OntoEA performs the best across all the benchmarks on all the metrics, except for H@1 on EN-DE-15K-V2 and D-W-15K-V2.

Without SI. Regarding the benchmarks that share one ontology, OntoEA achieves at least 10% higher performance on all the metrics over the best baseline BootEA on EN-FR-15K-V1, which has sparse KG structures. On EN-FR-15K-V2 with dense KG structures, OntoEA achieves higher H@5 and MRR but competitive H@1 in comparison with the best baseline. For the D-W benchmarks, whose KGs have different ontologies, we have a similar finding: the outperformance of OntoEA is more significant on the benchmark with sparse KG structures (i.e., D-W-15K-V1) than on the benchmark with dense KG structures (i.e., D-W-15K-V2). On the one hand, dense KG structures mean that more information can be utilized for better performance, so the additional positive impact of the ontology becomes relatively limited. On the other hand, the ontologies of both EN-FR-15K-V2 and D-W-15K-V2 coincidentally have fewer classes (see Table 1), which would also lead to a lower impact of the ontology guidance. Regarding the industrial MED-BBK-9K, which is quite new and has proven more challenging w.r.t. these baselines (Zhang et al., 2020), OntoEA outperforms the best baseline by more than 10% on all the metrics. Although its ontology has a limited scale, it is of high quality as it was specifically created with domain knowledge.

With SI. OntoEA shows overall results as promising as OntoEA without SI. It outperforms all the baselines on all the benchmarks, with the improvements over the best baseline ranging from 2% to 68.9%. Regarding the industrial MED-BBK-9K, the performance improvement of OntoEA is especially significant, with more than 60% on all the metrics. As aforementioned, this benchmark is challenging for these baselines and it has high-quality ontologies. It is worth mentioning that on the D-W benchmarks, the involvement of SI deteriorates the performance of both OntoEA and the best baseline, while the positive impact of the ontologies (i.e., the performance gain of OntoEA over the best baseline) becomes more significant.

Ablation Studies
Model Component Analysis. To investigate the different components of OntoEA, we compare OntoEA variants without the CCM loss (w/o L_C), without the membership relationship loss (w/o L_M), and without the ontology (w/o Onto.). The results on EN-FR-15K-V1 and EN-FR-15K-V2 are reported in Table 3. They show that L_M contributes more to OntoEA than L_C, as removing it leads to a larger performance drop. As expected, removing the ontology deteriorates the performance the most. All these results verify the significant role of the ontologies and their associated class conflicts and membership relationships. Besides, the findings of the ablation studies for w/ SI are the same as for w/o SI (details are omitted due to space limitations).

Results on Different Test Entity Mappings. We split the test mappings according to the summed degree, i.e., deg(e_i, e_j) := deg(e_i) + deg(e_j), where (e_i, e_j) is an entity mapping. The results of OntoEA and some competitive baselines on different degree intervals (splits) are shown in Fig. 3. On EN-FR-15K-V1, OntoEA outperforms the baselines on all the intervals, and the performance gap reaches its highest on [0, 10). On EN-FR-15K-V2 with dense KG structures, OntoEA performs close to or slightly worse than BootEA on the intervals [10, 20) and [20, 30) but outperforms all the baselines on the other intervals. On MED-BBK-9K, OntoEA performs much better than the baselines on all the intervals. All these observations are consistent with our findings from the overall results in Table 2, and they also verify the effectiveness of OntoEA on different test entity mappings.

Analysis of Class Conflicts. This part analyses the predicted mappings that have class conflicts, with the results shown in Fig. 4. Note that the class conflict ratio of a method is the rate of its false positive mappings with class conflicts among all its false positive mappings.
On EN-FR-15K-V1 and EN-FR-15K-V2, OntoEA significantly decreases the class conflict ratio to 3% and 0.3%, respectively, much lower than those of the baselines. On MED-BBK-9K, OntoEA reduces the class conflict ratio to 34.0%, while MultiKE and RDGCN have 51.5% and 60.1%, respectively. These observations illustrate that OntoEA effectively implements our motivation of using the ontologies and class conflicts to avoid false positive mappings.

Analysis of Ontology Alignment Methods. As it is common that the to-be-aligned KGs have different ontologies, we analyse the impact of the different ontology alignment methods proposed in Sec. 3.2, with the results in Table 4. We find that OntoEA with manual annotation achieves better results than OntoEA with the PARIS-based alignment system; the latter, however, still achieves competitive or better results in comparison with the best baseline (i.e., BootEA or RDGCN). This also motivates future research and industrial deployment work on iterative ontology alignment that balances the annotation cost and the alignment quality.

Related Work
Embedding-based KG Alignment. These methods can be categorised into translation-based, GNN-based and RNN-based. The translation-based methods mainly rely upon translation-based KG embedding models. Some work has explored a semi-supervised learning setting; for instance, BootEA adopts a bootstrapping strategy to iteratively append new likely mappings during the learning process.
KG Embedding with Ontology. Only a few attempts have been made towards KG embedding with an ontology. The most relevant work is JOIE (Hao et al., 2019), which embeds the KG and the ontology in two embedding spaces and enables them to enhance each other via the cross-view links, i.e., the membership relationships. Some ontology embedding methods such as OWL2Vec* (Chen et al., 2020) could be extended to jointly embed a KG plus its ontology, since the KG can be regarded as the assertion part (ABox) of the ontology. However, as far as we know, there are currently no embedding methods that support two KGs along with their ontologies, let alone utilize the ontologies to augment embedding-based KG alignment.
Conclusion

This paper presented a novel method, OntoEA, to augment embedding-based entity alignment with the ontology. OntoEA enriches the semantics of the KG embeddings by jointly learning from the KGs, the ontology, the membership relationships and the seed mappings, and utilizes the potential class conflicts to avoid false mappings. The evaluation on multiple benchmarks has shown that OntoEA achieves state-of-the-art performance and reduces the false positive mappings with class conflicts. For future work, we plan to explore and utilize other ontology semantics besides the class hierarchy and the class disjointness constraints.

A Benchmark Statistics

Table 6 shows the full statistics of all the benchmarks used in the experiments, including four cross-lingual benchmarks and two cross-KG benchmarks from OpenEA (Sun et al., 2020b), and one recent industrial cross-KG benchmark from Zhang et al. (2020). We report the benchmark statistics from three perspectives: the ontologies, the KGs and the membership relationships. As in previous work, 20%, 10% and 70% of all the entity mappings are used as the training, validation and test sets, respectively (Sun et al., 2020b; Zhang et al., 2020).

C Ontology Alignment Methods
In this section, we present the details of the two ontology alignment methods: manual annotation and alignment system.

C.1 Manual Annotation
We resort to crowdsourcing for the manual annotation of the class mappings, following the steps below, presented with the example of the DBpedia ontology and the Wikidata ontology in the D-W benchmarks. First, we recruit five volunteers with at least undergraduate education as the annotators and provide them with the two to-be-aligned ontologies. The DBpedia ontology (O_D) contains a total of 755 classes, while the Wikidata ontology (O_W) contains 695 classes. Second, for each class in O_D, denoted as c_D, each annotator is required to consider as many classes in O_W as possible and find the equivalent class mapping (c_D, c_W), where c_W denotes a class in O_W. Note that the annotators are allowed to utilize their own knowledge and to accelerate the annotation by searching for the class c_W ∈ O_W using ontology management and access software such as Protégé. After all five annotators finish the work, only the class mappings that are labelled by more than three annotators are accepted as the final class mappings, denoted as M_C.
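The majority-vote acceptance rule can be sketched as follows; the mapping tuples in the usage test are hypothetical examples, and `threshold=3` encodes "more than three annotators" (with five annotators: at least four).

```python
from collections import Counter

def accept_mappings(annotations, threshold=3):
    # annotations: one set of candidate class mappings per annotator.
    # Keep a mapping only when strictly more than `threshold` annotators
    # proposed it.
    counts = Counter(m for per_annotator in annotations for m in per_annotator)
    return {m for m, n in counts.items() if n > threshold}
```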
With the class mappings, we further merge the two ontologies into one shared ontology that can then be processed by OntoEA. To this end, our solution is to build new membership relationships between the entities of the Wikidata KG and the classes of the DBpedia ontology. For each original membership relationship within the Wikidata KG and ontology, we replace its class c_W with c_D if (c_D, c_W) ∈ M_C, or with the root class owl:Thing otherwise.
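The class-replacement step for merging can be sketched as below; the function name and the class identifiers in the test are hypothetical.

```python
OWL_THING = "owl:Thing"

def merge_memberships(wd_memberships, class_mappings):
    # Rewrite each Wikidata membership (entity, c_W) onto the DBpedia
    # ontology: use the aligned DBpedia class when (c_D, c_W) is in the
    # annotated mappings M_C, otherwise fall back to the root owl:Thing.
    dbo_for = {c_w: c_d for (c_d, c_w) in class_mappings}
    return [(e, dbo_for.get(c_w, OWL_THING)) for (e, c_w) in wd_memberships]
```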

C.2 Alignment System
Algorithm 1 Alignment System Method