ActiveEA: Active Learning for Neural Entity Alignment

Entity Alignment (EA) aims to match equivalent entities across different Knowledge Graphs (KGs) and is an essential step of KG fusion. Current mainstream methods, neural EA models, rely on training with a seed alignment, i.e., a set of pre-aligned entity pairs that are very costly to annotate. In this paper, we devise a novel Active Learning (AL) framework for neural EA, aiming to create a highly informative seed alignment that yields more effective EA models at a lower annotation cost. Our framework tackles two main challenges encountered when applying AL to EA: (1) How to exploit dependencies between entities within the AL strategy. Most AL strategies assume that the data instances to sample are independent and identically distributed; however, entities in KGs are related. To address this challenge, we propose a structure-aware uncertainty sampling strategy that can measure the uncertainty of each entity as well as its impact on its neighbouring entities in the KG. (2) How to recognise entities that appear in one KG but not in the other (i.e., bachelors). Identifying bachelors is likely to save annotation budget. To address this challenge, we devise a bachelor recognizer that pays attention to alleviating the effect of sampling bias. Empirical results show that our proposed AL strategy can significantly improve sampling quality and has good generality across different datasets, EA models and bachelor proportions.


Introduction
Knowledge Graphs (KGs) store entities and their relationships in a graph structure and are used as knowledge drivers in many applications (Ji et al., 2020). Existing KGs are often incomplete but complementary to each other. A popular approach to tackle this problem is KG fusion, which attempts to combine several KGs into a single, comprehensive one. Entity Alignment (EA), which matches equivalent entities across KGs, is an essential step of KG fusion. Neural models (Chen et al., 2017, 2018; Wang et al., 2018; Cao et al., 2019) are the current state-of-the-art in EA and are capable of matching entities in an end-to-end manner. Typically, these neural EA models rely on a seed alignment as training data, which is very labour-intensive to annotate. However, previous EA research has assumed the availability of such a seed alignment and ignored the cost involved in its annotation. In this paper, we seek to reduce the cost of annotating seed alignment data by investigating methods capable of selecting the most informative entities for labelling, so as to obtain the best EA model with the least annotation cost: we do so using Active Learning. Active Learning (AL) (Aggarwal et al., 2014) is a Machine Learning (ML) paradigm where the annotation of data and the training of a model are performed iteratively, so that the sampled data is highly informative for training the model. Though many general AL strategies have been proposed (Settles, 2012; Ren et al., 2020), applying AL to EA poses some unique challenges.
The first challenge is how to exploit the dependencies between entities. In the EA task, neighbouring entities (context) in the KGs naturally affect each other. For example, in the two KGs of Fig. 1, we can infer that US corresponds to America if we already know that Donald Trump and D.J. Trump refer to the same person: this is because a single person can only be the president of one country. Therefore, when we estimate the value of annotating an entity, we should consider its impact on its context in the KG. Most AL strategies assume data instances are independent and identically distributed, and thus cannot capture dependencies between entities (Aggarwal et al., 2014). In addition, neural EA models exploit the structure of KGs in different and implicit ways (Sun et al., 2020b), so it is not easy to find a general way of measuring the effect of entities on one another.
The second challenge is how to recognize the entities in a KG that do not have a counterpart in the other KG (i.e., bachelors). In the first KG of Fig. 1, Donald Trump and US are matchable entities while New York City and Republican Party are bachelors. Selecting bachelors to annotate will not lead to any aligned entity pair. The impacts of recognizing bachelors are twofold: 1. From the perspective of data annotation, recognizing bachelors automatically saves annotation budget (because annotators will try to seek a corresponding entity for some time before giving up) and allows annotators to put their effort into labelling matchable entities. This is particularly important for existing neural EA models, which only consider matchable entities for training: selecting bachelors in these cases is a waste of annotation budget. 2. From the perspective of EA, bachelor recognition remedies the limitation of existing EA models that assume all entities to align are matchable, and would enable them to be better used in practice (i.e., on real-life KGs, where bachelors are common).
To address these challenges, we propose a novel AL framework for EA. Our framework follows the typical AL process: entities are sampled iteratively, and in each iteration a batch of entities with the highest acquisition scores is selected. Our novel acquisition function consists of two components: a structure-aware uncertainty measurement module and a bachelor recognizer. The structure-aware uncertainty reflects the uncertainty of a single entity as well as the influence of that entity in the context of the KG, i.e., how much uncertainty it can help its neighbours eliminate. In addition, we design a bachelor recognizer based on Graph Convolutional Networks (GCNs). Because the bachelor recognizer is trained on the sampled data and used to predict the remaining data, it may suffer from bias (w.r.t. the preferences of the sampling strategy) between these two groups of data. We apply model ensembling to alleviate this problem.
Our major contributions in this paper are: 1. A novel AL framework for neural EA, which can produce more informative data for training EA models while reducing the labour cost involved in annotation. To our knowledge, this is the first AL framework for neural EA. 2. A structure-aware uncertainty sampling strategy, which combines uncertainty sampling and the relations between entities in a single AL strategy. 3. An investigation of bachelor recognition, which can reduce the cost of data annotation and remedy a defect of existing EA models. 4. Extensive experimental results that show our proposed AL strategy can significantly improve the quality of data sampling and has good generality across different datasets, EA models, and bachelor quantities.

Entity Alignment
Entity alignment is typically performed between two KGs $G_1$ and $G_2$, whose entity sets are denoted as $E_1$ and $E_2$ respectively. The goal of EA is to find the set of equivalent entity pairs $A = \{(e^1, e^2) \in E_1 \times E_2 \mid e^1 \sim e^2\}$, where $\sim$ denotes an equivalence relationship and is usually assumed to be a one-to-one mapping. In supervised and semi-supervised models, a subset of the alignment $A^{seed} \subset A$, called the seed alignment, is annotated manually beforehand and used as training data. The remaining alignments form the test set $A^{test} = A \setminus A^{seed}$. The core of an EA model $F$ is a scoring function $F(e^1, e^2)$, which takes two entities as input and returns a score for how likely they are to match. The effectiveness of an EA model is essentially determined by $A^{seed}$, and we thus denote it as $m(A^{seed})$.

Active Learning
An AL framework consists of two components: (1) an oracle (annotation expert), which provides labels for the queries (data instances to label), and (2) a query system, which selects the most informative data instances as queries. In the pool-based scenario, there is a pool of unlabelled data $U$. Given a budget $B$, some instances $U_{\pi,B}$ are selected from the pool following a strategy $\pi$ and sent to the experts to annotate, producing a training set $L_{\pi,B}$. We train the model on $L_{\pi,B}$, and the effectiveness $m(L_{\pi,B})$ of the obtained model reflects how good the strategy $\pi$ is. The goal is to design an optimal strategy $\pi^*$ such that $\pi^* = \operatorname{argmax}_{\pi} m(L_{\pi,B})$.

Problem Definition
Given two KGs $G_1$, $G_2$ with entity sets $E_1$, $E_2$, an EA model $F$, and a budget $B$, the AL strategy $\pi$ is applied to select a set of entities $U_{\pi,B}$, whose counterparts are then labelled by the annotators to obtain the labelled data $L_{\pi,B}$. $L_{\pi,B}$ consists of annotations of matchable entities $L^{+}_{\pi,B}$, which form the seed alignment $A^{seed}_{\pi,B}$, and of bachelors $L^{-}_{\pi,B}$. We measure the effectiveness $m(A^{seed}_{\pi,B})$ of the AL strategy $\pi$ by training the EA model on $A^{seed}_{\pi,B}$ and then evaluating it on $A^{test}_{\pi,B} = A \setminus A^{seed}_{\pi,B}$. Our goal is to design an optimal entity sampling strategy $\pi^*$ such that $\pi^* = \operatorname{argmax}_{\pi} m(A^{seed}_{\pi,B})$. In our annotation setting, we select entities from one KG and then let the annotators identify their counterparts in the other KG. Under this setting, we assume the pool of unlabelled entities is initialized as $U = E_1$. The labelled data then take the form $L^{+}_{\pi,B} = \{(e^1 \in E_1, e^2 \in E_2)\}$ and $L^{-}_{\pi,B} = \{(e^1 \in E_1, \mathrm{null})\}$.

Framework Overview
The whole annotation process, as shown in Fig. 2, is carried out iteratively. In each iteration, the query system selects $N$ entities from $U$ and sends them to the annotators. The query system includes (1) a structure-aware uncertainty measurement module $f^{su}$, which combines uncertainty sampling with the structure information of the KGs, and (2) a bachelor recognizer $f^{b}$, which helps avoid selecting bachelor entities. The final acquisition function $f^{\pi}$ used to select which entities to annotate is obtained by combining the outputs of these two modules. After the annotators assign the ground-truth counterparts to the selected entities, the new annotations are added to the labelled data $L$. With the updated $L$, the query system updates the EA model and the bachelor recognizer. This process repeats until no budget remains. To simplify the presentation, we omit the sampling iteration index when explaining the details.
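The iterative process above can be sketched in a few lines of Python. Here `query`, `annotate` and `update_models` are hypothetical stand-ins for the query system, the human annotators and the model-update step; they are not part of the original implementation:

```python
# A minimal sketch of the iterative annotation loop. `query`, `annotate`
# and `update_models` are hypothetical stand-ins for the query system,
# the human annotators and the model-update step described above.
def active_ea_loop(pool, budget, batch_size, query, annotate, update_models):
    labelled = []                                 # accumulates L
    while budget > 0 and pool:
        n = min(batch_size, budget, len(pool))
        batch = query(pool, n)                    # top-n by acquisition score
        for e in batch:
            pool.remove(e)
        labelled.extend(annotate(batch))          # counterpart or null label
        update_models(labelled)                   # refresh EA model and f_b
        budget -= n
    return labelled
```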

Structure-aware Uncertainty Sampling
We define the influence of an entity on its context as the amount of uncertainty it can help its neighbours remove. As such, we formulate the structure-aware uncertainty $f^{su}$ as

$$f^{su}(e^1_i) = \alpha \sum_{e^1_j \in N^{out}_i} w_{ij} \, f^{su}(e^1_j) + (1 - \alpha) \, f^{u}(e^1_i), \qquad (1)$$

where $N^{out}_i$ is the set of outbound neighbours of entity $e^1_i$ (i.e., the entities referred to by $e^1_i$) and $w_{ij}$ measures the extent to which $e^1_i$ can help $e^1_j$ eliminate uncertainty. The parameter $\alpha$ controls the trade-off between the impact of entity $e^1_i$ on its context (the first term) and its normalized uncertainty (the second term). The function $f^{u}(e^1)$ is the margin-based uncertainty of an entity. For each entity $e^1$, the EA model returns matching scores $F(e^1, e^2)$ for all unaligned entities $e^2$ in $G_2$. Since the scores of existing models are not probabilities, we adopt a margin-based uncertainty measure:

$$f^{u}(e^1) = 1 - \big( F(e^1, e^{2*}) - F(e^1, e^{2**}) \big), \qquad (2)$$

where $F(e^1, e^{2*})$ and $F(e^1, e^{2**})$ are the highest and second-highest matching scores respectively. A large margin represents a small uncertainty.
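As an illustration, the margin-based uncertainty of Eq. 2 can be computed from a vector of matching scores. This is a minimal sketch; the exact scaling used in the actual implementation may differ:

```python
import numpy as np

def margin_uncertainty(scores: np.ndarray) -> float:
    """Margin-based uncertainty for one source entity.

    `scores` holds the EA model's matching scores F(e1, e2) against all
    candidate entities e2.  A large gap between the best and second-best
    score means the model is confident, so uncertainty is low.
    """
    top2 = np.sort(scores)[-2:]          # second-highest, highest
    margin = top2[1] - top2[0]
    return 1.0 - margin                  # small margin -> high uncertainty

# A clear winner yields low uncertainty ...
confident = margin_uncertainty(np.array([0.9, 0.1, 0.05]))
# ... while two near-tied candidates yield high uncertainty.
ambiguous = margin_uncertainty(np.array([0.51, 0.49, 0.05]))
```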
For each entity $e^1_j$, we assume its inbound neighbours can jointly help it clear all of its uncertainty, i.e., $\sum_{e^1_i \in N^{in}_j} w_{ij} = 1$. In this work, we assume all inbound neighbours have the same impact on $e^1_j$; in this case, $w_{ij} = 1/\mathrm{degree}(e^1_j)$, where $\mathrm{degree}(\cdot)$ returns the in-degree of an entity.
Using matrix notation, Eq. 1 can be rewritten as

$$f^{su} = \alpha W f^{su} + (1 - \alpha) f^{u},$$

where $f^{su}$ is the vector of structure-aware uncertainties, $f^{u}$ is the vector of uncertainties, and $W$ is a matrix encoding the influence between entities, i.e., $w_{ij} > 0$ if $e^1_i$ is linked to $e^1_j$, and $w_{ij} = 0$ otherwise. As $W$ is a stochastic matrix (Gagniuc, 2017), we solve Eq. 1 iteratively, which can be viewed as the power iteration method (Franceschet, 2011), similar to PageRank (Brin and Page, 1998). Specifically, we initialize the structure-aware uncertainty vector as $f^{su}_{0} = f^{u}$ and then update it iteratively:

$$f^{su}_{t} = \alpha W f^{su}_{t-1} + (1 - \alpha) f^{u}.$$

The computation ends when $|f^{su}_{t} - f^{su}_{t-1}| < \epsilon$.
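The power iteration can be sketched as follows. The toy matrix `W` and uncertainty vector `f_u` are illustrative only, with `W[i, j]` set to the reciprocal of $e^1_j$'s in-degree as described above:

```python
import numpy as np

def structure_aware_uncertainty(W, f_u, alpha=0.1, eps=1e-6):
    """Power iteration for f_su = alpha * W @ f_su + (1 - alpha) * f_u."""
    f_u = f_u / f_u.sum()        # normalised uncertainty vector
    f_su = f_u.copy()            # initialise f_su_0 = f_u
    while True:
        nxt = alpha * (W @ f_su) + (1 - alpha) * f_u
        if np.abs(nxt - f_su).sum() < eps:
            return nxt
        f_su = nxt

# Toy KG: entity 0 links to entities 1 and 2; both have in-degree 1,
# so w_01 = w_02 = 1.  Rows index the influencing entity e1_i.
W = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
f_u = np.array([0.2, 0.5, 0.3])  # plain margin-based uncertainties
f_su = structure_aware_uncertainty(W, f_u)
# Entity 0 gains score for the uncertainty it can help remove at 1 and 2.
```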

Bachelor Recognizer
The bachelor recognizer is formulated as a binary classifier, which is trained with the labelled data and used to predict the unlabelled data. One challenge faced here is the bias between the labelled data and the unlabelled data caused by the sampling strategy (since it is not random sampling). We alleviate this issue with a model ensemble.

Model Structure
We apply two GCNs (Kipf and Welling, 2017; Hamilton et al., 2017) as encoders to obtain the entity embeddings $H^1 = \mathrm{GCN}_1(G_1)$ and $H^2 = \mathrm{GCN}_2(G_2)$, where each row of $H^1$ or $H^2$ is the vector representation of a particular entity. The two GCN encoders share the same structure but have separate parameters. Within each GCN encoder, each entity $e_i$ is first assigned a vector representation $h^{(0)}_i$. Then contextual features of each entity are extracted layer by layer:

$$h^{(l+1)}_i = \sigma\Big( V^{(l)} \cdot \mathrm{norm}\Big( \sum_{e_j \in N_i} h^{(l)}_j \Big) + b^{(l)} \Big),$$

where $l$ is the layer index, $N_i$ is the set of neighbouring entities of entity $e_i$, $\sigma$ is the activation function, $\mathrm{norm}(\cdot)$ is a normalization function, and $V^{(l)}$, $b^{(l)}$ are the parameters of the $l$-th layer. The representations of each entity $e_i$ obtained in all GCN layers are concatenated into a single representation $h_i = [h^{(1)}_i \,\|\, \cdots \,\|\, h^{(L)}_i]$. After obtaining the entity representations, we compute the similarities of each entity in $E_1$ with all entities in $E_2$ ($S = H^1 \cdot {H^2}^{T}$) and take the corresponding maximum matching score $f^{s}(e^1_i) = \max(S_{i,:})$. An entity $e^1_i$ whose maximum matching score is greater than a threshold $\gamma$ is considered to be matchable: $f^{b}(e^1_i) = \mathbb{1}\big[ f^{s}(e^1_i) > \gamma \big]$.
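The scoring step after the GCN encoders reduces to a matrix product, a row-wise maximum and a threshold. A minimal sketch with hypothetical embeddings (in the real model, `H1` and `H2` come from the trained encoders):

```python
import numpy as np

# Hypothetical entity embeddings standing in for the trained GCN outputs
# H1 = GCN1(G1) and H2 = GCN2(G2); rows are entities.
H1 = np.array([[1.0, 0.0],     # e1_0: close to e2_0
               [0.0, 1.0],     # e1_1: close to e2_1
               [-1.0, -1.0]])  # e1_2: matches nothing, i.e. a bachelor
H2 = np.array([[0.9, 0.1],
               [0.1, 0.9]])

S = H1 @ H2.T                  # pairwise similarity scores
f_s = S.max(axis=1)            # maximum matching score per source entity
gamma = 0.5                    # threshold tuned on the validation split
is_matchable = f_s > gamma     # f_b: entities above gamma are matchable
```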

Learning
In each sampling iteration, we train the bachelor recognizer with the existing annotated data $L$, containing matchable entities $L^{+}$ and bachelors $L^{-}$. Furthermore, $L$ is divided into a training set $L^{t}$ and a validation set $L^{v}$.
We optimize the parameters, including $\{V^{(l)}, b^{(l)}\}_{1 \le l \le L}$ of each GCN encoder and the threshold $\gamma$, in two phases, sharing a similar idea with supervised contrastive learning (Khosla et al., 2020). In the first phase, we optimize the scoring function $f^{s}$ by minimizing the contrastive loss

$$\mathcal{L} = \sum_{(e^1, e^2) \in L^{t+}} \| h_{e^1} - h_{e^2} \| + \beta \sum_{(e^1, e^2) \in L^{t,neg}} \big[ \lambda - \| h_{e^1} - h_{e^2} \| \big]_{+}. \qquad (3)$$

Here, $\beta$ is a balance factor, $[\cdot]_{+}$ denotes $\max(0, \cdot)$, and $L^{t,neg}$ is the set of negative samples generated by negative sampling (Sun et al., 2018). For a given pre-aligned entity pair in $L^{+}$, each of its entities is substituted $N_{neg}$ times. The distance of negative samples is expected to be larger than the margin $\lambda$. In the second phase, we freeze the trained $f^{s}$ and optimize $\gamma$ for $f^{b}$. It is easy to optimize $\gamma$, e.g. by simple grid search, so that $f^{b}$ achieves the highest performance on $L^{v}$ (denoted as $q(f^{s}, \gamma, L^{v})$): $\gamma^{*} = \operatorname{argmax}_{\gamma} q(f^{s}, \gamma, L^{v})$.
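A simplified sketch of this loss, operating on precomputed pair distances rather than entity embeddings; the function and values are illustrative, not the original implementation:

```python
import numpy as np

def contrastive_loss(pos_dist, neg_dist, lam=1.5, beta=0.1):
    """Sketch of the two-term loss: pull matchable pairs together and
    push negative samples beyond the margin lam, weighted by beta."""
    pos_term = np.sum(pos_dist)                         # distances of positive pairs
    neg_term = np.sum(np.maximum(0.0, lam - neg_dist))  # hinge [lam - d]_+
    return pos_term + beta * neg_term

# Two positive pairs close together, two negatives at distances 2.0 and 1.0:
# only the negative inside the margin (1.0 < 1.5) is penalised.
loss = contrastive_loss(np.array([0.1, 0.2]), np.array([2.0, 1.0]))
```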

Model Ensemble for Sampling Bias
The sampled data may be biased, since it has been preferred by the sampling strategy rather than selected randomly. As a result, even if the bachelor recognizer is well trained on the sampled data, it may perform poorly on the data yet to be sampled. We apply a model ensemble to alleviate this problem. Specifically, we divide $L$ evenly into $K$ subsets. Then we apply $K$-fold cross-validation to train $K$ scoring functions $\{f^{s}_{1}, \ldots, f^{s}_{K}\}$, each time using $K-1$ subsets as the training set and the left-out subset as the validation set $L^{v}_{k}$. Afterwards, we search for an effective threshold $\gamma^{*} = \operatorname{argmax}_{\gamma} \frac{1}{K} \sum_{k=1}^{K} q(f^{s}_{k}, \gamma, L^{v}_{k})$. At inference time, we ensemble the $K$ scoring functions by averaging them into the final scoring function $f^{s} = \frac{1}{K} \sum_{k=1}^{K} f^{s}_{k}$ (Eq. 4), on which $f^{b}$ is based.
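The ensemble step can be illustrated numerically. Here the K per-fold scoring functions are simulated as a shared underlying score plus fold-specific noise, standing in for models trained on different (K-1)/K splits of the labelled data; all numbers are synthetic:

```python
import numpy as np

K = 5
rng = np.random.default_rng(0)
n_entities = 40

# Hypothetical per-fold scoring functions f_s_k: a shared underlying score
# perturbed by fold-specific noise, standing in for the K models trained
# on different (K-1)/K splits of the labelled data L.
true_score = rng.random(n_entities)
scorers = [true_score + rng.normal(0.0, 0.2, n_entities) for _ in range(K)]

# Ensemble (Eq. 4): average the K scoring functions into the final f_s.
f_s = np.mean(scorers, axis=0)

# Averaging damps the fold-specific noise, so the ensembled scores sit
# closer to the underlying score than a typical individual scorer does.
err_single = np.mean([np.abs(s - true_score).mean() for s in scorers])
err_ensemble = np.abs(f_s - true_score).mean()
```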

Final Acquisition Function
We combine our structure-aware uncertainty sampling with the bachelor recognizer to form the final acquisition function $f^{\pi}(e^1_i) = f^{b}(e^1_i) \cdot f^{su}(e^1_i)$, so that entities predicted to be bachelors receive an acquisition score of zero.

Experimental Setup

Sampling Strategies
We construct several baselines for comparison: rand, the random sampling used by existing EA works; degree, which selects entities with high degrees; pagerank (Brin and Page, 1998), which measures the centrality of entities by considering their degrees as well as the importance of their neighbours; betweenness (Freeman, 1977), which counts the number of shortest paths passing through an entity; and uncertainty sampling, which selects entities that the current EA model cannot predict with confidence. Note that in this work we measure uncertainty using Eq. 2 for a fair comparison. degree, pagerank and betweenness are purely topology-based and do not consider the current EA model. On the contrary, uncertainty is fully based on the current EA model and is unable to capture the structure information of the KG. We compare both our structure-aware uncertainty sampling (struct_uncert) and the full framework ActiveEA with the baselines listed above. We also examine the effect of the Bayesian Transformation, which aims to make deep neural models represent uncertainty more accurately (Gal et al., 2017).

EA Models
We apply our ActiveEA framework to three EA models, which are a representative spread of neural EA models and vary in KG encoding, the information considered, and the training method (Sun et al., 2018): BootEA (Sun et al., 2018) encodes the KGs with a translation model (Bordes et al., 2013), exploits the structure of KGs, and uses self-training. Alinet (Sun et al., 2020a) also exploits the structure of KGs, but with a GCN-based KG encoder, and is trained in a supervised manner. RDGCN (Wu et al., 2019) trains a GCN in a supervised manner, like Alinet, but can also incorporate entity attributes. Our implementations and parameter settings of the models rely on OpenEA (Sun et al., 2020b).
Existing work on EA assumes all entities in the KGs are matchable, and thus only samples entities with counterparts when producing the datasets. To investigate the influence of bachelors on AL strategies, we synthetically modify the datasets by excluding a portion of entities from the second KG.

Evaluation Metrics
We use Hit@1 as the primary evaluation measure for the EA models. To obtain an overall evaluation of an AL strategy across different budget sizes, we plot the curve of an EA model's effectiveness with respect to the proportion of annotated entities and calculate the Area Under the Curve (AUC).
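As an illustration of the metric, AUC@0.5 can be computed with the trapezoidal rule over an effectiveness curve; the curve values below are made up for illustration only:

```python
import numpy as np

# Hypothetical Hit@1 values of an EA model at increasing annotation
# proportions (illustrative numbers, not measured results).
proportions = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
hit_at_1 = np.array([0.05, 0.30, 0.45, 0.55, 0.60, 0.63])

# AUC@0.5: trapezoidal area under the effectiveness curve up to 50%
# annotation, normalised by the x-range so a constant Hit@1 of 1.0 scores 1.
segment_area = (hit_at_1[1:] + hit_at_1[:-1]) / 2 * np.diff(proportions)
auc_at_half = segment_area.sum() / (proportions[-1] - proportions[0])
```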

Parameter Settings
We set $\alpha = 0.1$ and $\epsilon = 10^{-6}$ for the structure-aware uncertainty. We use $L = 1$ GCN layer for our bachelor recognizer, with 500 input and 400 output dimensions. We set $K = 5$ for its model ensemble and $\lambda = 1.5$, $\beta = 0.1$, $N_{neg} = 10$ for its training. The sampling batch size is set to $N = 100$ for the 15K datasets and $N = 1000$ for the 100K dataset.

Reproducibility Details
Our experiments are run on a GPU cluster. We allocate 50GB of memory and one 32GB NVIDIA Tesla V100 GPU for each job on the 15K datasets, and 100GB of memory for each job on the 100K dataset. The training and evaluation of ActiveEA take approximately 3h with Alinet on 15K data, 10h with BootEA on 15K data, 10h with RDGCN on 15K data, and 48h with Alinet on 100K data. Most baseline strategies take less time than ActiveEA on the same dataset, except betweenness on 100K data, which takes more than 48h. We apply grid search for setting α and N (shown in Sec. 5.4). Hyper-parameters of the bachelor recognizer are chosen by referring to the settings of OpenEA and our manual trials. Code and datasets are available at https://github.com/UQ-Neusoft-Health-Data-Science/ActiveEA.

Comparison with Baselines
Fig. 3 presents the overall performance of each strategy with three EA models on two datasets, each of which we also synthetically modify to include 30% bachelors. We also report the AUC@0.5 values of these curves in Tab. 1. Note that ActiveEA degenerates into struct_uncert when there are no bachelors.
Random Sampling. Random sampling usually performs poorly when the annotation proportion is small, and becomes more competitive as the amount of annotations increases. But for most annotation proportions, random sampling exhibits a large gap in performance compared to the best method. This observation highlights the need to investigate data selection for EA.
Topology-based Strategies. The topology-based strategies are effective when few annotations are provided, e.g., < 20%. However, once annotations increase, the effectiveness of topology-based strategies is often worse than random sampling. This may be because these strategies suffer more from the bias between the training set and test set. Therefore, only considering the structural information of KGs has considerable drawbacks for EA.
Uncertainty Sampling. On the contrary, the un-  Table 1: Overall performance (AUC@0.5 (%)) for each sampling strategy. The highest performing strategy in each column is indicated in bold. We run each strategy 5 times; most results for ActiveEA show statistically significant differences over other methods (paired t-test with Bonferroni correction, p < 0.05), except the few cells indicated by n .
certainty sampling strategy performs poorly when the proportion of annotations is small but improves after several annotations have been accumulated.
One reason for this is that neural EA models cannot learn useful patterns with a small number of annotations. On datasets with bachelors, uncertainty sampling always performs worse than random sampling. Thus, it is clear that uncertainty sampling cannot be applied directly to EA. Structure-aware Uncertainty Sampling. Structure-aware uncertainty is effective across all annotation proportions. One reason for this is that it combines the advantages of both topology-based strategies and uncertainty sampling. This is essential for AL as it is impossible to predict the amount of annotations required for new datasets.
ActiveEA. ActiveEA, which enhances structure-aware sampling with a bachelor recognizer, greatly improves EA when the KGs contain bachelors.

Generality
The structure-aware uncertainty sampling mostly outperforms the baselines, while ActiveEA performs even better in almost all cases. ActiveEA also demonstrates generality across datasets, EA models, and bachelor proportions.
When the dataset has no bachelors, our structure-aware sampling is exceeded by uncertainty sampling in a few large-budget cases. However, real-world datasets always contain bachelors, and in this case our structure-aware uncertainty shows a more obvious advantage.
In addition, the strategies are less distinguishable when applied to RDGCN. The reason is that RDGCN exploits entity names for pre-alignment, and thus all strategies achieve good performance from the start. To assess generality across datasets of different sizes, we evaluate the sampling strategies with Alinet on ENFR (100K entities), which is larger than DW and ENDE (15K entities). We choose Alinet because it is more scalable than BootEA and RDGCN (Zhao et al., 2020). Fig. 4 presents results comparable to those on the 15K datasets.

Effect of Bachelors
To investigate the effect of bachelors, we randomly removed different amounts of entities (each larger sample containing the subset from earlier samples) from $G_2$, so that $G_1$ had different percentages of bachelors. Fig. 5 shows the results of applying all strategies to these datasets. We make the following four observations: 1. The performance of all strategies except ActiveEA decreases as the number of bachelors increases; how to avoid selecting bachelors is thus an important issue in designing AL strategies for EA. 2. Among all strategies, uncertainty sampling is affected the most, while topology-based methods are only marginally affected. 3. Our structure-aware uncertainty outperforms the baselines at all tested bachelor proportions. 4. ActiveEA increases in performance as the proportion of bachelors increases. The reason is that, if $G_1$ is fixed and the bachelors can be recognized successfully, a given budget leads to a larger ratio of annotated matchable entities in datasets with more bachelors than in those with fewer bachelors.

Effectiveness of Bachelor Recognizer
Fig. 6 shows the effectiveness of our bachelor recognizer during the sampling process and the effect of the model ensemble. The green curve shows the Micro-F1 score of our bachelor recognizer using the model ensemble. Our bachelor recognizer achieves high effectiveness from the start of sampling, where there are few annotations. Each red dot represents the performance of a bachelor recognizer trained on a single data partition without the model ensemble. This performance varies because of the bias problem. The model ensemble therefore allows the trained model to obtain high and stable performance.

Sensitivity of Parameters
To investigate the sensitivity of the parameters, we ran our strategy with AliNet and BootEA on two DW variants with bachelor proportions of 0% and 30%. The sensitivity w.r.t. α is shown in the top row of Fig. 7. We observe that our method is not sensitive to α: the effectiveness fluctuates when α < 0.5 and decreases when α > 0.5. This indicates that uncertainty is more informative than structural information. When α = 0, our struct_uncert degenerates to uncertainty sampling (Eq. 2). In the upper-left plot, we show the corresponding performance with dotted lines. Under most settings of α, struct_uncert is much better than uncertainty sampling, which means that introducing structure information is beneficial. The bottom row of Fig. 7 shows the effect of the sampling batch size N. The overall trend is that larger batch sizes decrease performance. This observation confirms the intuition that more frequent updates to the EA model lead to more precise uncertainty estimates. The choice of sampling batch size is therefore a trade-off between computation cost and sampling quality.

Examination of Bayesian Transformation
We enhanced uncertainty sampling and ActiveEA with the Bayesian Transformation, implemented with Monte Carlo (MC) dropout, and applied them to Alinet and RDGCN on DW and ENDE as in Sec. 5.1. Fig. 8 shows the improvements with different settings of the MC dropout rate. We find that (1) the variation of effects on uncertainty sampling is greater than that on ActiveEA, and (2) the Bayesian Transformation with a small dropout rate (e.g., 0.05) results in slight improvements to ActiveEA in most cases.

Related Works
Entity Alignment. Entity Alignment refers to the matching of entities across different KGs that refer to the same real-world object. Compared with Entity Resolution (Mudgal et al., 2018), which matches duplicate entities in relational data, EA deals with graph data and emphasizes exploiting the structure of KGs. Neural models (Chen et al., 2017, 2018; Wang et al., 2018; Cao et al., 2019) have replaced conventional approaches (Jiménez-Ruiz and Grau, 2011; Suchanek et al., 2011) as the core methods in recent years. Typically they rely on a seed alignment as training data, which is expensive to annotate. Iterative training (i.e., self-training) has been applied to improve EA models by generating more training data automatically (Sun et al., 2018; Mao et al., 2020). These works concern better training methods given annotated data; the problem of reducing the cost of annotation has been neglected. Berrendorf et al. (2021) were the first to explore AL strategies for the EA task. They compared several types of AL heuristics, including node centrality, uncertainty, graph coverage and unmatchable entities, and empirically showed the impact of sampling strategies on the creation of seed alignment. In our work, we highlight the limitations of single heuristics and propose an AL framework that considers structure information, uncertainty sampling and unmatchable entities at the same time. In addition, existing neural models assume all KG entities have counterparts, which is a very strong assumption in reality (Zhao et al., 2020). We provide a solution for recognizing bachelor entities, which is complementary to the existing models.

Active Learning.
Active Learning is a general framework for selecting the most informative data to annotate when training Machine Learning models (Aggarwal et al., 2014). The pool-based sampling scenario is a popular AL setting where a pool of unlabelled instances is available to query from (Settles, 2012; Aggarwal et al., 2014). Our proposed AL framework follows this scenario. Numerous AL strategies have been proposed in the general domain (Aggarwal et al., 2014). Uncertainty sampling is the most widely used because of its ease of implementation and its robust effectiveness (Lewis, 1995; Cohn et al., 1996). However, there are key challenges that general AL strategies cannot solve when applying AL to EA. Most AL strategies are designed under the assumption that the data is independent and identically distributed, whereas KG entities in the EA task are correlated, as in other graph-based tasks, e.g., node classification (Bilgic et al., 2010) and link prediction (Ostapuk et al., 2019). In addition, bachelor entities pose a special issue in EA: they may have low informativeness but high uncertainty. We design an AL strategy to address these challenges. A few existing works (Qian et al., 2017; Malmi et al., 2017) have applied AL to conventional EA but do not consider neural EA models, which are now in widespread use. Only Berrendorf et al. (2021) empirically explored general AL strategies for neural EA, but they did not solve the aforementioned challenges.

Conclusion
Entity Alignment is an essential step of KG fusion. Current mainstream methods for EA are neural models, which rely on seed alignment. The cost of labelling seed alignment is often high, but how to reduce this cost has been neglected. In this work, we proposed an Active Learning framework (named ActiveEA) that aims to produce the best EA model with the least annotation cost. Specifically, we tackled two key challenges affecting EA that general AL strategies cannot deal with. Firstly, we proposed structure-aware uncertainty sampling, which combines uncertainty sampling with the structure information of KGs. Secondly, we designed a bachelor recognizer, which reduces the annotation budget by avoiding the selection of bachelors and, in particular, is designed to tolerate sampling bias. Extensive experiments showed that ActiveEA is more effective than the considered baselines and generalises well across different datasets, EA models and bachelor percentages.
In future work, we plan to explore combining active learning and self-training, which we believe are complementary approaches: self-training can generate extra training data automatically but suffers from incorrectly labelled data, and this can be addressed by amending incorrectly labelled data using AL strategies.