Knowing the No-match: Entity Alignment with Dangling Cases

This paper studies a new problem setting of entity alignment for knowledge graphs (KGs). Since KGs possess different sets of entities, there could be entities that cannot find alignment across them, leading to the problem of dangling entities. As the first attempt to this problem, we construct a new dataset and design a multi-task learning framework for both entity alignment and dangling entity detection. The framework can opt to abstain from predicting alignment for the detected dangling entities. We propose three techniques for dangling entity detection that are based on the distribution of nearest-neighbor distances, i.e., nearest neighbor classification, marginal ranking and background ranking. After detecting and removing dangling entities, an incorporated entity alignment model in our framework can provide more robust alignment for remaining entities. Comprehensive experiments and analyses demonstrate the effectiveness of our framework. We further discover that the dangling entity detection module can, in turn, improve alignment learning and the final performance. The contributed resource is publicly available to foster further research.


Introduction
Knowledge graphs (KGs) have evolved to be the building blocks of many intelligent systems (Ji et al., 2020). Despite their importance, KGs are usually costly to construct (Paulheim, 2018) and naturally suffer from incompleteness (Galárraga et al., 2017). Hence, merging multiple KGs through entity alignment can lead to mutual enrichment of their knowledge, and provide downstream applications with more comprehensive knowledge representations (Trivedi et al., 2018). Entity alignment seeks to discover identical entities in different KGs, such as the English entity Thailand and its French counterpart Thaïlande. To tackle this important problem, recent work has proposed embedding-based entity alignment methods (Chen et al., 2017; Wang et al., 2018; Fey et al., 2020; Wu et al., 2020a; Liu et al., 2020; Sun et al., 2020a). These methods jointly embed different KGs and put similar entities at close positions in a vector space, where nearest neighbor search can retrieve entity alignment. Due to its effectiveness, embedding-based entity alignment has drawn extensive attention in recent years (Sun et al., 2020c).
Nonetheless, to practically support the alignment of KGs as a real-world task, existing studies share one common problem: they cannot identify entities without alignment across KGs (called dangling entities). Specifically, current methods are all built upon the assumption that any source entity has a counterpart in the target KG (Sun et al., 2020c), and are accordingly developed with learning resources that enforce the same assumption. Hence, given every entity in a source KG, a model always tends to predict a counterpart via the nearest neighbor search in the embedding space. However, since each KG may be independently created based on separate corpora (Lehmann et al., 2015) or contributed by different crowds (Speer et al., 2017; Carlson et al., 2010), it is natural for KGs to possess different sets of entities (Collarana et al., 2017), as illustrated in Fig. 1. Essentially, this problem, overlooked in prior studies, causes existing methods to fall short of distinguishing between matchable and dangling entities, and hence prevents such methods from aligning KGs in a real-world scenario.
Towards more practical solutions of entity alignment for KGs, we provide a redefinition of the task with the incorporation of dangling cases (§2.1), as the first contribution of this work. Given a source entity, our setting does not assume that it must have a counterpart in the target KG, as previous studies do. Instead, conducting entity alignment also involves identifying whether the counterpart of an entity actually exists in another KG. Hence, a system to tackle this realistic problem setting of entity alignment is also challenged by the requirement for justifying the validity of its prediction.
To facilitate research on the new problem, the second contribution of this work is a new dataset, DBP2.0, for entity alignment with dangling cases (§2.2). As discussed, existing benchmarks for entity alignment, including DBP15K (Sun et al., 2017), WK3L (Chen et al., 2017) and the more recent OpenEA (Sun et al., 2020c), are set with the constraint that any entity to be aligned should have a valid counterpart. We use the full DBpedia (Lehmann et al., 2015) to build the new dataset, and the key challenge is to guarantee that the selected dangling entities actually do not have counterparts. We first extract two subgraphs with one-to-one entity alignment (i.e., all entities have counterparts). Then, we randomly remove some entities so that their remaining counterparts in the peer KG become dangling.
Although embedding-based entity alignment has been investigated for several years, handling dangling entities has not been studied yet. As the third contribution, we present a multi-task learning framework for the proposed task (§3). It consists of two jointly optimized modules for entity alignment and dangling entity detection, respectively. While the entity alignment module can in principle incorporate any existing technique from prior studies (Sun et al., 2020c), in this paper we experiment with two representative techniques, i.e., relational embedding based (Chen et al., 2017) and neighborhood aggregation based (Sun et al., 2020b) methods. For dangling entity detection, our framework incorporates an auxiliary learning objective, which seeks to learn a confidence metric for the inferred entity alignment. The principle behind such metric learning is that the embeddings of dangling entities should be isolated and distant from others. According to this principle, we exploit several techniques to distinguish between matchable and dangling entities based on the distribution of distances to their neighbors (§3), including nearest neighbor classification, marginal ranking and background ranking (Dhamija et al., 2018).
We conduct comprehensive experiments on the new DBP2.0 dataset, which demonstrate that the proposed techniques solve the dangling entity detection problem to different extents. Moreover, we observe that training the dangling detection model (marginal ranking) provides effective indirect supervision that improves the detection of alignment for matchable entities. We hope our task, dataset and framework can foster further investigation of entity alignment techniques in the suggested real scenario, leading to more effective and practical solutions to this challenging but important problem.

Task and Dataset
We hereby describe the problem setting of our task and introduce the new dataset.

Task Definition
A KG is a set of relational triples $T \subseteq E \times R \times E$, where $E$ and $R$ denote vocabularies of entities and relations, respectively. Without loss of generality, we consider entity alignment between two KGs, i.e., a source KG $K_1 = (T_1, E_1, R_1)$ and a target KG $K_2 = (T_2, E_2, R_2)$. Given a small set of seed entity alignment $A_{12} = \{(e_1, e_2) \in E_1 \times E_2 \mid e_1 \equiv e_2\}$ along with a small set of source entities $D \subset E_1$ known to have no counterparts as training data, the task seeks to find the remaining entity alignment. Different from the conventional entity alignment setting (Sun et al., 2017), a portion (with an anticipated quantity) of entities in $E_1$ and $E_2$ may have no counterparts. Our training and inference stages take such dangling entities into consideration.

Dataset Construction
As discussed, previous testbeds for entity alignment do not contain dangling entities (Sun et al., 2017; Chen et al., 2018; Sun et al., 2020c). Therefore, we first create a new dataset to support the study of the proposed problem setting. As with the widely used benchmark DBP15K (Sun et al., 2017), we choose DBpedia 2016-10 as the raw data source. Following DBP15K, we also use the English (EN), French (FR), Japanese (JA) and Chinese (ZH) versions of DBpedia to build three entity alignment settings: ZH-EN, JA-EN and FR-EN. For each monolingual KG, the triples are extracted from the Infobox Data of DBpedia, where relations are not mapped to a unified ontology. The reference entity alignment data is from the inter-language links (ILLs) of DBpedia across these three language pairs. Such reference data is later used as alignment labels for training and testing, and also serves as references to recognize dangling entities.

Construction. The key challenge of building our dataset is to ensure that the selected dangling entities are indeed without counterparts. Specifically, we cannot simply regard entities without ILLs as dangling ones, since the ILLs are also incomplete (Chen et al., 2017). Under this circumstance, we use a two-step dataset extraction process, which first samples two subgraphs whose entities all have counterparts based on ILLs, and then randomly removes a disjoint set of entities in the source and target graphs to make their counterparts dangling. For the first step, we iteratively delete unlinked entities and their triples from the source and target KGs until the two remaining subgraphs are one-to-one aligned. In the second step for entity removal, while the removed entities are disjoint in the two KGs, the proportion of the removed entities also complies with the proportion of unaligned entities in each KG.
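The two-step extraction can be illustrated with a minimal sketch. It is not the authors' released pipeline: the function names, the triple representation as `(head, relation, tail)` tuples, and the removal counts `n1` and `n2` are assumptions made for illustration, and the sketch ignores the corner case where an entity loses all of its triples after the second step.

```python
import random


def entities(triples):
    """Collect every entity that appears as a head or tail."""
    return {e for h, _, t in triples for e in (h, t)}


def one_to_one_subgraphs(triples1, triples2, links):
    """Step 1: iteratively drop unlinked entities and their triples."""
    links = dict(links)  # source entity -> target entity, assumed one-to-one
    while True:
        ents1, ents2 = entities(triples1), entities(triples2)
        links = {s: t for s, t in links.items() if s in ents1 and t in ents2}
        kept2 = set(links.values())
        new1 = [x for x in triples1 if x[0] in links and x[2] in links]
        new2 = [x for x in triples2 if x[0] in kept2 and x[2] in kept2]
        if len(new1) == len(triples1) and len(new2) == len(triples2):
            return new1, new2, links  # fixed point: every entity is linked
        triples1, triples2 = new1, new2


def make_dangling(triples1, triples2, links, n1, n2, seed=0):
    """Step 2: remove disjoint entity sets so their counterparts dangle."""
    pairs = sorted(links.items())
    random.Random(seed).shuffle(pairs)
    removed1 = {s for s, _ in pairs[:n1]}          # deleted from KG1
    removed2 = {t for _, t in pairs[n1:n1 + n2]}   # deleted from KG2 (other pairs)
    kg1 = [x for x in triples1 if x[0] not in removed1 and x[2] not in removed1]
    kg2 = [x for x in triples2 if x[0] not in removed2 and x[2] not in removed2]
    dangling1 = {s for s, t in pairs if t in removed2}  # sources without a target
    dangling2 = {t for s, t in pairs if s in removed1}  # targets without a source
    aligned = {s: t for s, t in pairs if s not in removed1 and t not in removed2}
    return kg1, kg2, aligned, dangling1, dangling2
```

Because the removed entities are drawn from different aligned pairs, every dangling entity is known to have had a counterpart that was deliberately deleted, which is what makes the dangling labels trustworthy.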
Statistics and evaluation. Tab. 1 lists the statistics of our dataset. The three entity alignment settings have different data scales and each is much larger than the same setting in DBP15K, and thus can better support scalability analysis of models. For dangling entity detection, we use 30% of dangling entities for training, 20% for validation and the rest for testing. The splits of reference alignment follow the same partition ratio, which is also consistent with that of DBP15K to simulate the weak alignment nature of KGs (Chen et al., 2017; Sun et al., 2017). We also compare the degree distribution of matchable and dangling entities in our dataset against DBP15K in Fig. 7 of Appx. §A. We find that the matchable and unlabeled entities in DBP15K have a biased degree distribution, which has an adverse effect on dangling entity detection and leads to unrealistic evaluation. By contrast, in DBP2.0, matchable and dangling entities have similar degree distributions.

Entity Alignment with Dangling Cases
We propose a multi-task learning framework for entity alignment with dangling cases, as illustrated in Fig. 2. It has two jointly optimized modules, i.e., entity alignment and dangling entity detection. The entity alignment module takes as input relational triples of two KGs (for KG embedding) and seed entity alignment (for alignment learning). As for the detection of dangling entities, the module uses a small number of labeled dangling entities to jump-start the learning of a confidence metric for distinguishing between matchable and dangling entities. In the inference stage for entity alignment, our framework is able to first identify and remove dangling entities, then predict alignment for those that are decided to be matchable.

Entity Alignment
Our framework can incorporate any entity alignment technique. For the sake of generality, we consider two representative techniques in our framework. One technique is based on MTransE (Chen et al., 2017), which is among the earliest studies for embedding-based entity alignment. It employs the translational model TransE (Bordes et al., 2013) to embed KGs in separate spaces, and meanwhile jointly learns a linear transformation between the embedding spaces to match entity counterparts. Specifically, given an entity pair $(x_1, x_2) \in A_{12}$, let $\mathbf{x}_1$ and $\mathbf{x}_2$ be their embeddings learned by the translational model. MTransE learns the linear transformation induced by a matrix $\mathbf{M}$ by minimizing $\|\mathbf{M}\mathbf{x}_1 - \mathbf{x}_2\|$, where $\|\cdot\|$ denotes the $L_1$ or $L_2$ norm. The other technique is from AliNet (Sun et al., 2020b), which is one of the SOTA methods based on graph neural networks. AliNet encodes entities by performing a multi-hop neighborhood aggregation, seeking to cope with heteromorphism of their neighborhood structures. For alignment learning, different from MTransE that only minimizes the transformed embedding distance, AliNet additionally optimizes a margin-based ranking loss for entity counterparts with negative samples. Specifically, let $x$ be a matchable source entity in the seed entity alignment, and let $x'$ be a randomly sampled entity in the target KG; AliNet attempts to ensure that the embedding distance $\|\mathbf{x} - \mathbf{x}'\|$ exceeds a positive distance margin.
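The two alignment objectives can be sketched in a few lines of plain Python. This is only an illustration: the hinge form of the negative-sample term is an assumption about how a margin-based ranking loss is usually written, and the helper names (`transform`, `transform_loss`, `negative_margin_loss`) are hypothetical rather than taken from MTransE or AliNet code.

```python
import math


def transform(M, x):
    """Apply the linear map M (list of rows) to the embedding x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def distance(u, v):
    """L2 distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def transform_loss(M, seed_pairs):
    """MTransE-style term: sum of ||M x1 - x2|| over seed alignment pairs."""
    return sum(distance(transform(M, x1), x2) for x1, x2 in seed_pairs)


def negative_margin_loss(x, negatives, margin):
    """Assumed hinge term: penalize negatives closer to x than the margin."""
    return sum(max(0.0, margin - distance(x, n)) for n in negatives)
```

The first term pulls aligned pairs together after transformation, while the second only activates for negative samples that fall inside the margin.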

Dangling Entity Detection
We propose three techniques to implement the dangling detection module based on the distribution of the nearest neighbor distance in embedding space.

NN Classification
This technique is to train a binary classifier to distinguish between dangling entities (labeled 1, i.e., $y = 1$) and matchable ones ($y = 0$). Specifically, we experiment with a feed-forward network (FFN) classifier. Given a source entity $x$, its input feature representation is the difference vector between its embedding $\mathbf{x}$, transformed by $\mathbf{M}$, and the embedding $\mathbf{x}_{nn}$ of its NN in the target KG embedding space. The confidence of $x$ being a dangling entity is given by $p(y = 1 \mid x) = \mathrm{sigmoid}(\mathrm{FFN}(\mathbf{M}\mathbf{x} - \mathbf{x}_{nn}))$. Let $D$ be the training set of dangling source entities and $A$ denote the set of matchable entities in the training alignment data. For every $x \in D \cup A$, we minimize the cross-entropy loss

$$\mathcal{L}_x = -\big(y_x \log p(y = 1 \mid x) + (1 - y_x) \log(1 - p(y = 1 \mid x))\big),$$

where $y_x$ denotes the truth label for entity $x$. In a real-world entity alignment scenario, the dangling entities and matchable ones usually differ greatly in quantity, leading to an unbalanced label distribution. In that case, we apply label weights (Huang et al., 2016) to balance between the losses for both labels.
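A minimal sketch of the classifier and its weighted loss follows. The ReLU hidden layer, the parameter layout `(W1, b1, w2, b2)`, and the way label weights multiply each term are assumptions for illustration; the text only specifies an FFN over the difference vector, a sigmoid output, and label weighting.

```python
import math


def ffn(features, W1, b1, w2, b2):
    """Two-layer feed-forward network with an assumed ReLU hidden layer."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def dangling_probability(Mx, x_nn, params):
    """p(y = 1 | x) = sigmoid(FFN(Mx - x_nn))."""
    diff = [a - b for a, b in zip(Mx, x_nn)]
    return 1.0 / (1.0 + math.exp(-ffn(diff, *params)))


def weighted_cross_entropy(p, y, w_dangling=1.0, w_matchable=1.0, eps=1e-12):
    """Binary cross-entropy with one weight per label (y = 1 means dangling)."""
    return -(w_dangling * y * math.log(p + eps)
             + w_matchable * (1 - y) * math.log(1 - p + eps))
```

Raising the weight of the rarer label makes its errors count more, which is one simple way to counter the label imbalance mentioned above.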

Marginal Ranking
Considering that dangling entities act as noise when finding entity alignment based on embedding distance, we are motivated to let dangling entities have solitary representations in the embedding space, i.e., they should keep a distance away from their surrounding embeddings. Hence, we seek to put a distance margin between dangling entities and their sampled NNs. For every input dangling entity $x \in D$, we minimize the following loss:

$$\mathcal{L}_x = \max(0, \lambda - \|\mathbf{M}\mathbf{x} - \mathbf{x}_{nn}\|),$$

where $\lambda$ is a distance margin. This loss and the entity alignment loss (e.g., that of MTransE) conduct joint learning-to-rank, i.e., the distance between unaligned entities should be larger than that of aligned entities, while dangling entities should have a lower ranking in the candidate list of any source entity.
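As a small illustration, the margin objective can be written as a hinge on the distance between a transformed dangling entity and its NN. The hinge form `max(0, margin - distance)` is an assumption consistent with the description above, and the function names are hypothetical.

```python
import math


def distance(u, v):
    """L2 distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def marginal_ranking_loss(Mx, x_nn, margin):
    """Zero once the dangling entity is at least `margin` away from its NN."""
    return max(0.0, margin - distance(Mx, x_nn))
```

With a margin of 0.9, for example, a dangling entity whose NN lies at distance 0.5 incurs a loss of about 0.4, while one at distance 5 incurs none.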

Background Ranking
In the two aforementioned techniques, searching for the NN of an entity is time-consuming. Furthermore, selecting an appropriate value for the distance margin of the second technique is not trivial. Based on empirical studies, we find that the margin has a significant influence on the final performance. Hence, we would like to find a more efficient and self-driven technique. Inspired by the open-set classification approach (Dhamija et al., 2018) that lets a classifier equally penalize the output logits for samples of classes that are unknown to training (i.e., background classes), we follow a similar principle and let the model equally enlarge the distance of a dangling entity from any sampled target-space entities. This method treats all dangling entities as the "background" of the embedding space, since they should be distant from matchable ones. We also decrease the scale of the dangling entity embeddings to further separate the embeddings of matchable and dangling entities. For a dangling entity $x \in D$, let $X_x^v$ be the set of randomly sampled target entities with size $v$. The loss is defined as

$$\mathcal{L}_x = \sum_{x' \in X_x^v} \big|\lambda_x - \|\mathbf{M}\mathbf{x} - \mathbf{x}'\|\big| + \alpha \|\mathbf{x}\|,$$

where $|\cdot|$ denotes the absolute value and $\alpha$ is a weight hyper-parameter for balance. $\lambda_x$ is the average distance, i.e.,

$$\lambda_x = \frac{1}{v} \sum_{x' \in X_x^v} \|\mathbf{M}\mathbf{x} - \mathbf{x}'\|.$$

This objective can push the relatively close entities away from the source entity without requiring a pre-defined distance margin.
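The background ranking objective can be sketched as follows. The exact form used here, a sum of absolute deviations from the average sampled distance plus a weighted norm of the source embedding, is an assumption pieced together from the description (absolute value, weight α, average distance λ_x); the function names are hypothetical.

```python
import math


def distance(u, v):
    """L2 distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def background_ranking_loss(Mx, x, sampled_targets, alpha):
    """Deviation of each sampled distance from their mean, plus a norm penalty."""
    dists = [distance(Mx, t) for t in sampled_targets]
    lam = sum(dists) / len(dists)             # induced margin: average distance
    spread = sum(abs(lam - d) for d in dists)
    scale = math.sqrt(sum(v * v for v in x))  # shrinks dangling embeddings
    return spread + alpha * scale
```

No NN search is needed: the targets are sampled at random, and the margin is induced from the sample itself rather than preset.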

Learning and Inference
The overall learning objective of the proposed framework is a combination of the entity alignment loss (e.g., MTransE's loss) and one of the dangling entity detection losses mentioned above. The two losses are optimized in alternate batches. More training details are presented in §4.1.
Like the training phase, the inference phase is also separated into dangling entity detection and entity alignment. The way of inference for dangling entities depends on the employed technique. NN classification uses the jointly trained FFN classifier to estimate whether the input entity is a dangling one. Marginal ranking takes the preset margin value in training as a confidence threshold, and decides whether an entity is a dangling one based on whether its transformed NN distance is higher than the threshold. The inference of background ranking is similar to that of marginal ranking; the only difference, by design, is that the confidence threshold is set to the average NN distance of entities in the target embedding space. After detecting dangling entities, the framework finds alignment for the remaining entities based on the transformed NN search among the matchable entities in the embedding space of the target KG.
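The threshold-based inference used by the two ranking techniques can be sketched as a detect-then-align loop. This is a simplified illustration: source embeddings are assumed to be already transformed into the target space, the search is brute force, and the names `nearest` and `align_with_abstention` are hypothetical.

```python
import math


def distance(u, v):
    """L2 distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(query, targets):
    """Return (target_id, distance) of the closest target embedding."""
    return min(((tid, distance(query, emb)) for tid, emb in targets.items()),
               key=lambda pair: pair[1])


def align_with_abstention(transformed_sources, targets, threshold):
    """Map each source to its NN, or to None when it is judged dangling."""
    result = {}
    for sid, emb in transformed_sources.items():
        tid, dist = nearest(emb, targets)
        result[sid] = None if dist > threshold else tid
    return result
```

For marginal ranking the `threshold` would be the training margin; for background ranking it would be an average NN distance computed from the embeddings.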
Accelerated NN search. The first and second techniques need to search NNs. We can use the efficient similarity search library Faiss (Johnson et al., 2017) for fast NN retrieval in a large embedding space. We also maintain a cache that stores the NNs of entities in the background and update it every ten training epochs.
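The caching idea can be sketched independently of the search backend. In this illustration a brute-force search stands in for Faiss, and the `NNCache` class and its refresh policy are assumptions about one reasonable way to refresh neighbors every ten epochs.

```python
import math


def brute_force_nn(sources, targets):
    """Stand-in for a Faiss index: id of the closest target per source."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return {sid: min(targets, key=lambda tid: dist(emb, targets[tid]))
            for sid, emb in sources.items()}


class NNCache:
    """Recompute nearest neighbors only every `refresh_every` epochs."""

    def __init__(self, search_fn, refresh_every=10):
        self.search_fn = search_fn
        self.refresh_every = refresh_every
        self.refreshed_at = None
        self.neighbors = None

    def get(self, epoch, sources, targets):
        stale = (self.refreshed_at is None
                 or epoch - self.refreshed_at >= self.refresh_every)
        if stale:
            self.neighbors = self.search_fn(sources, targets)
            self.refreshed_at = epoch
        return self.neighbors
```

Between refreshes the cached neighbors may be slightly out of date, which trades a little accuracy in the sampled NNs for far fewer searches.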

Experiments
In this section, we report our experimental results. We start by describing the experimental setups (§4.1). Next, we separately present the experimentation under two different evaluation settings (§4.2-§4.3), followed by an analysis of the similarity score distribution of the obtained representations for matchable and dangling entities (§4.4). To facilitate the use of the contributed dataset and software, we have incorporated these resources into the OpenEA benchmark (Sun et al., 2020c).

Experimental Settings
We consider two evaluation settings. One setting is for the proposed problem setting with dangling entities, which we refer to as the consolidated evaluation setting. We first detect and remove the dangling source entities and then search for alignment for the remaining entities. For this evaluation setting, we also separately assess the performance of the dangling detection module. The other, simplified setting follows that in previous studies (Sun et al., 2017, 2020c), where the source entities in the test set all have counterparts in the target KG, so no dangling source entities are considered. In this relaxed evaluation setting, we seek to evaluate the effect of dangling entity detection on entity alignment and make our results comparable to previous work.
Evaluation Protocol. For the relaxed evaluation setting, given each source entity, the candidate counterpart list is selected via NN search in the embedding space. The widely used metrics on the ranking lists are Hits@k (k = 1, 10, H@k for short) and mean reciprocal rank (MRR). Higher H@k and MRR indicate better performance. For the consolidated setting, we report precision, recall and F1 for dangling entity detection. As for assessing the eventual performance of realistic entity alignment, since the dangling entity detection may not be perfect, it is inevitable for some dangling entities to be incorrectly sent to the entity alignment module for aligning, while some matchable ones may be wrongly excluded. In this case, H@k and MRR are not applicable for the consolidated entity alignment evaluation. Following a relevant evaluation setting for entity resolution in databases (Mudgal et al., 2018; Ebraheem et al., 2018), we also use precision, recall and F1 as metrics. More specifically, if a source entity is dangling and is not identified by the detection module, the prediction is always regarded as incorrect. Similarly, if a matchable entity is falsely excluded by the dangling detection module, this test case is also regarded as incorrect since the alignment model has no chance to search for alignment. Otherwise, the alignment module searches for the NN of a source entity in the target embedding space and assesses if the predicted counterpart is correct.
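The consolidated scoring rules can be made concrete with a short sketch. The denominators are an assumption in the spirit of entity-resolution evaluation: precision is taken over source entities sent to the alignment module, and recall over source entities that truly have a counterpart. The function name and the use of `None` to mark dangling entities and abstentions are hypothetical conventions.

```python
def consolidated_prf(predicted, gold):
    """predicted/gold map source id -> target id, or None (abstain/dangling)."""
    attempted = [s for s in gold if predicted.get(s) is not None]
    matchable = [s for s in gold if gold[s] is not None]
    # A dangling source that was not filtered out can never be correct,
    # and an excluded matchable source contributes no correct prediction.
    correct = sum(1 for s in attempted
                  if gold[s] is not None and predicted[s] == gold[s])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(matchable) if matchable else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

In the usage below, one dangling entity slips through detection and one matchable entity is wrongly excluded, so both precision and recall drop to 0.5.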
Model Configuration. As described in §3.2, our dangling detection module has three variants, i.e., NN classification (NNC), marginal ranking (MR), and background ranking (BR). We report the implementation details of the entity alignment module (w/ MTransE or AliNet) in Appendices B and C. We initialize KG embeddings and model parameters using the Xavier initializer (Glorot and Bengio, 2010), and use Adam (Kingma and Ba, 2015) to optimize the learning objectives with a learning rate of 0.001 for MTransE and 0.0005 for AliNet. Note that we do not follow some methods in initializing with machine-translated entity name embeddings (Wu et al., 2020a). As pointed out by recent studies (Chen et al., 2021; Liu et al., 2021, 2020), this is necessary to prevent test data leakage. Entity similarity is measured by cross-domain similarity local scaling (Lample et al., 2018) for reduced hubness effects, consistent with recent studies (Sun et al., 2020b; Chen et al., 2021). We use a two-layer FFN in NNC. For MR, the margin is set as λ = 0.9 for MTransE and 0.2 for AliNet. BR randomly samples 20 target entities for each entity per epoch and sets α = 0.01. Training is terminated based on F1 results of entity alignment on validation data.
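For readers unfamiliar with cross-domain similarity local scaling (CSLS), a minimal sketch is given below. It follows the usual definition, 2·cos(x, y) minus the mean similarity of x to its k nearest targets and of y to its k nearest sources; the neighborhood size `k` is an assumed parameter, since the value used in the experiments is not stated here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def csls(sources, targets, k=10):
    """Return the CSLS score matrix between source and target embeddings."""
    sim = [[cosine(s, t) for t in targets] for s in sources]

    def mean_top_k(values):
        top = sorted(values, reverse=True)[:k]
        return sum(top) / len(top)

    r_src = [mean_top_k(row) for row in sim]        # source neighborhood density
    r_tgt = [mean_top_k(col) for col in zip(*sim)]  # target neighborhood density
    return [[2 * sim[i][j] - r_src[i] - r_tgt[j] for j in range(len(targets))]
            for i in range(len(sources))]
```

Subtracting the two density terms lowers the score of "hub" embeddings that are close to many others, which is the hubness reduction referred to above.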

Relaxed Evaluation
We first present the evaluation under the relaxed entity alignment setting based on Tab. 2. This setting only involves matchable source entities to test entity alignment, which is an ideal (but less realistic) scenario similar to prior studies (Sun et al., 2020c).
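The ranking metrics used in this setting can be computed from the rank of each gold counterpart in its candidate list. The sketch below assumes 1-indexed ranks and a hypothetical function name.

```python
def hits_and_mrr(ranks, ks=(1, 10)):
    """ranks: 1-indexed rank of the gold counterpart for each test entity."""
    n = len(ranks)
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    mrr = sum(1.0 / r for r in ranks) / n
    return hits, mrr
```

For instance, ranks of 1, 2, 20 and 1 give H@1 = 0.5, H@10 = 0.75 and an MRR of about 0.64.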
We also examine if jointly learning to detect dangling entities can indirectly improve alignment. As observed, MTransE, even without dangling detection, can achieve promising performance on DBP2.0. The results are even better than those on DBP15K as reported by Sun et al. (2017). We attribute this phenomenon to the robustness of this simple embedding method and our improved implementation (e.g., more effective negative sampling). By contrast, although we have tried our best in tuning, the latest GNN-based AliNet falls behind MTransE. Unlike MTransE that learns entity embeddings from a first-order perspective (i.e., based on triple plausibility scores), AliNet represents an entity from a high-order perspective by aggregating its neighbor embeddings, and entities with similar neighborhood structures would have similar representations. However, the dangling entities in DBP2.0 inevitably become noise spread across entity neighborhoods. To further probe into this issue, we count the average neighbor overlap ratio of aligned entities in DBP15K and our DBP2.0. Given an entity alignment pair $(x_1, x_2)$, let $\pi(x_1)$ and $\pi(x_2)$ be the sets of their neighboring entities respectively, where we also merge their aligned neighbors as one identity based on reference entity alignment. Then the neighbor overlap ratio of $x_1$ and $x_2$ is calculated as $|\pi(x_1) \cap \pi(x_2)| / |\pi(x_1) \cup \pi(x_2)|$. We average such a ratio for both DBP15K and DBP2.0 as given in Fig. 3. We can see that the three settings' overlap ratios in DBP2.0 are all much lower than those in DBP15K. Thus, DBP2.0 poses additional challenges, as compared to DBP15K, specifically for those methods relying on neighborhood aggregation. Based on results and analysis, we argue that methods performing well on the previous synthetic entity alignment dataset may not robustly generalize to the more realistic dataset with dangling cases. The performance of both MTransE and AliNet is relatively worse on FR-EN, which has more entities (i.e., larger candidate search space) and a low neighborhood overlap ratio (therefore, more difficult to match entities based on neighborhood similarity). Meanwhile, we find that the dangling detection module can affect the performance of entity alignment. In detail, MR consistently leads to improvement for both MTransE and AliNet. BR can also noticeably boost entity alignment on most settings. This shows that learning to isolate dangling entities from matchable ones naturally provides indirect help to discriminate the counterpart of a matchable entity from irrelevant ones. On the other hand, such indirect supervision signals may be consumed by the additional trainable parameters in NNC, causing its effect on entity alignment to be negligible. Overall, the observation here calls for more robust entity alignment methods and dangling detection techniques, and leads to further analysis (§4.3).
Tab. 4: Entity alignment results (precision, recall, F1) in the consolidated evaluation setting.
Method       ZH-EN             EN-ZH             JA-EN             EN-JA             FR-EN             EN-FR
             Prec. Rec.  F1    Prec. Rec.  F1    Prec. Rec.  F1    Prec. Rec.  F1    Prec. Rec.  F1    Prec. Rec.  F1
MTransE NNC  .164  .215  .186  .118  .207  .150  .180  .238  .205  .101  .167  .125  .185  .189  .187  .135  .140  .138
MTransE MR   .302  .349  .324  .231  .362  .282  .313  .367  .338  .227  .366  .280  .260  .220  .238  .213  .224  .218
MTransE BR   .312  .362  .335  .241  .376  .294  .314  .363  .336  .251  .358  .295  .265  .208  .233  .231  .213  .222
AliNet NNC   .121  .193  .149  .085  .138  .105  .113  .146  .127  .067  .208  .101  .126  .148  .136  .086  .161  .112
AliNet MR    .207  .299  .245  .159  .320  .213  .231  .321  .269  .178  .340  .234  .195  .190  .193  .160  .200  .178
AliNet BR    .203  .286  .238  .155  .308  .207  .223  .306  .258  .170  .321  .222  .183  .181  .182  .164  .200  .180
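The neighbor overlap ratio used in the analysis of this section is a Jaccard ratio computed after mapping aligned neighbors to a single identity. A minimal sketch, with a hypothetical function name and a plain dictionary standing in for the reference alignment:

```python
def neighbor_overlap_ratio(neighbors1, neighbors2, alignment):
    """Jaccard overlap of two neighbor sets after merging aligned neighbors.

    alignment maps a source entity to its target counterpart; unaligned
    source neighbors keep their own identity and cannot overlap.
    """
    merged1 = {alignment.get(e, e) for e in neighbors1}
    merged2 = set(neighbors2)
    union = merged1 | merged2
    return len(merged1 & merged2) / len(union) if union else 0.0
```

Dangling neighbors enlarge the union without ever entering the intersection, which is why their presence lowers the ratio.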

Consolidated Evaluation
We now report the experiment on the more realistic consolidated evaluation setting. Tab. 3 gives the precision, recall and F1 results of dangling entity detection, and the final entity alignment performance is presented in Tab. 4. In addition, Fig. 4 shows the accuracy of dangling entity detection. We analyze the results from the following aspects.
Dangling entity detection. Regardless of which alignment module is incorporated, NNC performs the worst (e.g., the low recall and accuracy around 0.5) among the dangling detection techniques, whereas BR generally performs the best. NNC determines whether an entity is dangling based on the difference vector of the entity embedding and its NN, instead of directly capturing the embedding distance which is observed to be more important based on the results by the other two techniques. By directly pushing dangling entities away from their NNs in the embedding space, both MR and BR offer much better performance. Besides, BR outperforms MR in most cases. By carefully checking their prediction results and the actual distance of NNs, we find that the induced distance margin in BR better discriminates dangling entities from matchable ones than the pre-defined margin.
Efficiency. We compare the average epoch time of training the three dangling detection modules for MTransE in Fig. 5. We conduct the experiment using a workstation with an Intel Xeon E5-1620 3.50GHz CPU and an NVIDIA GeForce RTX 2080 Ti GPU. Since NNC and MR need to search for NNs of source entities, both techniques spend much more training time, which BR saves by using random sampling. Overall, BR is an effective and efficient technique for dangling entity detection.
Entity alignment. Generally, for both MTransE and AliNet variants, MR and BR lead to better entity alignment results than NNC. MR and BR obtain higher precision and recall on detecting dangling entities as listed in Tab. 3, resulting in less noise that enters the entity alignment stage. By contrast, NNC has a low accuracy and thus introduces much noise. As BR outperforms MR in dangling detection, it also achieves higher entity alignment results than MR on most settings. We also notice that in a few settings, MR offers comparable or slightly better performance than BR. This is because MR can enhance the learning of alignment modules (see §4.2 for detailed analysis), thus delivering improvement to the final performance. MTransE variants generally outperform AliNet variants in both entity alignment (see Tab. 2) and dangling entity detection (see Tab. 3), similar to the observation in §4.2.
Figure 6: Kernel density estimate plot of the test matchable and dangling entities' similarity distribution with their nearest target neighbors in ZH-EN.
Alignment direction. We find that the alignment direction makes a difference in both dangling entity detection and entity alignment. Using the EN KG as the source is coupled with easier dangling detection than in other languages, as the most populated EN KG contributes more dangling entities and triples to training than other KGs. As for entity alignment, we find the observation to be quite the opposite, as using the EN KG as a source leads to noticeable drops in results. For example, the precision of MTransE-BR is 0.312 on ZH-EN, but only 0.241 on EN-ZH. This is because the EN KG has a larger portion of dangling entities. Although the dangling detection module performs better on the EN KG than on others, there are still many more dangling entities entering the alignment search stage, thus reducing the entity alignment precision. This observation suggests that choosing the alignment direction from a less populated KG to the more populated EN KG can be a more effective solution.

Similarity Score Distribution
To illustrate how well the BR technique distinguishes between matchable and dangling entities, we plot in Fig. 6 the distribution of similarity scores of each test entity and its NN. The plot illustrates that BR has the expected effect of isolating dangling entities from their NNs, whereas matchable entities are generally placed closer to their NNs. Yet, we can still see a modest overlap between the two NN similarity distributions of dangling and matchable entities, and a number of dangling entities still have a quite large NN similarity. This also reveals that the proposed problem setting of entity alignment with dangling cases has many remaining challenges that await further investigation.

Related Work
We discuss two topics of relevant work.

Entity Alignment
Embedding-based entity alignment is first attempted in MTransE (Chen et al., 2017), which jointly learns a translational embedding model and a transform-based alignment model for two KGs.
Later studies generally follow three lines of improvement. (i) The first line improves the embedding technique to better suit the alignment task, including contextual translation techniques, long-term dependency techniques and neighborhood aggregation (or GNN-based) ones (Wang et al., 2018; Sun et al., 2020b,a; Fey et al., 2020). (ii) The second line focuses on effective alignment learning with limited supervision. Some leverage semi-supervised learning techniques to resolve the training data insufficiency issue, including self-learning (Sun et al., 2018; Mao et al., 2020) and co-training (Chen et al., 2018). (iii) Another line of research seeks to retrieve auxiliary or indirect supervision signals from profile information or side features of entities, such as entity attributes (Sun et al., 2017; Trisedya et al., 2019; Pei et al., 2019), literals (Wu et al., 2019, 2020b; Liu et al., 2020), free text (Chen et al., 2021), pre-trained language models (Yang et al., 2019; Tang et al., 2020) or visual modalities (Liu et al., 2021). Due to the large body of recent advances, we refer readers to a more comprehensive summarization in the survey (Sun et al., 2020c).

Learning with Abstention
Learning with abstention is a fundamental machine learning problem, where the learner can opt to abstain from making a prediction if it lacks enough decisive confidence (Cortes et al., 2016, 2018). Related techniques include thresholding softmax (Stefano et al., 2000), selective classification (Geifman and El-Yaniv, 2017), open-set classification with background classes (Dhamija et al., 2018) and out-of-distribution detection (Vyas et al., 2018). The idea of learning with abstention also has applications in NLP, such as unanswerable QA, where correct answers of some questions are not stated in the given reference text (Rajpurkar et al., 2018; Zhu et al., 2019).
To the best of our knowledge, our task, dataset, and the proposed dangling detection techniques are the first contribution to support learning with abstention for entity alignment and structured representation learning.

Conclusion and Future Work
In this paper, we propose and study a new entity alignment task with dangling cases. We construct a dataset to support the study of the proposed problem setting, and design a multi-task learning framework for both entity alignment and dangling entity detection. Three types of dangling detection techniques are studied, which are based on nearest neighbor classification, marginal ranking, and background ranking. Comprehensive experiments demonstrate the effectiveness of the method, and provide insights to foster further investigation on this new problem. We further find that dangling entity detection can, in turn, effectively provide auxiliary supervision signals to improve the performance of entity alignment.
For future work, we plan to extend the benchmarking on DBP2.0 with results from more base models of entity alignment as well as more abstention inference techniques. Extending our framework to support more prediction tasks with abstention, such as entity type inference (Hao et al., 2019) and relation extraction (Alt et al., 2020), is another direction with potentially broad impact.

Appendices
A Degree Distribution
Fig. 7 shows the degree distribution of the matchable and dangling entities in our dataset against DBP15K. Although DBP15K contains some entities that are not labeled to have counterparts, by checking the ILLs in the recent update of DBpedia, we find many of these entities to have counterparts in the target KG. Hence, these entities in DBP15K cannot act as dangling entities that are key to the more realistic evaluation protocol being proposed in this work. From the comparison, we can see that these unlabeled entities in DBP15K have much fewer triples than matchable entities. This biased degree distribution will have an adverse effect on dangling entity detection and lead to unrealistic evaluation. By contrast, in our dataset, matchable and dangling entities have similar degree distributions.

B Configuration of MTransE and AliNet
For entity alignment, we experiment with MTransE (Chen et al., 2017) and the SOTA method AliNet (Sun et al., 2020b). The implementation of our framework is extended based on OpenEA (Sun et al., 2020c). We adopt the truncated negative sampling method by BootEA (Sun et al., 2018) to generate negative triples for MTransE and negative alignment links for AliNet, which leads to improved performance. The embedding size is

C Hyper-parameter Settings
We select each hyper-parameter setting within a wide range of values as follows:
Fig. 8 gives the recall@10 results of the MTransE variants with dangling entity detection in the consolidated evaluation setting. We can see that the recall@10 results on FR-EN are lower than those on ZH-EN and JA-EN, which is similar to the observation in entity alignment (§4.3). From the results, we think existing embedding-based entity alignment methods are still far from being usable in practice.