Cross-lingual Entity Alignment with Incidental Supervision

Much research effort has been devoted to multilingual knowledge graph (KG) embedding methods for the entity alignment task, which seeks to match entities in different language-specific KGs that refer to the same real-world object. Such methods are often hindered by the insufficiency of seed alignment provided between KGs. Therefore, we propose a new model, JEANS, which jointly represents multilingual KGs and text corpora in a shared embedding scheme, and seeks to improve entity alignment with incidental supervision signals from text. JEANS first deploys an entity grounding process to combine each KG with a monolingual text corpus. Then, two learning processes are conducted: (i) an embedding learning process to encode the KG and text of each language in one embedding space, and (ii) a self-learning-based alignment learning process to iteratively induce the correspondence of entities and of lexemes between embeddings. Experiments on benchmark datasets show that JEANS leads to promising improvement on entity alignment with incidental supervision, and significantly outperforms state-of-the-art methods that rely solely on the internal information of KGs.

Learning to align multilingual KGs is a nontrivial task, as KGs with distinct surface forms, heterogeneous schemata and inconsistent structures easily cause traditional symbolic methods to fall short (Suchanek et al., 2011; Wijaya et al., 2013; Jiménez-Ruiz et al., 2012). Recently, much attention has been paid to methods based on multilingual KG embeddings (Chen et al., 2017a,b, 2018; Sun et al., 2017, 2018, 2019b). These methods first separately encode the structure of each language-specific KG in an embedding space. Then, based on some seed entity alignment, entity counterparts in different KGs can be matched via distances or transformations of embedding vectors. The underlying principle is that entities with relevant neighborhood information can be characterized by similar embedding representations. Such representations are particularly tolerant to the aforementioned heterogeneity of surface forms and schemata in language-specific KGs (Chen et al., 2017a; Sun et al., 2018, 2020a). While multilingual KG embeddings provide a general and tractable way to align KGs, it remains challenging for such methods to precisely infer the cross-lingual correspondence of entities. The difficulty is that the seed entity alignment, which serves as the essential training data to learn the connection between language-specific KG embeddings, is often only sparsely provided in KBs (Chen et al., 2018; Sun et al., 2018). The lack of supervision thus often hinders the precision of inferred entity counterparts, and its effect becomes even more pronounced when KGs scale up and become inconsistent in contents and density (Pujara et al., 2017). Several methods gain auxiliary supervision from profile information of entities, including descriptions (Chen et al., 2018; Yang et al., 2019) and numerical attributes (Sun et al., 2017; Trsedya et al., 2019).
However, such profile information is unavailable in many KGs (Speer et al., 2017; Mitchell et al., 2018; Bond and Foster, 2013), making these methods not generally applicable.
Unlike existing models that rely on the internal information of KGs, we seek to create embeddings that incorporate both KGs and freely available text corpora, and exploit incidental supervision signals (Roth, 2017) from text corpora to enhance the alignment learning on KGs (Figure 1). In this paper, we propose a novel embedding model JEANS (Joint Embedding Based Entity Alignment with INcidental Supervision). JEANS first performs a grounding process (Gupta et al., 2017; Upadhyay et al., 2018) to link entity mentions in each monolingual text corpus to the KG of the same language. Based on the KGs and grounded text in a pair of languages, JEANS then conducts two learning processes, i.e. embedding learning and alignment learning. The embedding learning process distributes the entities, relations and lexemes of each language in its embedding space, in which a KG embedding model and a language model for that language are jointly trained. This process seeks to leverage text contexts to help capture the proximity of entities. On top of that, alignment learning captures cross-lingual correspondence for entities and lexemes in a self-learning manner (Artetxe et al., 2018). Starting from a small amount of seed entity alignment, this process iteratively induces a transformation between language-specific embedding spaces, and infers more alignment of entities and lexemes at each iteration to improve the learning at the next one. Moreover, we employ the closed-form Procrustes solution (Conneau et al., 2018) to strengthen alignment induction within each iteration. Experimental results on two benchmarks confirm the effectiveness of JEANS in leveraging incidental supervision, leading to significant improvement on entity alignment and drastically outperforming existing methods.

Related Work
We discuss relevant work on four topics, each of which has a large body of literature for which we can only provide a highly selective summary.
Entity alignment. Entity alignment in KBs has been a long-standing problem (Shvaiko and Euzenat, 2011). Aside from earlier approaches based on symbolic or schematic similarity of entities (Suchanek et al., 2011; Wijaya et al., 2013; Jiménez-Ruiz et al., 2012), more recent research addresses this task with multilingual KG embeddings. A representative such method is MTransE (Chen et al., 2017a), which jointly learns two model components: a translational embedding model (Bordes et al., 2013) that distributes the facts of each language-specific KG into a separate embedding space, and a transformation-based alignment model that maps between entity counterparts across embedding spaces.
Following the general principle of MTransE, later approaches have developed along three lines. The first incorporates various embedding learning techniques for KGs. Besides translational techniques, some models employ alternative relation modeling techniques to encode relational facts, such as circular correlation (Nickel et al., 2016), the Hadamard product (Hao et al., 2019) and recurrent skipping networks. Others encode entities with neighborhood aggregation techniques, including GCN (Yang et al., 2019; Cao et al., 2019; Wu et al., 2019b), RGCN (Wu et al., 2019a) and GAT (Zhu et al., 2019). Their main benefit is to produce entity representations capturing high-order proximity, so as to better suit the alignment task. A few works follow the second line, enhancing alignment learning with semi-supervised learning techniques. Representative ones include co-training (Chen et al., 2018), optimal transport (Pei et al., 2019b) and bootstrapping (Sun et al., 2018), which improve the precision of alignment captured with limited supervision. The third line of research seeks to obtain additional supervision from entity profiles, including descriptions (Chen et al., 2018; Yang et al., 2019), attributes (Sun et al., 2017; Trsedya et al., 2019; Pei et al., 2019a; Yang et al., 2020) and KG schemata. While those alternative views of entities can effectively bridge the embeddings, such methods are limited by the unavailability of those views in many KGs (Speer et al., 2017; Mitchell et al., 2018; Bond and Foster, 2013). A survey of the entity alignment problem by Sun et al. (2020b) provides a more comprehensive summary of recent advances along these lines.
Our method is mainly related to the third line of research. However, instead of leveraging specific intra-KB information, our method introduces supervision signals from text contexts that are freely accessible for almost any KB with the aid of grounding techniques. Meanwhile, our paper also follows the second line by improving alignment learning techniques, and couples the two mainstream techniques for embedding learning.
Joint embeddings of entities and text. Fewer efforts have been devoted to jointly characterizing entities and text as embeddings. Wang et al. (2014b) propose to connect a translational embedding of Freebase (Bollacker et al., 2008) to an English word embedding based on Wikipedia anchors, thereby providing a joint embedding that enhances link prediction in the KG. Zhong et al. (2015) generalize the approach of Wang et al. (2014b) with distant supervision based on entity descriptions and text corpora. Toutanova et al. (2015) extract dependency paths from sentences and jointly embed them with a KG using DistMult (Yang et al., 2015) to support the relation extraction task. Several other approaches focus on jointly embedding words, entities (Yamada et al., 2017; Newman-Griffis et al., 2018; Cao et al., 2017; Almasian et al., 2019) and entity types (Gupta et al., 2017) appearing in the same textual contexts, without considering the relational structure of a KG. These approaches are employed in monolingual NLP tasks including entity linking (Gupta et al., 2017; Cao et al., 2017), entity abstraction (Newman-Griffis et al., 2018) and factoid QA (Yamada et al., 2017). As they focus on a monolingual and supervised scenario, they are essentially different from our goal of supporting cross-lingual KG alignment with incidental supervision from non-parallel corpora.
Multilingual word embeddings. Our model component of alignment induction from text is closely connected to multilingual word embeddings. Earlier approaches in this line, whether supervised or weakly supervised, based on a seed lexicon (Zou et al., 2013) or parallel corpora (Gouws et al., 2015), are systematically summarized in a recent survey (2017). While a number of methods in this line could be employed in our model to gain additional supervision for entity alignment, we choose to combine the Procrustes solution (Conneau et al., 2018) with self-learning to offer precise inference of cross-lingual alignment based on limited seed alignment. Note that recent contextualized embeddings such as M-BERT (Pires et al., 2019) and XLM-R (Conneau et al., 2020) do not directly suit our problem setting, since contextualization could cause ambiguity in entity representations, thereby impairing the alignment of entities across embedding spaces.
Incidental supervision. Incidental supervision is a recently introduced learning strategy (Roth, 2017), which seeks to retrieve supervision signals from data that are not labeled for the target task. This strategy has been applied to tasks including SRL (He et al., 2020), controversy prediction (Rethmeier et al., 2018) and dataless classification (Song and Roth, 2015). To the best of our knowledge, the proposed method is the first of its kind to incorporate incidental supervision into embedding learning and alignment.

Method
We begin introducing our method by formalizing the learning resources.
In a KB, L denotes the set of languages, and L^2 the set of unordered language pairs. G_L is the language-specific KG of language L ∈ L. E_L and R_L respectively denote the corresponding vocabularies of entities and relations. T = (h, r, t) denotes a triple in G_L such that h, t ∈ E_L and r ∈ R_L. Boldfaced h, r, t represent the embedding vectors of head h, relation r, and tail t respectively. For a language pair (L_1, L_2) ∈ L^2, I_E(L_1, L_2) denotes a set of entity alignments between L_1 and L_2, such that e_1 ∈ E_{L_1} and e_2 ∈ E_{L_2} for each entity pair (e_1, e_2) ∈ I_E(L_1, L_2). Following the convention of previous work (Chen et al., 2018; Sun et al., 2018; Yang et al., 2019), we assume the entity pairs to form a 1-to-1 mapping specified in I_E(L_1, L_2). This assumption is congruent with the design of mainstream KBs (Lehmann et al., 2015; Mahdisoltani et al., 2015), where disambiguation of entities is granted. Besides the definition of multilingual KGs, we use D_L to denote the text corpus of language L. D_L is a set of documents, each of which is a sequence of tokens from the monolingual lexicon W_L. Each token w_i therein is originally a lexeme, but may also be an entity surface form after the grounding process; we also use boldfaced w_i to denote its vector. I_W(L_1, L_2) denotes the seed lexicon between (L_1, L_2), such that w_1 ∈ W_{L_1} and w_2 ∈ W_{L_2} for each lexeme pair (w_1, w_2) ∈ I_W(L_1, L_2). Note that I_W only includes alignment between lexemes, and may optionally serve as external supervision data. To be consistent with previous problem settings of entity alignment (Chen et al., 2017a; Sun et al., 2018; Yang et al., 2019), I_W is not necessarily provided for training, but is defined to be compatible with scenarios where it is available.
JEANS addresses entity alignment in three consecutive processes. (i) A grounding process first links entities of each KG to their possible mentions in the corresponding monolingual corpus, thereby connecting entities and text tokens of the same language into a shared vocabulary. (ii) An embedding learning process characterizes the KG and text of each language in a separate embedding space. In this process, we couple the translational technique (Bordes et al., 2013; Chen et al., 2017a, 2018) and the neighborhood aggregation technique (Yang et al., 2019), two representative techniques for characterizing a KG. Simultaneously, the monolingual text tokens are encoded with a skip-gram language model (Mikolov et al., 2013). (iii) On top of the embeddings, starting from a small amount of seed entity alignment and an optional seed lexicon, the alignment learning process iteratively infers more alignment both on KGs and text using self-learning and the Procrustes solution (Schönemann, 1966). The overall learning procedure of JEANS is illustrated in Figure 1. The rest of this section introduces the technical details of each process.

(Noisy) Entity Grounding
The goal of the grounding process is to combine the vocabularies of the KG and the text corpus in each language. This serves as the premise for the embedding learning process to produce a shared representation scheme for entities, relations and lexemes, thereby allowing the alignment learning process to leverage supervision signals for both entities and lexemes. It is noteworthy that the purpose of entity grounding here is to combine the two data modalities. Hence, we only expect this process to discover enough entity contexts and offer higher coverage of the entity vocabularies, while being tolerant to possible noise in entity recognition and linking. In particular, we consider two grounding techniques: one using a pre-trained entity discovery and linking (EDL) model, the other based on simple surface form matching (SFM).
Pre-trained EDL model. One technique is to use off-the-shelf EDL models (Khashabi et al., 2018; Manning et al., 2014). A typical such model sequentially performs NER to detect entity mentions, then links each mention to candidate entities from the KG based on symbolic and contextual similarity. Many EDL models are easily trainable on large text corpora with anchors, and offer promising grounding and disambiguation performance in multiple languages (Sil et al., 2018). In this paper, we do not go into the details of EDL model design; interested readers are referred to the aforementioned literature.
Surface form matching. When a pre-trained EDL model is not available, a simpler way of combining the data is to match KG surface forms against the text. This can be done efficiently by building a Completion Trie (Hsu and Ottaviano, 2013) over multi-token surface forms, and conducting longest prefix matching (Dharmapurikar et al., 2006) between surface forms and sub-sequences of text tokens. While this simple technique does not necessarily disambiguate entity mentions, our experiments find it sufficient to combine the two modalities and allow supervision signals from induced lexical alignment to propagate to entities.
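To illustrate the idea (an illustrative simplification, not our implementation): a plain dictionary of multi-token surface forms can drive the same greedy longest-prefix matching that a Completion Trie accelerates, with the hypothetical `ground_corpus` helper replacing each matched token span by a single grounded entity token.

```python
def ground_corpus(tokens, surface_forms, max_len=5):
    """Greedy longest-prefix surface form matching (illustrative sketch).

    surface_forms: dict mapping a tuple of tokens -> entity identifier.
    """
    grounded, i = [], 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first (longest prefix matching).
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in surface_forms:
                match = (surface_forms[span], n)
                break
        if match:
            grounded.append(match[0])   # emit a single grounded entity token
            i += match[1]
        else:
            grounded.append(tokens[i])  # keep the original lexeme
            i += 1
    return grounded

forms = {("new", "york", "city"): "ENT:New_York_City",
         ("new", "york"): "ENT:New_York"}
print(ground_corpus(["i", "love", "new", "york", "city"], forms))
# → ['i', 'love', 'ENT:New_York_City']
```

Note how the longer surface form wins over its prefix, which is exactly the behavior the longest-prefix matching step guarantees; a trie makes this lookup sublinear in the vocabulary size but does not change the output.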
Once the entity vocabulary E_L and the lexicon W_L of a language are combined, we assume that entity mentions in D_L are properly tokenized as grounded surface forms in E_L ∩ W_L. Specifically, we now use x to denote a token in the grounded corpus D_L, which can be either an entity e or a lexeme w. Given the combined learning resources for each language, we next describe the embedding learning and alignment learning processes.

Embedding Learning
The embedding learning process is responsible for capturing the combined KG and text corpus of each language in a shared embedding space R^k. In this process, JEANS jointly learns two model components to respectively encode units of the KG and the text, in which the overlapping vocabulary E_L ∩ W_L uses shared representations. We describe these two model components in detail below.

KG Embedding
As discussed in §2, previous approaches leverage two forms of embedding learning techniques: (i) relation modeling (Chen et al., 2017a; Sun et al., 2018), such as vector translations, circular correlation and the Hadamard product, seeks to capture relations as arithmetic operations in the vector space; (ii) neighborhood aggregation (Yang et al., 2019; Cao et al., 2019) employs graph neural networks (GNNs) to encode neighborhood contexts, so as to better capture the proximity of entities.
The KG embedding model proposed in this work couples both forms of techniques, aiming to capture both relations and entity proximity, two factors that are both beneficial for producing transferable entity embeddings. To achieve this goal, the encoder first stacks n layers of GCN (Kipf and Welling, 2016) on the KG. Formally, the l-th layer representation E^(l) is computed as

E^(l) = σ(D̃^(-1/2) Ã D̃^(-1/2) E^(l-1) M^(l-1)),

where Ã = A + I is the sum of the adjacency matrix A of the KG and an identity matrix I, D̃ is the diagonal degree matrix of Ã, M^(l-1) is a trainable weight matrix, and σ is a non-linear activation. The raw features of entities E^(0) can be either entity attributes or randomly initialized. The outputs of the last layer are regarded as the entity embedding representations, i.e. E = E^(n).
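Under the standard GCN propagation rule assumed here, one layer can be sketched in NumPy as follows (the `gcn_layer` helper, the tanh activation and the tiny graph are illustrative choices, not our exact implementation):

```python
import numpy as np

def gcn_layer(A, E_prev, M, activation=np.tanh):
    """One GCN layer: act(D̃^{-1/2} (A+I) D̃^{-1/2} E^{(l-1)} M^{(l-1)})."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d = A_tilde.sum(axis=1)                     # degrees of the augmented graph
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return activation(A_hat @ E_prev @ M)

# Tiny 3-node path graph with randomly initialized features E^(0),
# as allowed by the paper when entity attributes are unavailable.
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
E0 = rng.standard_normal((3, 4))
M0 = rng.standard_normal((4, 4))
E1 = gcn_layer(A, E0, M0)
print(E1.shape)  # → (3, 4)
```

Stacking n such calls (with a fresh weight matrix per layer) yields the entity representations E = E^(n) used by the relational loss below.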
We use E_L to denote the entity representations of language L. The following log-softmax loss, approximated with negative sampling, is then optimized to perform relational modeling with translation vectors in the embedding space of L:

S^K_L = Σ_{T=(h,r,t)∈G_L} [ −log σ(b − f_r(h, t)) − (1/|N(T)|) Σ_{(ĥ,r,t̂)∈N(T)} log σ(f_r(ĥ, t̂) − b) ],

where f_r(h, t) = ||h + r − t|| is the plausibility measure of a triple (Bordes et al., 2013), each T̂ = (ĥ, r, t̂) ∈ N(T) is a Bernoulli negative-sampled triple (Wang et al., 2014a) created by substituting either the head or the tail entity of T = (h, r, t), σ is the sigmoid function, and b is a positive bias to adjust the scale of the plausibility measure. All the entity representations optimized in S^K_L come from E_L. We choose the translational technique over other relation modeling techniques because it is more robust in cases where KG structures are sparse (Pujara et al., 2017).
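A minimal sketch of one plausible form of this objective for a single triple (the `triple_loss` helper and the bias value are illustrative; the exact loss shape used in training may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f_r(h, r, t):
    """Translational plausibility f_r(h, t) = ||h + r - t|| (lower = more plausible)."""
    return np.linalg.norm(h + r - t)

def triple_loss(h, r, t, negatives, b=2.0):
    """Push f_r below the bias b for the true triple and above b for
    negative-sampled corruptions (illustrative negative-sampling loss)."""
    pos = -np.log(sigmoid(b - f_r(h, r, t)))
    neg = -np.mean([np.log(sigmoid(f_r(hn, r, tn) - b)) for hn, tn in negatives])
    return pos + neg

h, r = np.zeros(3), np.array([1.0, 0.0, 0.0])
t_true = np.array([1.0, 0.0, 0.0])   # h + r == t_true, so f_r = 0
t_far = np.array([5.0, 5.0, 5.0])
print(triple_loss(h, r, t_true, [(h, t_far)])
      < triple_loss(h, r, t_far, [(h, t_true)]))  # → True
```

As expected, a well-placed triple incurs a much smaller loss than one whose tail embedding violates the translation.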

Text Embedding
In addition to the KG embedding, the text embedding seeks to leverage the contextual information of free text to help the embedding better capture the proximity of entities. This model employs the continuous skip-gram language model, in line with a number of word embedding methods (Mikolov et al., 2013; Bojanowski et al., 2017; Conneau et al., 2018), and is realized by optimizing the following log-softmax loss, likewise approximated with negative sampling:

S^T_L = Σ_{x∈D_L} Σ_{c∈C_{x,D_L}} [ −log σ(−d(x, c)) − (1/|N(x)|) Σ_{x_n∈N(x)} log σ(d(x, x_n)) ],

where the text context C_{x,D_L} is the set of tokens surrounding a token x in the entity-grounded corpus D_L, d denotes the l_2 distance, and each x_n ∈ N(x) is a randomly sampled token in E_L ∪ W_L.
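A minimal sketch of one plausible reading of this distance-based skip-gram objective (the `skipgram_loss` helper is hypothetical and simplified to a single center token):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def skipgram_loss(x, contexts, negatives):
    """Pull the vector of token x toward its context tokens and push it
    away from randomly sampled negative tokens, using l2 distance d
    (illustrative reading of the described objective)."""
    d = lambda a, b: np.linalg.norm(a - b)
    loss = 0.0
    for c in contexts:
        loss -= np.log(sigmoid(-d(x, c)))   # attract context tokens
    for xn in negatives:
        loss -= np.log(sigmoid(d(x, xn)))   # repel negative samples
    return loss

x = np.zeros(3)
near, far = 0.1 * np.ones(3), 3.0 * np.ones(3)
# A token close to its contexts and far from its negatives scores lower:
print(skipgram_loss(x, [near], [far]) < skipgram_loss(x, [far], [near]))  # → True
```

Because grounded entity surface forms appear as ordinary tokens in D_L, this same loss drags entity vectors toward the lexemes they co-occur with, which is precisely how text contexts contribute to entity proximity.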

Embedding Learning Objective
For each language L ∈ L, the goal of embedding learning is to optimize the joint loss

S_L = S^K_L + S^T_L.

As mentioned, the grounded entity surface forms in E_L ∩ W_L use shared representations in both model components, and are hence optimized by both S^K_L and S^T_L. The remaining lexeme, relation and entity representations are optimized by one component or the other. In both model components, the numbers of negative samples for triples and tokens are adjustable hyperparameters.
It is noteworthy that both model components may adopt alternative techniques, including other KG encoders such as GAT (Veličković et al., 2018), multi-channel GCN (Cao et al., 2019) and gated GNN (Sun et al., 2020a), and other text embeddings such as GloVe (Pennington et al., 2014). As experimenting with different embedding techniques is not a main contribution of this work, we leave this for future work. Notably, contextualized text representations (Peters et al., 2018; Devlin et al., 2019) cannot be directly applied, as contextualization introduces ambiguity to token representations that hinders the matching of embeddings.

Alignment Learning
Once the KG and text units of each language are captured in a shared embedding space, the alignment learning process bridges each pair of embeddings. This process seeks to exploit additional alignment labels from text embeddings and use them to aid the alignment of entities. Different from the majority of methods in §2, which jointly learn embeddings and alignment, the alignment learning process in JEANS is a retrofitting process (Faruqui et al., 2015). Hence, the embedding of each language is fixed and does not require repeated training for different language pairs (Chen et al., 2017a; Sun et al., 2017).
Given a pair of languages (L_i, L_j) ∈ L^2, the objective of alignment learning is to induce a transformation M_ij ∈ R^{k×k} between the two embedding spaces. The following loss is minimized:

S^A_ij = Σ_{(x_i, x_j)∈I(L_i, L_j)} ||M_ij x_i − x_j||_2,

where I(L_i, L_j) = I_E(L_i, L_j) ∪ I_W(L_i, L_j), and the seed lexicon I_W is considered additional supervision data that are optionally provided. Each x_i (x_j) denotes a fixed representation of either an entity or a lexeme of L_i (L_j).
Starting from a small amount of seed alignment in I(L_i, L_j), JEANS conducts an iterative self-learning process to exploit more alignment labels for both entities and lexemes, so as to improve the learning of M_ij. In each iteration, we follow Conneau et al. (2018) to induce a Procrustes solution for M_ij. To propose new alignment labels, the self-learning technique in JEANS deploys a mutual nearest-neighbor (NN) constraint, which requires a proposed pair of matched items to appear in the NN of each other. More specifically, let N^1_{L_j}(v) denote the 1-NN of a vector v in the embedding space of L_j; a new pair (x_i, x_j) is proposed only if x_j ∈ N^1_{L_j}(M_ij x_i) and, mutually, x_i is the item of L_i whose transformed vector M_ij x_i is nearest to x_j. Besides, we require (x_i, x_j) to be of the same type, i.e. both entities or both lexemes. In particular, we only select entities that have not been aligned in I to form newly proposed pairs (x_i, x_j). This respects the 1-to-1 matching constraint of entities defined at the beginning of this section, and effectively reduces the candidate space after each iteration of self-learning. Meanwhile, 1-to-1 matching is not required for lexemes. To mitigate hubness, we also follow Conneau et al. (2018) in employing the Cross-domain Similarity Local Scaling (CSLS) measure.
After each iteration, the newly proposed alignment labels are inserted into I to enhance the learning at the next iteration. The iterative self-learning stops once the number of proposed entity alignments in an iteration falls below a certain quantity (e.g. 1% of |E_{L_i}|). As more matched entities and lexemes are exploited within each iteration, a better M_ij is induced, and the lexical alignment naturally serves as incidental supervision signals for entity alignment.
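The two key steps of each self-learning iteration — the closed-form Procrustes solution and the mutual-NN proposal — can be sketched as follows (an illustrative NumPy version using plain l2 nearest neighbors instead of CSLS):

```python
import numpy as np

def procrustes(X, Y):
    """Closed-form orthogonal Procrustes: the orthogonal M minimizing
    ||X @ M - Y||_F, where matching rows of X and Y hold the vectors
    of currently aligned pairs."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def mutual_nn_pairs(X, Y, M):
    """Propose pairs under the mutual nearest-neighbor constraint
    (plain l2 1-NN here; the paper additionally applies CSLS)."""
    XM = X @ M
    dist = ((XM[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    fwd = dist.argmin(axis=1)   # 1-NN of each transformed source item
    bwd = dist.argmin(axis=0)   # 1-NN of each target item
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]

# Synthetic check: Y is a pure rotation of X, so Procrustes recovers the
# rotation exactly and every point is its own mutual nearest neighbor.
rng = np.random.default_rng(0)
theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = rng.standard_normal((6, 2))
Y = X @ R
M = procrustes(X, Y)
print(np.allclose(M, R, atol=1e-8))  # → True
print(mutual_nn_pairs(X, Y, M))      # → [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
```

In the actual iteration, `procrustes` would be fit on the seed pairs in I, `mutual_nn_pairs` would be filtered by type and by the 1-to-1 entity constraint, and the surviving pairs would be added back into I.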
After the alignment learning process, given a query (e_i, ?e_j) to find the counterpart of entity e_i ∈ E_{L_i} from E_{L_j}, the answer e_j is predicted as the 1-NN entity after applying M_ij to transform e_i, i.e. {e_j} = N^1_{L_j}(M_ij e_i). By default, the inference phase also adopts CSLS as the distance measure, which is consistent with the default setting of recent works (Sun et al., 2019b, 2020a).

Experiment
In this section, we evaluate JEANS on two benchmark datasets for cross-lingual entity alignment, comparing against a wide selection of recent baseline methods. We also provide a detailed ablation study of JEANS's model components.

Datasets.
Experiments are conducted on DBP15k (Sun et al., 2017) and WK3l60k (Chen et al., 2018), two widely used benchmarks for the studied task. DBP15k contains four language-specific KGs respectively extracted from English (En), Chinese (Zh), French (Fr) and Japanese (Ja) DBpedia (Lehmann et al., 2015), each of which contains around 65k-106k entities. Three sets of 15k alignment labels are constructed to align entities between each of the other three languages and English. WK3l60k contains larger KGs with around 57k to 65k entities in En, Fr and German (De) KGs, and around 55k reference entity alignments for the En-Fr and En-De settings. Dataset statistics are given in Appendix §A.2 (Chen et al., 2021). We also use the text of Wikipedia dumps (dated Jan 01, 2019) in the five participating languages for training. For the Chinese and Japanese corpora, we obtain segmented versions from PKUSEG (Luo et al., 2019) and MeCab (Kudo, 2006), respectively.

[Table 1: Entity alignment results. Baselines are separated in accord with the three groups described in Section 4.1. † indicates results obtained from (Sun et al., 2020a), and ‡ indicates those from (Pei et al., 2019b).]
Baseline methods. We compare with a wide selection of recent approaches for entity alignment on multilingual KGs. The baseline methods include (i) those employing different structure embedding techniques, namely MTransE (Chen et al., 2017a), GCN-Align, AlignE (Sun et al., 2018), GCN-JE (Wu et al., 2019b), KECG (Li et al., 2019a), MuGCN (Cao et al., 2019), RotatE (Sun et al., 2019c), RSN and AliNet (Sun et al., 2020a); (ii) methods that incorporate auxiliary information of entities, namely JAPE (Sun et al., 2017), SEA (Pei et al., 2019a), GMN and HMAN (Yang et al., 2019); (iii) semi-supervised alignment learning methods, including BootEA (Sun et al., 2018), KDCoE (Chen et al., 2018), MMR (Shi and Xiao, 2019), NAEA (Zhu et al., 2019) and OTEA (Pei et al., 2019b). Descriptions of these methods are given in Appendix §A.1 (Chen et al., 2021). Note that some works allow incorporating extra cross-lingual signals such as machine translation in training, or use pre-aligned word embeddings to delimit candidate spaces (Wu et al., 2019a,b). For example, Wu et al. (2019a,b) use Google Translate to translate the surface forms of entities in all other languages into English, and initialize the entity embeddings in their models with pre-trained word embeddings of the translated entity names. Results for these models are reported for versions with the extra cross-lingual alignment information removed, so as to conduct a fair comparison with all other models, which are trained from scratch using the same alignment labels in the benchmark datasets. This also prevents potential leakage of testing data into training, considering that training a comprehensive NMT system may have subsumed much of the testing data in the entity alignment benchmarks.
Evaluation protocols. Our use of the datasets is consistent with previous studies of the baseline methods. On each language pair in DBP15k, around 30% of the seed alignment is used for training, and the rest for testing. On WK3l60k, 20% of the seed alignment on the En-Fr and En-De settings is respectively used for training. Following convention, we calculate several ranking metrics on test cases: the accuracy H@1; H@p, the proportion of cases whose correct counterpart is ranked within the top p; and the mean reciprocal rank MRR. To align with the results in previous studies (Sun et al., 2020a; Pei et al., 2019b), p is set to 10 on DBP15k and 5 on WK3l60k. Higher values indicate better performance for all metrics.
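Under these definitions, all three metrics follow directly from the rank of each test case's correct counterpart; a minimal sketch (the `ranking_metrics` helper is illustrative, not part of the evaluation toolkit):

```python
def ranking_metrics(ranks, p=10):
    """Compute H@1, H@p and MRR from the 1-indexed ranks at which the
    correct counterparts appear among the candidate entities."""
    n = len(ranks)
    h_at_1 = sum(r == 1 for r in ranks) / n
    h_at_p = sum(r <= p for r in ranks) / n
    mrr = sum(1.0 / r for r in ranks) / n
    return h_at_1, h_at_p, mrr

# Four test cases whose correct counterparts are ranked 1st, 2nd, 15th, 1st:
h1, hp, mrr = ranking_metrics([1, 2, 15, 1], p=10)
print(h1, hp, round(mrr, 4))  # → 0.5 0.75 0.6417
```

Note that H@1 ≤ H@p by construction, and MRR rewards near-misses (rank 2 contributes 0.5) that H@1 ignores, which is why all three are reported together.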
Model Configurations. We use AMSGrad (Reddi et al., 2018) to optimize the training losses of the embedding learning process, setting the learning rate α to 0.001, the exponential decay rates β_1 and β_2 to 0.9 and 0.999, and the batch size to 512 for both S^K_L and S^T_L. Trainable parameters are initialized using Xavier initialization (Glorot and Bengio, 2010). The dimension k is set to 300, which is often used for bilingual word embedding models trained on Wikipedia corpora (Conneau et al., 2018; Gouws et al., 2015), considering that our vocabulary sizes and training data density are relatively close to those models'. The number of GCN layers is set to 2. We set the negative sample sizes of triples and text contexts to 5, the text context width to 10 and the bias b in S^K_L to 2. More implementation details are in Appendix §A.3 (Chen et al., 2021). Specifically, we evaluate variants of JEANS by adjusting two technical details. First, for the grounding process, aside from simple surface form matching (marked SFM), we also explore the off-the-shelf Wikification-based EDL model (Upadhyay et al., 2018; marked EDL). A grounding performance estimation is given in §4.4. In addition, we consider both CSLS and l_2 in inference.

Results
We report the entity alignment results in Table 1.
Considering the baseline results on DBP15k, we can see that even the simplest variant of JEANS, using SFM-based grounding, consistently outperforms all baselines on the three cross-lingual settings. In particular, it leads to 17.0-17.4% absolute improvement in H@1 over the best structure-based baseline, 14.0-22.3% over the best entity-profile-based one, and 6.30-9.30% over the best semi-supervised one. This shows that JEANS preserves the key merit of a semi-supervised entity alignment method while effectively enhancing the alignment of KGs by exploiting incidental supervision signals from unaligned text corpora. Considering the different grounding techniques, we observe that SFM variants often perform closely to EDL ones, indicating that simple SFM suffices to combine KGs and text corpora for JEANS's embedding learning without EDL-related resources. The results on WK3l60k generally exhibit similar observations. In comparison to KDCoE, which leverages strong but expensive supervision data of entity descriptions in co-training, JEANS offers comparable performance based on very accessible resources. In general, these experiments show that JEANS promisingly improves the SOTA performance of entity alignment, needing only non-parallel free text and no additional labels.

Ablation Study
In Table 2 we report an ablation study for JEANS-SFM based on DBP15k, so as to understand the importance of each incorporated technique.
From the results, we observe that self-learning is the most important factor. Removing it leads to a drop of 10.1-13.8% in H@1, as well as drastic drops in the other metrics. This also explains why the semi-supervised baselines (group 3) typically perform better than others. However, even with self-learning, removing text leads to H@1 drops of 2.4% on En-Fr and 4.2% on En-Ja. This shows that the context information JEANS retrieves from free text effectively helps infer the matching of entities. On the other hand, the structure encoding of KGs is more important than textual contexts, as removing KGs causes larger performance drops of 6.7-8.8% in H@1. Note that the model without KGs learns entity embeddings solely from free text; its results show that context information from text alone provides a strong starting point, from which incorporating KGs further enhances performance. Employing GCN leads to a relatively slight performance gain, as jointly learning the relation model and the language model already captures entity proximity satisfyingly. Changing the distance metric to l_2 leads to a 3.6-6.9% decrease in H@1, showing that CSLS's ability to handle hubness and isolation is also important for similarity inference in the dense embedding space of words and entities. Hence, this metric is also recommended by recent work (Sun et al., 2019b, 2020a). In addition, introducing an additional 5k seed lexicon (containing only word alignment information and no entity alignment) provided by Conneau et al. (2018) for each language pair leads to a further improvement of 1.5-2.2% in H@1. This shows that JEANS effectively leverages available supervision data on lexemes to further enhance entity alignment, although such data are not obligatory.

Grounding Performance Estimation
Due to the lack of ground truth on unlabeled text, it is hard to estimate the precision of entity grounding by the two types of (noisy) grounding techniques. However, as the requirement of the grounding process is simply to connect the two data modalities for training the embeddings, we prefer a technique that handles enough entity mentions and offers higher coverage of the entity vocabularies. Accordingly, estimations of these two factors for the two techniques are reported in Table 3. We observe that, without considering disambiguation, SFM overall covers higher proportions of the entity vocabularies, while pre-trained EDL generally discovers more entity mentions per entity. However, both techniques are sufficient to support the noisy grounding process and combine the two data modalities for embedding learning and alignment induction.

Conclusion
This paper introduces JEANS for entity alignment. Different from previous methods that leverage only the internal information of KGs, JEANS extends the learning to any text corpora that may mention the KG entities. For each language, a noisy grounding process first connects the two data modalities, followed by an embedding learning process coupling GCN with relational modeling, and a self-learning-based alignment learning process. Without introducing additional labeled data, JEANS offers significantly improved performance over SOTA models on benchmarks, demonstrating the effectiveness and feasibility of exploiting incidental supervision from free text for entity alignment.
For future work, aside from experimenting with other embedding learning techniques for KGs and text, we plan to extend JEANS to learn associations on KGs with different specificity (Hao et al., 2019). We also seek to extend the representation scheme in hyperbolic spaces (Nickel and Kiela, 2017;Chen and Quirk, 2019) along with the incorporation of hyperbolic lexical embedding techniques (Tifrea et al., 2018), aiming at better capturing the associations for hierarchical ontologies.