Joint Multilingual Knowledge Graph Completion and Alignment

Knowledge graph (KG) alignment and completion are usually treated as two independent tasks. While recent work has leveraged entity and relation alignments from multiple KGs, such as alignments between multilingual KGs with common entities and relations, a deeper understanding of the ways in which multilingual KG completion (MKGC) can aid the creation of multilingual KG alignments (MKGA) is still limited. Motivated by the observation that structural inconsistencies -- the main challenge for MKGA models -- can be mitigated through KG completion methods, we propose a novel model for jointly completing and aligning knowledge graphs. The proposed model combines two components that jointly accomplish KG completion and alignment. Both components employ the relation-aware graph neural networks that we propose to encode multi-hop neighborhood structures into entity and relation representations. Moreover, we propose (i) a structural inconsistency reduction mechanism to incorporate information from the completion component into the alignment component, and (ii) an alignment seed enlargement and triple transferring mechanism to enlarge alignment seeds and transfer triples during KG alignment. Extensive experiments on a public multilingual benchmark show that our proposed model outperforms existing competitive baselines, obtaining new state-of-the-art results on both the MKGC and MKGA tasks. We publicly release the implementation of our model at https://github.com/vinhsuhi/JMAC

Most popular KGs such as YAGO (Suchanek et al., 2007), BabelNet (Navigli and Ponzetto, 2010), and DBpedia (Lehmann et al., 2015) are multilingual, that is, they contain sets of triples constructed from sources in different languages. Fortunately, these KGs often complement each other since a KG in one language might be more comprehensive in some domains compared to a KG in a different language, and vice versa, while still sharing a large number of entities and relation types (Sun et al., 2020a). In particular, KGs for low-resource languages could benefit from triples contained in KGs for high-resource languages. Although numerous approaches to KG completion (KGC) have been proposed in recent years (Bordes et al., 2013; Dettmers et al., 2018; Vashishth et al., 2020), most of these only operate on one KG at a time. Treating KGs independently, however, might lead to poor performance due to the sparseness of low-resource languages. Motivated by this observation, some methods have tried to improve multilingual KG completion (MKGC) using multilingual KG alignment (MKGA) (Chen et al., 2020; Singh et al., 2021; Huang et al., 2022).
The alignment problem is challenging due to the varying levels of completeness of monolingual KGs (Sun et al., 2020c). The resulting structural inconsistency between KGs leads to the problem of corresponding entities in two KGs having vastly different embeddings (Xu et al., 2018). Figure 1 illustrates that, in principle, MKGC and MKGA should be mutually beneficial. Given the alignment seed set {(E, E*), (D, D*), (B, B*)}, the task here is to find the corresponding entities for A and C, which are A* and C*, respectively. The two KGs share a similar structure, but the triple (B*, r4, A*) in KG* has no corresponding triple in KG. This causes the embeddings of entities A and A* to differ and thus makes it difficult to identify the alignment between A and A*. Indeed, A is more likely to be aligned to C*, as they share a comparable local structure (similar degree and 1-hop neighbor set). Thus, completing missing triples is crucial to improve the alignment quality. Conversely, if one aligned A to A*, one could recover (B, r4, A) by transferring (B*, r4, A*) from KG*.
Motivated by these observations, we propose JMAC, a method for Joint Multilingual KG Completion and Alignment, consisting of two interdependent Completion and Alignment components. Both components employ relation-aware graph neural networks (GNNs) to encode multi-hop neighborhood information into entity and relation embeddings. The Completion component is trained to reconstruct missing triples using the TransE translation-based loss (Bordes et al., 2013) and an additional loss term that incorporates information about the already known alignments. While we learn separate embeddings for the Alignment and Completion components, the embeddings of the Completion component are used within the Alignment component, mitigating the aforementioned problem of structural inconsistencies. In addition, we propose a mechanism for estimating the alignment entropy, which is used to adaptively and iteratively grow the alignment seed set. Finally, we also propose a method for transferring triples based on the currently derived alignments.
Our contributions are as follows: • We propose JMAC, a two-component architecture consisting of a Completion and an Alignment component for joint multilingual KG completion and alignment.
• We propose a relation-aware GNN for KG embeddings, which learns representations for alignment and completion tasks.
• We introduce a structural inconsistency reduction mechanism that fuses embeddings from the Completion component with those of the Alignment component.
• We propose an alignment seed enlargement and triple transferring mechanism.
• We conduct extensive experiments using the public multilingual benchmark DBP-5L (Chen et al., 2020) and show that our model outperforms existing competitive baselines and achieves state-of-the-art results on both MKGC and MKGA tasks.

Problem Definition and Related Work
Let G = (E, R, T) denote a KG, where E, R and T denote the sets of entities, relations, and triples, respectively. A triple (e_h, r, e_t) ∈ T is an atomic unit, which represents some relation r ∈ R between a head entity e_h ∈ E and a tail entity e_t ∈ E.

Multilingual KG completion (MKGC)
Given a KG G = (E, R, T), the KG completion (KGC) task aims to predict missing triples (e_h, r, e_t), that is, to predict the missing tail entity e_t ∈ E of an incomplete triple (e_h, r, ?) or the missing head entity e_h ∈ E of an incomplete triple (?, r, e_t), where ? denotes the missing element. Embedding models for KG completion have been proven to give state-of-the-art results, representing entities and relation types with latent feature vectors, matrices, and/or third-order tensors (Nguyen, 2020; Ji et al., 2021). These models define a score function f and are trained to make the score f(e_h, r, e_t) of a correct triple (e_h, r, e_t) larger than the score f(e_h′, r′, e_t′) of an incorrect or not-known-to-be-correct triple (e_h′, r′, e_t′). The earliest instances of these embedding models use shallow neural networks with translation-based score functions (Bordes et al., 2013; Wang et al., 2014; Lin et al., 2015). Recently, KGC approaches using deep embedding models and more complex scoring functions have been proposed, such as CNN-based models (Dettmers et al., 2018; Nguyen et al., 2018), RNN-based models (Liu et al., 2017; Guo et al., 2018), and GNN-based models (Schlichtkrull et al., 2018; Shang et al., 2019; Vashishth et al., 2020; Nguyen et al., 2022).
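As an illustration of this family of translation-based score functions, a minimal TransE-style sketch might look as follows. The toy embeddings and helper names here are our own for illustration, not taken from the paper:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility score: negative L2 distance ||h + r - t||.
    Higher (closer to 0) means the triple is more plausible."""
    return -np.linalg.norm(h + r - t)

def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss pushing a correct triple's score above a corrupted one's
    by at least `margin`: [margin - pos + neg]_+."""
    return max(0.0, margin - pos_score + neg_score)

# Toy 3-d embeddings: the correct triple satisfies h + r ≈ t.
h = np.array([1.0, 0.0, 0.0])
r = np.array([0.0, 1.0, 0.0])
t = np.array([1.0, 1.0, 0.0])        # correct tail
t_bad = np.array([0.0, 0.0, 5.0])    # corrupted tail

pos = transe_score(h, r, t)       # 0.0 (perfect translation)
neg = transe_score(h, r, t_bad)   # strongly negative
loss = margin_ranking_loss(pos, neg, margin=1.0)  # 0.0: margin already satisfied
```

Training pushes the loss to zero by making correct triples score higher than their corruptions by at least the margin.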
The MKGC task is to perform the KGC task on a KG given the availability of other KGs in different languages (Chen et al., 2020;Huang et al., 2022).

Multilingual KG alignment (MKGA)
MKGA, also known as cross-lingual entity alignment, aims to match entities with their counterparts from KGs in different languages (Chen et al., 2017; Wang et al., 2018; Wu et al., 2019; Sun et al., 2020b). Without loss of generality, we define the alignment between a source graph G = (E, R, T) and a target graph G* = (E*, R*, T*).
For each entity e ∈ E, the MKGA task is to find its counterpart e* ∈ E* (if one exists).
Existing models compute an alignment matrix whose elements represent the similarity score between any two entities e ∈ E and e* ∈ E* across the two KGs. The models then employ a greedy matching algorithm (Kollias et al., 2011) to infer matching entities from the alignment matrix. The models typically require an alignment seed set L of pre-aligned entity pairs (e, e*).
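A minimal sketch of this pipeline, with hypothetical helper names and toy 2-d embeddings (a simple greedy matcher stands in for the algorithm of Kollias et al., 2011):

```python
import numpy as np

def alignment_matrix(src_emb, tgt_emb):
    """Cosine-similarity matrix A where A[i, j] compares source entity i
    with target entity j."""
    s = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    return s @ t.T

def greedy_match(A):
    """Repeatedly pick the highest-scoring unmatched (source, target) pair."""
    A = A.copy()
    pairs = []
    for _ in range(min(A.shape)):
        i, j = np.unravel_index(np.argmax(A), A.shape)
        pairs.append((int(i), int(j)))
        A[i, :] = -np.inf   # source i is now matched
        A[:, j] = -np.inf   # target j is now matched
    return pairs

src = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt = np.array([[0.0, 1.0], [0.9, 0.1]])
pairs = greedy_match(alignment_matrix(src, tgt))  # [(1, 0), (0, 1)]
```

The seed set L would correspond to pairs whose match is known before training.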

Joint MKGC and MKGA
The joint MKGC and MKGA problem aims to infer both new triples for each KG and new aligned entity pairs for each pair of KGs. Performing the two tasks jointly might be beneficial: missing triples (e_h, r, e_t) in one KG could be recovered by cross-checking another KG via the alignment, which, in turn, could be boosted by the newly added triples.
Despite the obvious benefit of completing and aligning KGs jointly, there has not been much work addressing the problem. A notable exception is the application of multi-task learning to the problem (Singh et al., 2021). The proposed multi-task model, however, is not able to capture local neighborhood information. Another limitation is its missing robustness to the previously described issue of structural inconsistencies between the KGs during training.

JMAC: Joint Multilingual Alignment and Completion
Figure 2 illustrates the architecture of JMAC, which consists of the Completion and Alignment components. Each component uses a relation-aware graph neural network (GNN) to encode multi-hop neighborhood information into entity and relation embeddings. The use of two GNN encoders is beneficial, as embeddings that are suitable for the alignment might differ from those that are most beneficial for the completion problem.

Relation-aware graph neural network
To better capture relation information, we propose a GNN architecture that uses relation-aware messages and relation-aware attention scores. We unify heterogeneous information from KGs using a GNN with K layers. For the k-th layer (denoted by the superscript k), we update the representation a^{k+1}_e of each entity e ∈ E as:

a^{k+1}_e = g( Σ_{(e′,r)∈N(e)} α^k_{e,e′,r} · m^k_{e′,r} )    (1)

where a^k_e ∈ R^n is the vector representation of entity e at the k-th layer; N(e) = {(e′, r) | (e, r, e′) ∈ T or (e′, r, e) ∈ T} is the neighbor set of entity e; and the vector m^k_{e′,r} ∈ R^n denotes the message passed from neighbor entity e′ to entity e through relation r.
Here, α^k_{e,e′,r} represents the attention weight that regulates the importance of the message m^k_{e′,r} for entity e; and g(·) is a linear transformation followed by a Tanh function.
The innovations of our GNN-based model, which we describe in more detail in the following two sections, are that (i) we make the message-passing network relation-aware by learning a relation embedding a^k_r ∈ R^n for each relation r ∈ R and by integrating it into the entity message m^k_{e′,r}, and (ii) we introduce an attention weight α^k_{e,e′,r} to further enhance the relation-awareness of our GNN-based embeddings.
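The attention-weighted, relation-aware aggregation described above can be sketched as follows. All names are our own, and the single-matrix transforms are simplifications of the paper's MLPs:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ra_gnn_layer(ent, rel, neighbors, W_msg, w_att, W_out):
    """One relation-aware layer for a single target entity (simplified sketch).
    ent/rel: dicts id -> embedding; neighbors: list of (e_prime, r) pairs."""
    # Relation-aware messages: combine neighbor-entity and relation embeddings.
    msgs = [np.tanh(W_msg @ np.concatenate([ent[e2], rel[r]]))
            for e2, r in neighbors]
    # Relation-aware attention scores over the messages.
    att = softmax(np.array([w_att @ m for m in msgs]))
    # Weighted aggregation, then g(.): linear map followed by Tanh.
    agg = sum(a * m for a, m in zip(att, msgs))
    return np.tanh(W_out @ agg)

rng = np.random.default_rng(0)
n = 4
ent = {e: rng.normal(size=n) for e in ["A", "B", "C"]}
rel = {r: rng.normal(size=n) for r in ["r1", "r2"]}
W_msg = rng.normal(size=(n, 2 * n))
w_att = rng.normal(size=n)
W_out = rng.normal(size=(n, n))
h_A = ra_gnn_layer(ent, rel, [("B", "r1"), ("C", "r2")], W_msg, w_att, W_out)
```

In the full model this update would run for every entity at every one of the K layers, with the relation embeddings updated in parallel.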
Relation-aware message Unlike existing GNN-based approaches that infer relation embeddings from learned entity embeddings (Sun et al., 2020b), our approach allows entity and relation embeddings to be learned jointly and, thus, to both contribute to the message-passing neural network. This is achieved through an entity-relation composition operation: the message m^k_{e′,r} in Equation 1 is defined by applying MLP^k_comp to a composition of the neighbor-entity embedding a^k_{e′} and the relation embedding a^k_r (Equation 2), where MLP^k_comp : R^n → R^n is a two-layer MLP with the LeakyReLU activation function.
Here, the relation embedding is updated by a^{k+1}_r = MLP^k_rel(a^k_r) (Equation 3), where MLP^k_rel : R^n → R^n maps relations to a new embedding space and allows them to be utilized in the next layer.
Relation-aware attention We define the weight α^k_{e,e′,r} in Equation 1 to be relation-aware as:

Completion component
The Completion component works similarly to KG embedding models (Nguyen, 2020; Ji et al., 2021), which compute a score f(e_h, r, e_t) for each triple (e_h, r, e_t). Our score function f is based on TransE (Bordes et al., 2013) and is computed across all hidden layers of the relation-aware GNN encoder in the Completion component. We use a margin-based pairwise ranking loss (Bordes et al., 2013) across all hidden layers, where [x]_+ = max(0, x); γ_c > 0 is the margin hyper-parameter; and T′ is the set of incorrect triples constructed by corrupting either the head or the tail entity of each correct triple (e_h, r, e_t) ∈ T.
Also, given the availability of the alignment seed set L of pre-aligned entity pairs (e, e*) among KGs, as mentioned in Section 2.2, we additionally compute an alignment constraint loss based on the cosine distance d_cos between the embeddings of each seed pair.
To incorporate alignment information into the Completion component, our MKGC loss L_c is computed as the sum of the two losses: L_c = L_c1 + L_c2.
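The alignment-constraint term can be sketched as follows (the ranking term follows the standard TransE margin loss illustrated earlier). The dictionary-based embedding lookup and entity names are our own illustration:

```python
import numpy as np

def cos_dist(u, v):
    """Cosine distance d_cos(u, v) = 1 - cosine similarity."""
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def alignment_constraint_loss(emb_src, emb_tgt, seeds):
    """Pull the embeddings of pre-aligned entity pairs (e, e*) together
    by summing their cosine distances over the seed set."""
    return sum(cos_dist(emb_src[e], emb_tgt[e_star]) for e, e_star in seeds)

emb_src = {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0])}
emb_tgt = {"A*": np.array([1.0, 0.0]), "B*": np.array([1.0, 1.0])}
seeds = [("A", "A*"), ("B", "B*")]
l_c2 = alignment_constraint_loss(emb_src, emb_tgt, seeds)
# The combined completion loss would then be the ranking loss plus l_c2.
```

Minimizing this term drives seed-pair embeddings toward cosine similarity 1, injecting alignment knowledge into the Completion component.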

Alignment component
The Alignment component performs the MKGA task as defined in Section 2.2. We propose to incorporate a structural inconsistency reduction mechanism, which draws on the Completion component, into the Alignment component. We also introduce a mechanism to enlarge the alignment seed set L.

Structural inconsistency reduction (SIR)
The different completeness levels of KGs lead to structural inconsistencies that might cause incorrect alignment predictions. Fortunately, the entity and relation embeddings in the Completion component (i.e., c^k_e and c^k_r) could help reconstruct missing triples, reducing the structural inconsistencies between KGs. To this end, we propose to incorporate the Completion component embeddings at the k-th layer of the relation-aware GNN in the Alignment component, where MLP^k_a_1 : R^{2×n} → R^n and MLP^k_a_2 : R^{2×n} → R^n fuse the concatenated Alignment and Completion embeddings of entities and relations, respectively (equations 10 and 11). The transformed embeddings will then be used as input for the next layer following equations 1 and 3. This allows the structural inconsistency reduction to take place at every GNN layer of the two components, enabling their deep integration.
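The per-layer fusion could be sketched as follows, using single-layer maps as stand-ins for MLP^k_a_1 and MLP^k_a_2; all names and shapes are our own simplification:

```python
import numpy as np

def mlp_fuse(x, W):
    """Single-layer map R^{2n} -> R^n with Tanh, standing in for the
    two fusion MLPs of the structural inconsistency reduction."""
    return np.tanh(W @ x)

def sir(a_ent, c_ent, a_rel, c_rel, W_ent, W_rel):
    """Fuse Alignment-side (a_*) and Completion-side (c_*) embeddings at
    one GNN layer; the fused vectors feed the next Alignment layer."""
    new_ent = mlp_fuse(np.concatenate([a_ent, c_ent]), W_ent)
    new_rel = mlp_fuse(np.concatenate([a_rel, c_rel]), W_rel)
    return new_ent, new_rel

rng = np.random.default_rng(1)
n = 3
a_e, c_e, a_r, c_r = (rng.normal(size=n) for _ in range(4))
W_e = rng.normal(size=(n, 2 * n))
W_r = rng.normal(size=(n, 2 * n))
fused_e, fused_r = sir(a_e, c_e, a_r, c_r, W_e, W_r)
```

Because the fusion happens at every layer, completion information shapes the alignment representation throughout the encoder rather than only at the output.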

Final entity and relation embeddings for MKGA
We compute the final embeddings of entities and relations for the Alignment component by concatenating their representations from all K+1 GNN layers and projecting them, where MLP_a_3 : R^{(K+1)×n} → R^n and MLP_a_4 : R^{(K+1)×n} → R^n.
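Assuming the final projections concatenate the K+1 layer-wise embeddings (our reading of the R^{(K+1)×n} → R^n signatures), a minimal sketch with a hypothetical projection matrix:

```python
import numpy as np

def final_embedding(layer_embs, W):
    """Concatenate an entity's embeddings from all K+1 GNN layers and map
    them back to R^n (stand-in for MLP_a_3 / MLP_a_4)."""
    return np.tanh(W @ np.concatenate(layer_embs))

K, n = 2, 3
layers = [np.ones(n) * k for k in range(K + 1)]  # toy per-layer embeddings
W = np.eye(n, (K + 1) * n)                       # hypothetical projection
z = final_embedding(layers, W)                   # selects the layer-0 part here
```

This gives the final n-dimensional vectors used to populate the alignment matrix.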
Since the entropy measures the uncertainty of the alignment predictions -- lower entropy corresponds to a smaller alignment uncertainty -- EnTr implements an uncertainty-mediated estimation of the size q of L (Equation 16), where A is the alignment matrix before training and β ∈ [0, 1] is a hyper-parameter controlling the number of new entity pairs to generate. EnTr then chooses the q entity pairs with the q highest cosine similarity scores from A and sets L to contain these entity pairs. EnTr also performs a transfer of triples between the KGs that logically follow from the existing alignments. In particular, for each two aligned entity pairs (e, e*) and (e′, e′*), if r is a relation connecting e and e′, EnTr connects e* and e′* by r as well. This allows the two KGs' structures to become gradually more similar over time.
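The triple-transfer step can be sketched directly; the function and variable names here are ours, and the alignment is represented as a simple source-to-target dictionary:

```python
def transfer_triples(triples_src, alignment):
    """EnTr triple transfer (sketch): for aligned pairs (e, e*) and (e', e'*),
    copy every relation r linking e and e' in the source KG over to the
    target KG as (e*, r, e'*)."""
    new_triples = set()
    for h, r, t in triples_src:
        if h in alignment and t in alignment:
            new_triples.add((alignment[h], r, alignment[t]))
    return new_triples

triples = {("B", "r4", "A"), ("B", "r1", "E")}
alignment = {"B": "B*", "A": "A*"}                  # E has no known counterpart
transferred = transfer_triples(triples, alignment)  # {("B*", "r4", "A*")}
```

Only triples whose endpoints are both aligned are transferred, which is exactly the mechanism that recovers (B, r4, A) in the Figure 1 example once A and A* are aligned.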
Optimization The learning objective is to minimize the distance between correctly aligned entity pairs while maximizing the distance between negative entity pairs using a margin-based pairwise ranking loss:

L_a = Σ_{(e,e*)∈L} Σ_{(e̅,e̅*)∈L̅} [ d_cos(a_e, a_e*) + γ_a − d_cos(a_e̅, a_e̅*) ]_+    (17)

where γ_a > 0 is the margin hyper-parameter; and L̅ is the set of negative entity pairs, which is constructed by replacing one entity of each correctly aligned pair by its nearest entities (Wu et al., 2019).

Model training
We optimize the losses L_c and L_a iteratively, using two optimizers, one for the Completion loss and one for the Alignment loss. In particular, we hold the Alignment component's parameters fixed and optimize only the loss L_c. Then we hold the Completion component's parameters fixed and optimize only the loss L_a. We iterate this process in each training epoch.
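Structurally, the alternating schedule looks like the following sketch, where the step callables stand in for the actual optimizer updates (the real implementation would call the respective optimizer's step on the unfrozen parameters):

```python
def train(completion_step, alignment_step, epochs):
    """Alternating optimization (sketch): within each epoch, update only the
    Completion parameters on L_c, then only the Alignment parameters on L_a."""
    log = []
    for _ in range(epochs):
        log.append(completion_step())  # Alignment parameters held fixed
        log.append(alignment_step())   # Completion parameters held fixed
    return log

history = train(lambda: "L_c step", lambda: "L_a step", epochs=2)
# history == ["L_c step", "L_a step", "L_c step", "L_a step"]
```

Freezing one component while optimizing the other keeps the two objectives from interfering within a single update.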

Dataset
Following previous work (Singh et al., 2021; Huang et al., 2022), we conduct experiments using the benchmark DBP-5L (Chen et al., 2020), which is publicly available for both the MKGC and MKGA tasks. DBP-5L consists of 1,392 relations, 56,590 entities, and 225,831 triples across the five languages Greek (EL), Japanese (JA), French (FR), Spanish (ES), and English (EN). Table 1 presents statistics for each DBP-5L language. Here, each language is referred to as a KG. The DBP-5L benchmark was created for MKGC evaluation. However, each language pair also has pre-aligned entity pairs as alignment seeds. In particular, on average, about 40% of the entities in each KG have counterparts in other KGs. Thus, the benchmark can also be used for MKGA evaluation (Singh et al., 2021). Similar to prior work, we use the same split of training, validation, and test data for MKGC available in DBP-5L for each KG. For MKGA, we use the same 50-50 split of the alignment seeds for training and test as used in AlignKGC (Singh et al., 2021).

Evaluation protocol
For MKGC, and following previous work (Chen et al., 2020; Singh et al., 2021; Huang et al., 2022), each correct test triple (e_h, r, e_t) is corrupted by replacing the tail entity e_t with each of the other entities in turn, and then the correct test triple and the corrupted ones are ranked in descending order of their scores. As in previous work, before ranking we also apply the "Filtered" setting protocol (Bordes et al., 2013). We employ standard evaluation metrics, including the mean reciprocal rank (MRR), Hits@1 (i.e., the proportion of correct test triples that are ranked first), and Hits@10 (i.e., the proportion of correct test triples that are ranked in the top 10 predictions). A higher score reflects a better prediction result.
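A sketch of the "Filtered" ranking and the MRR/Hits computation, with toy scores and hypothetical helper names:

```python
def filtered_rank(scores, correct, known_true):
    """Rank of the correct tail among all candidates under the 'Filtered'
    setting: other entities known to form true triples are ignored."""
    s_correct = scores[correct]
    return 1 + sum(1 for e, s in scores.items()
                   if e != correct and e not in known_true and s > s_correct)

def mrr_hits(ranks, k=10):
    """Mean reciprocal rank and Hits@k over a list of ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits

scores = {"t1": 0.9, "t2": 0.8, "t3": 0.7}   # model scores for (h, r, ?)
# t1 also forms a true triple with (h, r), so it is filtered out.
rank = filtered_rank(scores, correct="t3", known_true={"t1"})  # 2
mrr, hits1 = mrr_hits([rank], k=1)           # 0.5, 0.0
```

The same ranking logic applies to MKGA, with entity-pair similarity scores in place of triple scores.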
For MKGA, and following previous work (Chen et al., 2017; Wu et al., 2019; Sun et al., 2020b; Singh et al., 2021), each correct test pair (e, e*), where e ∈ E and e* ∈ E*, is corrupted by replacing entity e* with each of the other entities from E* in turn, and then the correct test pair and the corrupted ones are ranked in descending order of their similarity scores. We again employ the evaluation metrics MRR, Hits@1 and Hits@10.


Implementation details
The availability of surface information (SI) such as entity names makes the alignment problem less challenging (Xiang et al., 2021). When surface information is not used by the methods (denoted as w/o SI), the entity and relation embeddings are randomly initialized. When surface information is used (denoted as w/ SI), initial entity and relation embeddings are obtained from pre-trained text embedding models (Singh et al., 2021; Huang et al., 2022). We therefore evaluate both settings, referred to as JMAC w/o SI and JMAC w/ SI. We implement our model using PyTorch (Paszke et al., 2019). We iteratively train our JMAC components for up to 30 epochs with two Adam optimizers (Kingma and Ba, 2014). We use a grid search to choose the number of GNN hidden layers K ∈ {1, 2, 3}, the initial Adam learning rates λ ∈ {1e-4, 5e-4, 1e-3}, the controllable hyper-parameter β ∈ {0.1, 0.2, 0.3} from Equation 16, the margin hyper-parameters γ_c and γ_a ∈ {0, 5, 10}, and the input dimension and MLP hidden sizes n ∈ {128, 256, 512}. The test set results for the two tasks are reported for the model checkpoint that obtains the highest MRR on the validation set of the MKGC task.

Comparison results Table 2 lists MKGC results for JMAC and other strong baselines on the DBP-5L test sets. Overall, the multilingual models perform better than the monolingual ones. It is not surprising that JMAC and AlignKGC without surface information (w/o SI) obtain lower numbers than their counterparts using SI (w/ SI). For example, with SI, JMAC achieves an average improvement of about 8% points for Hits@10. Note that our "JMAC w/o SI" still produces higher Hits@10 results than "AlignKGC w/ SI" on all KGs and, in general, performs better than "AlignKGC w/o SI". Compared to SS-AGA, which uses both SI and all available alignment seeds for training, our "JMAC w/o SI" Hits@10 results are about 30% better for Greek, 20% better for Japanese, French and English, and 10% better for Spanish. We find that our "JMAC w/ SI" obtains the highest results across all KGs on all evaluation metrics, producing new state-of-the-art performance (except for the second-highest Hits@1 scores on Greek, Spanish and English, where "AlignKGC w/ SI" produces the highest Hits@1 scores).
Ablation study Table 3 presents ablation results. Removing or replacing any component or mechanism decreases the model's performance. The largest decrease is observed without the Alignment component (w/o Align.), where the MRR scores drop by about 15% points on average. The model also performs substantially worse when it is not using the alignment seed enlargement and triple transferring mechanism (w/o EnTr). Here, the model's Hits@10 scores decrease by about 10% points on Greek, Japanese, French and Spanish. The performance drops incurred by using the same relation-aware GNN encoder for both components (w/ 1-GNN) or by not using relation-aware messages and attention (w/o RA-GNN) also demonstrate the importance of the two-component architecture design and the relation-aware message-passing scheme. Although the structural inconsistency reduction mechanism aims to improve the alignment performance, we find that without it (w/o SIR) the MKGC performance also declines. This shows that improvements in the MKGA task can lead to direct improvements in the MKGC task.

[Per-language MKGC scores (H@1, H@10, MRR for Greek, Japanese, French, Spanish and English) omitted.]

Table 3: Ablation study for the MKGC task. (i) w/o RA-GNN: without using the relation-aware message (Equation 2) and relation-aware attention (Equation 4); here we replace our proposed RA-GNNs with graph isomorphism networks (Xu et al., 2019).

Impact of alignment seeds
To gain better insight into how much MKGA can aid MKGC, we evaluate the MKGC task when using alignment seeds with a sampling percentage ranging from 0% to 100% (i.e., using all pre-aligned entity pairs).
Figure 3 shows the MRR scores obtained for this experiment. Overall, our JMAC improves when using more alignment seeds. The surface information (SI), e.g., entity names, provides informative clues about the similarity between entities across KGs (e.g., two entities with similar names are more likely to be an alignment pair). It is therefore not surprising that "JMAC w/ SI" performs better than "JMAC w/o SI" in all scenarios, especially when the sampling percentage of alignment seeds is small (i.e., SI becomes more valuable). In addition, the completion performance of the two variants gradually converges as the sampling percentage approaches 100%. The reason is possibly that when the set of used alignment seeds is large enough, there is not much for surface information to contribute to the alignment performance, which in turn aids the completion performance.

MKGA results
Comparison results Table 4 presents the overall results of different models on the MKGA task. We refer the reader to the Appendix for results on each language pair. Overall, JMAC performs best in both the "w/ SI" and "w/o SI" categories.
Although SS-AGA performs well on the MKGC task, it performs much worse on the alignment task. Specifically, although using SI, it produces the lowest Hits@10. AlignKGC, on the other hand, achieves the third-best results in both categories. However, it is still outperformed by pure KGA models such as AliNet (obtaining 11.1% points higher Hits@1 than AlignKGC in the "w/o SI" category) and RDGCN (obtaining 4.5% points higher Hits@1 than AlignKGC in the "w/ SI" category).
Ablation study Table 5 shows the ablation results for the MKGA task. The model performance drops by about 5.3% points in Hits@10 without the relation-aware messages and attention (w/o RA-GNN), confirming that the relation-aware mechanism is a crucial part of our model. Using only one GNN encoder (w/ 1-GNN) for both the Completion and Alignment components also performs worse. This indicates that combining the objective functions and using the same features for multiple tasks might not be optimal. The EnTr mechanism improves the model's Hits@10 by about 2% points (w/o EnTr: 95.6% → 97.5%). In the absence of SIR (w/o SIR), the alignment performance suffers a drop in each evaluation metric (2.1% points for Hits@1, 3.2% points for Hits@10, and 2.5% points for MRR).

Figure 1: The incompleteness of KG (missing triple (B, r4, A)) might lead to a wrong alignment prediction (i.e., both A and C are predicted to be aligned to C*). If A and A* were aligned, however, the missing triple (B, r4, A) in KG could be found by transferring the triple (B*, r4, A*) from KG*.
where MLP^k_att : R^{2×n} → R; and • denotes the vector concatenation operator. As the message vector m^k_{e′,r} contains the information of not only the neighbor entity e′ but also the neighbor relation r, our attention score α^k_{e,e′,r} can capture the importance of the message coming from entity e′ to entity e conditioned on the relation r connecting them. Notation extension Recall that we use two different relation-aware GNN encoders, as illustrated in Figure 2: one for the Completion and one for the Alignment component. To distinguish these two encoders, we use c and a to denote the embedding representations used for the Completion and Alignment components, respectively. In particular, a^k_e and a^k_r are now the corresponding embeddings of entity e and relation r at the k-th layer of the relation-aware GNN in the Alignment component. Furthermore, c^k_e and c^k_r are the corresponding embeddings of entity e and relation r at the k-th layer of the relation-aware GNN in the Completion component (computed as those of the Alignment component defined in equations 1 and 3).
(ii) w/ 1-GNN: both the Completion and Alignment components share the same relation-aware GNN encoder, which also implies "w/o SIR". (iii) w/o SIR: without using the structural inconsistency reduction mechanism described in Section 3.3, i.e., equations 10 and 11 are not used. (iv) w/o EnTr: without using the alignment seed enlargement and triple transferring mechanism described in Section 3.3. (v) w/o Align.: model variant containing only the Completion component without the Alignment one.
Alignment seed enlargement and triple transferring (EnTr) As mentioned in Section 2.2, existing alignment models require an alignment seed set L of pre-aligned entity pairs for training. An intuitive approach to improve alignment results is to iteratively increase the size of L during training. The number of alignments by which the size of L is increased should depend on some notion of certainty the Alignment component has in its predictions. For example, in the early stages of the alignment process, when the KGs are still sparsely connected, L should be smaller. The more confident the Alignment component is in its predictions, the larger L should be. Hence, we propose EnTr, a method to enlarge the alignment seed set. EnTr estimates an optimal number of new entity pairs to be added to L according to a measure of alignment certainty. At each training epoch, EnTr first computes an alignment matrix A, where each element is the cosine similarity between any two entity embeddings of the two KGs, that is, A(e, e*) = 1 − d_cos(a_e, a_e*). EnTr then computes the Shannon entropy of the softmax distribution over this alignment matrix:

P(e*|e) = exp(A(e, e*)) / Σ_{e⋆∈E*} exp(A(e, e⋆))    (14)

H(A) = − Σ_{e∈E} Σ_{e*∈E*} P(e*|e) log P(e*|e)    (15)
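Equations 14-15 can be sketched numerically as follows; the mapping from H(A) to the seed count q in Equation 16 is omitted here, as its exact form depends on β:

```python
import numpy as np

def alignment_entropy(A):
    """Shannon entropy of the row-wise softmax over the alignment matrix A
    (Equations 14-15): low entropy means confident alignment predictions."""
    P = np.exp(A - A.max(axis=1, keepdims=True))  # stabilized softmax per row
    P /= P.sum(axis=1, keepdims=True)
    return float(-(P * np.log(P)).sum())

# A sharp (confident) matrix has lower entropy than a flat (uncertain) one.
sharp = np.array([[5.0, 0.0], [0.0, 5.0]])
flat = np.zeros((2, 2))
assert alignment_entropy(sharp) < alignment_entropy(flat)
```

EnTr would grow the seed set more aggressively as this entropy shrinks over training.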

Table 2: MKGC results. All metrics are reported in %.

Table 4: MKGA results. Results for AlignKGC are taken from Singh et al. (2021). We report our results for other baselines, including MTransE (Chen et al., 2017), AliNet (Sun et al., 2020b), SS-AGA, PSR (Mao et al., 2021) and RDGCN (Wu et al., 2019), employing their publicly released implementations. See the Appendix for the training protocols of these baselines. SS-AGA is originally proposed and evaluated for MKGC only; however, it also computes an alignment matrix, so we can evaluate SS-AGA on the MKGA task using this matrix.

Table 5: Ablation study for the MKGA task. (v) w/o Comple.: model variant containing only the Alignment component without the Completion one.

Table 8: MKGA MRR results. Singh et al. (2021) do not report the MRR results for AlignKGC.

Table 10: Ablation Hits@10 results for the MKGA task.

Table 11: Ablation MRR results for the MKGA task.

Table 13: Comparison with results from Sun et al. (2020c). We report our results for AliNet and PSR using their publicly released implementations. Results for MTransE and RDGCN are taken from Sun et al. (2020c). Here, RDGCN is the best performing model among the 12 different models experimented with by Sun et al. (2020c).