Relation-aware Ensemble Learning for Knowledge Graph Embedding

Knowledge graph (KG) embedding is a fundamental task in natural language processing, and various methods have been proposed to explore semantic patterns in distinctive ways. In this paper, we propose to learn an ensemble by leveraging existing methods in a relation-aware manner. However, exploring these semantics with a relation-aware ensemble leads to a much larger search space than general ensemble methods. To address this issue, we propose a divide-search-combine algorithm, RelEns-DSC, that searches the relation-wise ensemble weights independently. This algorithm has the same computation cost as general ensemble methods but much better performance. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method in efficiently searching relation-aware ensemble weights and achieving state-of-the-art embedding performance. The code is publicly available at https://github.com/LARS-research/RelEns.


Introduction
Knowledge graph (KG) embedding is a popular method for inferring latent features and making predictions in incomplete KGs (Ji et al., 2021). This technique involves transforming entities and relations into low-dimensional vectors and using a scoring function (Bordes et al., 2013; Wang et al., 2017) to assess the plausibility of a triplet (consisting of a head entity, a relation, and a tail entity). Well-known scoring functions, such as TransE (Bordes et al., 2013), ComplEx (Trouillon et al., 2017), ConvE (Dettmers et al., 2018), and CompGCN (Vashishth et al., 2020), have demonstrated remarkable success in learning from KGs.
Ensemble learning is a well-known technique that improves the performance of machine learning tasks by combining and reweighting the predictions of multiple models (Breiman, 1996; Wolpert, 1992; Dietterich, 2000). Its effectiveness has also been verified in KG embedding by previous studies (Krompaß and Tresp, 2015; Wang et al., 2022b; Rivas-Barragan et al., 2022).
While designing different scoring functions to model various relation properties, such as symmetry, inversion, composition, and hierarchy (Ji et al., 2021; Sun et al., 2019; Li et al., 2022), is crucial for achieving good performance, existing ensemble methods do not reflect the relation-wise characteristics of different models. This motivates us to learn specific ensemble weights for different relations, a problem we name RelEns in this paper. By doing so, different KG embedding models can specialize in different relations, leading to improved performance. However, the number of parameters to be searched increases linearly with the number of relations, which can significantly complicate the ensemble construction, especially for KGs with many relations. To alleviate the difficulty of searching for relation-wise ensemble weights, we propose DSC, an algorithm that Divides the overall ensemble objective into multiple sub-problems, Searches for the ensemble weights of each relation independently, and then Combines the results. Compared with the overall objective, this significantly reduces the search space size and evaluation cost of each individual sub-problem.
In summary, we propose RelEns-DSC, a novel relation-aware ensemble learning method for KG embedding that searches different ensemble weights independently for different relations using a divide-and-conquer strategy. Empirically, RelEns-DSC significantly improves performance on three benchmark datasets (WN18RR, FB15k-237, NELL-995) and achieves first place on the large-scale leaderboards ogbl-biokg and ogbl-wikikg2. Our approach is more effective than general ensemble techniques and, with the divide-and-conquer strategy under parallel computing, also more efficient.

Proposed Method
Denote a KG as $\mathcal{G} = (\mathcal{V}, \mathcal{R}, \mathcal{D})$, where $\mathcal{V}$ contains $V$ entities (nodes), $\mathcal{R}$ contains $R$ types of relations between entities, and $\mathcal{D} = \{(h, r, t) : h, t \in \mathcal{V}, r \in \mathcal{R}\}$ contains the triplets (edges). $\mathcal{D}$ is split into three disjoint sets $\mathcal{D}_{tra}$, $\mathcal{D}_{val}$, and $\mathcal{D}_{tst}$ for training, validation, and testing, respectively.
The learning objective of a KG embedding model is to rank positive triplets higher than negative triplets, in order to accurately identify potential positive triplets missing from the current graph (Wang et al., 2017; Ji et al., 2021).
Specifically, formulated as a tail prediction problem, the KG embedding model aims to rank the tail entity $t$ of a given triplet $x = (h, r, t)$, which belongs to either $\mathcal{D}_{val}$ or $\mathcal{D}_{tst}$, higher than the set of negative entities $\mathcal{N}_t = \{e \in \mathcal{V} : (h, r, e) \notin \mathcal{D}\}$. (Head prediction is conducted in the same way with negative entities $\mathcal{N}_h = \{e \in \mathcal{V} : (e, r, t) \notin \mathcal{D}\}$; for simplicity, we only use tail prediction to introduce our method.) The model $F(x)$ computes a score vector $s$ over the entities $e \in \{t\} \cup \mathcal{N}_t$, which indicates the degree of plausibility that the triplet $(h, r, e)$ is true.
A ranking function $\Gamma(\cdot)$ is used to convert the scores $s$ into a ranking list $p = (p_1, \ldots, p_C)$ for the $C = 1 + |\mathcal{N}_t|$ entities, where a smaller rank value implies higher prediction priority. Following (Bordes et al., 2013; Trouillon et al., 2017; Sun et al., 2019; Vashishth et al., 2020), we adopt mean reciprocal ranking (MRR) as the evaluation metric; larger MRR indicates better performance.
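To make the metric concrete, below is a minimal sketch of rank-based MRR computation in Python (the helper names are ours, not from the released code, and filtered-ranking details are omitted):

```python
import numpy as np

def rank_from_scores(scores, true_idx):
    """Rank of the true entity among all C candidates (1 = best).

    `scores` is a length-C vector s produced by a model F(x);
    tie-breaking and filtered-ranking details are omitted for brevity.
    """
    order = np.argsort(-scores)  # descending: higher score = better
    return int(np.where(order == true_idx)[0][0]) + 1

def mrr(rank_list):
    """Mean reciprocal rank over a set of evaluation triplets."""
    return float(np.mean([1.0 / r for r in rank_list]))

# e.g., three triplets whose true tails were ranked 1st, 4th, and 2nd:
print(mrr([1, 4, 2]))  # (1 + 0.25 + 0.5) / 3 = 0.5833...
```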

Relation-wise Ensemble Problem
We observe that embedding models may exhibit varying strengths in modeling different types of relations (see Appendix A.2 for details). To account for this, we propose to learn distinct weights for each relation, based on the performance of the models on the validation set $\mathcal{D}_{val}$. Specifically, given $N$ trained KG embedding models $F_1, F_2, \ldots, F_N$ and the set of relations $\mathcal{R}$, we introduce a weight $\alpha_i^r \ge 0$ assigned to model $F_i$ for relation $r$, and let $M(p, x) = 1/p_t$ denote the reciprocal rank of a given data point $x = (h, r, t)$. Let $\mathcal{D}_{val}^r$ denote the subset of validation triplets whose relation is $r$. The objective of relation-wise ensemble can be written as
$$\max_{\{\alpha_i^r \ge 0\}}\; \frac{1}{|\mathcal{D}_{val}|} \sum\nolimits_{r \in \mathcal{R}} \sum\nolimits_{x_j^r \in \mathcal{D}_{val}^r} M\big(\Gamma(p_j^r), x_j^r\big), \qquad (1)$$
where, for each triplet $x_j^r$ with relation $r$, we apply the ensemble weights $\alpha_i^r$ to the ranking lists $\Gamma(F_i(x_j^r))$ generated by the $i$-th model to obtain the ensembled score $p_j^r = -\sum_{i=1}^N \alpha_i^r \Gamma(F_i(x_j^r))$; the minus sign turns ranks into scores, so that a higher value in $p_j^r$ indicates higher prediction priority. We ensemble ranks rather than raw scores because the scales of the scores vary significantly across models, making them harder to optimize directly; ranks share the same scale, so the searched weights better indicate the importance of the corresponding base models. In particular, if the weights assigned to each model $F_i$ are identical across all relations, i.e., $\alpha_i^r = \alpha_i$ for $i = 1, \ldots, N$, the objective in Eq. (1) (denoted as RelEns-Basic) reduces to the general ensemble method (denoted as SimpleEns). By optimizing the values of $\alpha_i^r$, the goal is to achieve higher MRR on the validation set $\mathcal{D}_{val} = \cup_{r \in \mathcal{R}} \mathcal{D}_{val}^r$.
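As a toy illustration of this rank re-weighting (the ranks and weights below are hypothetical numbers, not from the paper):

```python
import numpy as np

# Ranking lists Gamma(F_i(x_j^r)) from N = 2 models over C = 4 candidate
# entities (rank 1 = most plausible).
ranks = np.array([[1, 3, 2, 4],   # model F_1
                  [2, 1, 4, 3]])  # model F_2
alpha = np.array([0.7, 0.3])      # relation-specific weights alpha_i^r

# p_j^r = -sum_i alpha_i^r * Gamma(F_i(x_j^r)); negation turns ranks into
# scores so that a higher value means higher prediction priority.
p = -(alpha[:, None] * ranks).sum(axis=0)
print(p)                   # [-1.3 -2.4 -2.6 -3.7]
print(np.argsort(-p) + 1)  # candidate priority order (1-based): [1 2 3 4]
```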

Divide, Search, and Combine
Compared with SimpleEns, RelEns-Basic requires searching $NR$ parameters. As MRR is a non-differentiable metric, zero-order optimization techniques, such as random search and Bayesian optimization (Bergstra et al., 2011), are often used to solve Eq. (1). However, these algorithms usually involve sampling candidates in the search space, and the number of samples needed can grow exponentially with the search dimension due to the curse of dimensionality (Köppen, 2000). As a result, optimizing Eq. (1) directly can be challenging.
To address this issue, we propose Proposition 1, which separates the overall problem in Eq. (1) into $R$ independent sub-problems; in the sub-problem for relation $r$, there are only $N$ parameters $\{\alpha_i^r\}_{i=1,\ldots,N}$ to be searched.

Proposition 1 (separable optimization problem). The optimal values of $\{\alpha_i^r\}_{i=1,\ldots,N}^{r=1,\ldots,R}$ searched on $\mathcal{D}_{val}$ in Eq. (1) equal the values of $\{\alpha_i^r\}_{i=1,\ldots,N}$ that are independently optimized on $\mathcal{D}_{val}^r$ for each $r \in \mathcal{R}$ via the following problem:
$$\max_{\{\alpha_i^r \ge 0\}_{i=1,\ldots,N}}\; \frac{1}{|\mathcal{D}_{val}^r|} \sum\nolimits_{x_j^r \in \mathcal{D}_{val}^r} M\big(\Gamma(p_j^r), x_j^r\big). \qquad (2)$$

The complete divide-search-and-combine procedure is outlined in Algorithm 1. By separately searching the divided problems, we determine the optimal values of $\{\alpha_i^r\}_{i=1,\ldots,N}$ for each $r$ on the validation data $\mathcal{D}_{val}^r$. Finally, we combine the searched values of $\{\alpha_i^r\}_{i=1,\ldots,N}^{r=1,\ldots,R}$ and compute the scores $p_j^r = -\sum_{i=1}^N \alpha_i^r \Gamma(F_i(x_j^r))$ for $x_j^r \in \mathcal{D}_{tst}$ to evaluate the performance.
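Algorithm 1 itself is not reproduced here, but the following minimal sketch captures the divide-search-combine flow under our reading; `search_weights` stands for any solver of the per-relation sub-problem in Eq. (2) (e.g., the TPE sketch in the Experiments section), and all names are illustrative rather than the released implementation:

```python
import numpy as np
from collections import defaultdict

def divide_by_relation(triplets):
    """Divide: group triplet indices by their relation r."""
    groups = defaultdict(list)
    for j, (h, r, t) in enumerate(triplets):
        groups[r].append(j)
    return groups

def rel_ens_dsc(val_triplets, val_rank_lists, true_idx, search_weights):
    """Search each sub-problem of Eq. (2) independently, then combine.

    val_rank_lists: (J, N, C) NumPy array holding the ranking lists
    Gamma(F_i(x_j)) of all N models for each validation triplet;
    true_idx[j] indexes the true tail of the j-th triplet.
    """
    alphas = {}
    for r, idx in divide_by_relation(val_triplets).items():
        # Search: optimize the N weights {alpha_i^r} on D_val^r only;
        # by Proposition 1 this does not affect the other relations.
        alphas[r] = search_weights(val_rank_lists[idx], true_idx[idx])
    return alphas  # Combine: the full weight table {alpha_i^r}

def ensemble_scores(alpha_r, rank_lists):
    """p_j^r = -sum_i alpha_i^r * Gamma(F_i(x_j^r)) for (test) triplets."""
    a = np.asarray(alpha_r)
    return -(a[None, :, None] * np.asarray(rank_lists)).sum(axis=1)
```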

Complexity Analysis
Assuming that the evaluation cost of $\Gamma(\cdot)$ and $M(\Gamma(\cdot), x)$ on a single data sample $x$ is constant, the time complexity of ensemble learning is determined by two factors: (i) the number of data samples to be evaluated; and (ii) the number of ensemble parameters to be sampled. For SimpleEns, the complexity is $O(|\mathcal{D}_{val}| \cdot e^N)$. In contrast, RelEns-Basic in Eq. (1) requires $O(|\mathcal{D}_{val}| \cdot e^{RN})$, since the sampling complexity increases exponentially with the search dimension. In comparison, the complexity of RelEns-DSC in Algorithm 1 is $O(|\mathcal{D}_{val}| \cdot e^N)$, on par with SimpleEns, since each of the $R$ sub-problems samples in only an $N$-dimensional space and is evaluated only on its own subset $\mathcal{D}_{val}^r$.
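For a concrete sense of scale (with assumed numbers): ensembling, say, $N = 4$ base models on WN18RR ($R = 11$ relations) would require RelEns-Basic to sample in a $44$-dimensional space, whereas RelEns-DSC solves eleven independent $4$-dimensional sub-problems, each evaluated only on its own subset $\mathcal{D}_{val}^r$.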

Experiments
The experiments were implemented in Python and run on an NVIDIA RTX 3090 GPU with 24GB memory.
As the ranking function $\Gamma(\cdot)$ and MRR are non-differentiable, we choose the widely used Bayesian optimization technique, Tree-structured Parzen Estimator (TPE) (Bergstra et al., 2015), to solve the maximization problems in Eq. (1) and Eq. (2); details are provided in Appendix B.3.
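As a sketch of how one per-relation sub-problem in Eq. (2) can be posed to a TPE solver (here the open-source hyperopt package; the search-space bounds, `max_evals`, and helper names are our assumptions, not the paper's exact configuration):

```python
import numpy as np
from hyperopt import fmin, tpe, hp

def make_objective(rank_lists, true_idx):
    """rank_lists: (J, N, C) array, where rank_lists[j, i, e] is the rank
    model F_i assigns entity e for the j-th triplet in D_val^r;
    true_idx[j] indexes the true tail entity."""
    def objective(alpha):
        a = np.array([alpha[f"a{i}"] for i in range(rank_lists.shape[1])])
        p = -(a[None, :, None] * rank_lists).sum(axis=1)  # (J, C) scores
        true_p = p[np.arange(len(p)), true_idx]
        # ensembled rank of the true entity: 1 + #candidates scored higher
        new_rank = 1 + (p > true_p[:, None]).sum(axis=1)
        return -np.mean(1.0 / new_rank)  # TPE minimizes, so negate MRR
    return objective

N = 3  # number of base models (assumed)
space = {f"a{i}": hp.uniform(f"a{i}", 0.0, 1.0) for i in range(N)}
# best = fmin(make_objective(rank_lists, true_idx), space,
#             algo=tpe.suggest, max_evals=100)
```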
Hyperparameters. To compare on the general benchmarks, we use the fine-tuned hyperparameters reported by KGTuner (Zhang et al., 2022b). For the top three methods on the OGB leaderboards, we use their reported hyperparameters. Details of these settings are in Appendix B.2.

Performance Comparison
Table 1 and Table 2 present the testing performance comparison. SimpleEns is the variant introduced in Section 2.1. We observe that SimpleEns consistently outperforms the base models by weighting different models according to their learning ability. The proposed RelEns-DSC surpasses SimpleEns by a large margin, verifying the effectiveness of relation-specific ensemble weights for KG embedding.
The top models on ogbl-biokg are more diverse than those on ogbl-wikikg2. On ogbl-biokg, AutoBLM and ComplEx are bilinear models, while TripleRE is a translational model, and the training frameworks of the three models also differ. In comparison, the top three methods on ogbl-wikikg2 are all translational models with similar approaches of sharing entity embeddings. As a result, the relation-wise performance of the top three models varies more on ogbl-biokg than on ogbl-wikikg2 (std 0.0452 vs. 0.0261), which explains why the relation-wise ensemble brings larger gains on ogbl-biokg than on ogbl-wikikg2. Furthermore, Figure 1 illustrates the ensemble weights of SimpleEns and RelEns-DSC on the WN18RR dataset, showing that RelEns-DSC learns relation-specific ensemble weights, which contributes to its superior performance.

Efficiency Comparison
We compare the learning curves (highest MRR searched so far vs. running time) of SimpleEns, RelEns-Basic, and RelEns-DSC on NELL-995 in Figure 2 (curves for the other datasets are in Appendix C.1). The ensemble weights of all three methods are initialized to $1/N$. We denote the number of parameter samplings as $Q$. At the beginning, RelEns-DSC only searches the weights of a few relations, while the others remain unchanged; over time, its performance improves significantly as more and more relations find their optimal weights. Increasing $Q$ from 50 to 100 does not improve the performance of SimpleEns or RelEns-Basic. However, RelEns-DSC achieves better overall performance, since increasing $Q$ allows each sub-problem to be solved more thoroughly with more iterations. In addition, RelEns-DSC can benefit from parallel computing at the relation level, further improving efficiency.
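Since the sub-problems of Eq. (2) are independent, they can be dispatched concurrently. A minimal sketch of relation-level parallelism (assuming the `search_weights` and grouping helpers sketched earlier; with process-based pools, `search_weights` must be a picklable top-level function):

```python
from concurrent.futures import ProcessPoolExecutor

def rel_ens_dsc_parallel(groups, val_rank_lists, true_idx,
                         search_weights, workers=8):
    """Solve the R independent sub-problems of Eq. (2) in parallel;
    `groups` maps each relation r to the indices of its validation triplets."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = {r: pool.submit(search_weights,
                                  val_rank_lists[idx], true_idx[idx])
                   for r, idx in groups.items()}
        return {r: f.result() for r, f in futures.items()}
```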
Ablation Study

Table 3 shows the performance comparison of multiple variants of RelEns-DSC on the three benchmark datasets. Due to the space limit, results on Hit@{1,3} and the implementation details of the variants are provided in Appendix B.2. The stacking method (Stacking), the arithmetic mean method (Mean), and the MRR-based weighted mean method (MRR-Mean) all perform worse than RelEns-DSC, which indicates the importance of searching for the ensemble weights with the TPE technique. Stacking performs the worst, since the non-differentiable metric MRR cannot be directly optimized. In particular, by considering relation-specific ensemble weights, RelEns-DSC achieves better performance than the general ensemble methods.

Conclusion
This paper introduces a novel ensemble method, Relation-aware Ensemble with Divide-Search-Combine (RelEns-DSC), for KG embedding. RelEns-DSC learns relation-specific ensemble weights for different models and efficiently searches these weights with a divide-and-conquer strategy. Empirical results demonstrate that the proposed method outperforms existing ensemble methods for KG embedding in both effectiveness and efficiency.
Limitations. The proposed method mainly addresses the ensemble problem for entity prediction in knowledge graph completion; it does not address other graph learning tasks, such as entity/node classification, relation prediction, and graph classification. In addition, RelEns-DSC is most significant for multi-relational graphs, such as knowledge graphs and heterogeneous graphs, and is thus not well suited to homogeneous graphs with a single edge type.

A.1 Relation-wise Ensemble
An overview of the relation-wise ensemble problem is provided in Figure 3. First, the dataset $\mathcal{D}$ is split into multiple subsets $\mathcal{D}^1, \mathcal{D}^2, \ldots, \mathcal{D}^R$ according to the relations. For each sample $x_j^r$ from $\mathcal{D}^r$, the models $F_1, F_2, \ldots, F_N$ output scores, and the ranking function $\Gamma(\cdot)$ produces ranking lists over the $C$ entities according to these scores. The relation-wise ensemble weights $\alpha_1^r, \alpha_2^r, \ldots, \alpha_N^r$ re-weight the ranking lists into the new scores $p_j^r$ of $x_j^r$, which are then re-ranked to evaluate the performance.
A.2 Relation-wise Performance on WN18RR

Figure 4 shows the MRR of selected base models for specific relations on the WN18RR dataset. We observe the following:

• member_meronym: translational models such as TransE and RotatE exhibit the highest performance.
• synset_domain_topic_of: bilinear models like ComplEx achieve the best results.
• has_part: while traditional scoring functions perform well on this relation, neural network-based models such as ConvE and CompGCN exhibit suboptimal performance.
• verb_group: in contrast to has_part, neural network models such as ConvE and CompGCN perform better, whereas traditional scoring functions show inferior performance.
These results demonstrate that KG embedding models may specialize in different relations, leading to significant variation in their performance across relations.

B Supplementary Materials for the Experimental Settings

B.1 Statistics of Datasets
We use the following datasets for evaluation: (i) WN18RR is a link prediction dataset which is a subset of WordNet (Dettmers et al., 2018); (ii) FB15k-237 contains triplets of knowledge base relationships and textual mentions of Freebase entity pairs (Toutanova and Chen, 2015); (iii) NELL-995 is a dataset built from the web via an intelligent agent called Never-Ending Language Learner that reads the web over time (Xiong et al., 2017); (iv) ogbl-biokg is a KG created using data from a large number of biomedical data repositories (Hu et al., 2020); and (v) ogbl-wikikg2 is a KG extracted from the Wikidata knowledge base (Hu et al., 2020). Statistics of these datasets are provided in Table 5.

B.2 Hyperparameter Setting
We list the hyperparameters of the base models from KGTuner (Zhang et al., 2022b) on the WN18RR, FB15k-237, and NELL-995 datasets in Table 6 and Table 7. For CompGCN (Vashishth et al., 2020), we use 200-dimensional node and relation embeddings and apply the standard binary cross entropy loss with label smoothing. The number of GCN layers is 2, the score function used in CompGCN is ConvE, the learning rate is 0.001, the batch size is 128, and the dropout rate is 0.1.
For HousE (Li et al., 2022), we use the default hyperparameters specified in the original paper: both node and relation embeddings are 800-dimensional, the learning rate is 0.0005, and the batch size is 1000.
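For reference, the settings above collected as a configuration sketch (the key names are illustrative, not the schema of the released configs):

```python
# CompGCN settings as listed above (key names are our own).
compgcn_config = {
    "embed_dim": 200,   # node and relation embeddings
    "num_gcn_layers": 2,
    "score_function": "ConvE",
    "loss": "bce_with_label_smoothing",
    "learning_rate": 1e-3,
    "batch_size": 128,
    "dropout": 0.1,
}

# HousE defaults from the original paper.
house_config = {
    "embed_dim": 800,   # node and relation embeddings
    "learning_rate": 5e-4,
    "batch_size": 1000,
}
```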
For the top three methods on the OGB leaderboard, since their code has been officially released by OGB, we use it directly with the corresponding reported hyperparameters.

Figure 4: MRR of selected base models for specific relations on the WN18RR dataset.

Table 5: Statistics of the datasets.