LambdaKG: A Library for Pre-trained Language Model-Based Knowledge Graph Embeddings

Knowledge Graphs (KGs) often have two characteristics: heterogeneous graph structure and text-rich entity/relation information. Text-based KG embeddings can represent entities by encoding descriptions with pre-trained language models (PLMs), but at present no open-sourced library is specifically designed for KGs with PLMs. In this paper, we present LambdaKG, a library for KGE equipped with many pre-trained language models (e.g., BERT, BART, T5, GPT-3) and supporting various tasks (e.g., knowledge graph completion, question answering, recommendation, and knowledge probing). LambdaKG is publicly open-sourced at https://github.com/zjunlp/PromptKG/tree/main/lambdaKG, with a demo video at http://deepke.zjukg.cn/lambdakg.mp4 and long-term maintenance.


Introduction
Knowledge Graphs (KGs) encode real-world facts as structured data and have drawn significant attention from academia and industry (Zhang et al., 2022b). Knowledge Graph Embedding (KGE) aims to project relations and entities into a continuous vector space, which can enhance knowledge reasoning abilities and be feasibly applied to downstream tasks such as question answering (Saxena et al., 2022), recommendation (Zhang et al., 2021), and so on (Chen et al., 2022b). Previous embedding-based KGE methods, such as TransE (Bordes et al., 2013), embed relational knowledge into a vector space and then optimize the target objective by applying a pre-defined scoring function to those vectors. A few remarkable embedding-based KGE toolkits have been developed, such as OpenKE (Han et al., 2018), LibKGE (Broscheit et al., 2020), PyKEEN (Ali et al., 2021), CogKGE (Jin et al., 2022) and NeuralKG (Zhang et al., 2022c). Nevertheless, these embedding-based KGE approaches are limited in expressiveness due to their shallow network architectures and their lack of side information (e.g., textual descriptions).
Compared with embedding-based KGE approaches, text-based methods incorporate available texts for KGE. With the development of Pre-trained Language Models (PLMs), many text-based models (Xie et al., 2022; Saxena et al., 2022; Kim et al., 2020; Markowitz et al., 2022; Chen et al., 2022a; Liu et al., 2022) have been proposed, which obtain promising performance and have the advantage of allocating a fixed memory footprint for large-scale real-world KGs. Recently, large language models (LLMs) (e.g., GPT-3 (Brown et al., 2020), ChatGPT (OpenAI, 2022)) have further demonstrated the ability to perform a variety of natural language processing (NLP) tasks without adaptation, providing potential opportunities for better knowledge representations. However, there is currently no comprehensive open-sourced library particularly designed for KGE with PLMs, which makes it challenging to test new methods and make rigorous comparisons with previous approaches.

System Architecture
The overall features and architecture of LambdaKG are presented in Figure 1. We detail two major types of PLM-based KGE methods (discrimination-based and generation-based) with various PLMs.
Our design principles are: 1) Core Module with Unified KG Encoder: LambdaKG utilizes a unified encoder to pack graph structure and text semantics, with convenient Trainer&Evaluator, Metric, and Bag of Tricks; 2) Model Hub: LambdaKG is integrated with many cutting-edge PLM-based KGE models; 3) Flexible Downstream Tasks: LambdaKG disentangles KG representation learning and downstream tasks.

Trainer&Evaluator
Typically, the training process with LambdaKG can be decomposed into several distinct steps, such as the forward and backward passes (i.e., training_step), logging of intermediate results (log), and model evaluation (evaluate_step).
Our Trainer class provides a flexible and modular framework for training different types of models, with customizable functions to handle various tasks, such as computing the loss function and updating model parameters. Moreover, the Trainer class allows users to define their own plugins, which can be integrated seamlessly into the training pipeline to provide additional functionalities.
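As a rough illustration of this decomposition, consider the following minimal sketch. The hook names (training_step, log, evaluate_step) follow the text above, but everything else (the class internals, the plugin signature) is illustrative and not LambdaKG's actual implementation.

```python
# Illustrative sketch of a Trainer with pluggable hooks; not LambdaKG's real code.
class Trainer:
    def __init__(self, model, plugins=None):
        self.model = model
        self.plugins = plugins or []   # user-defined plugins hook into the loop
        self.history = []

    def training_step(self, batch):
        # forward and backward pass; here "loss" is just the model's output
        return self.model(batch)

    def log(self, name, value):
        # record intermediate results
        self.history.append((name, value))

    def evaluate_step(self, batch):
        # model evaluation on one batch
        return self.model(batch)

    def fit(self, batches):
        for step, batch in enumerate(batches):
            loss = self.training_step(batch)
            for plugin in self.plugins:
                plugin(step, loss)     # plugins see intermediate results
            self.log("loss", loss)
```

A plugin here is simply any callable invoked at each step, which is one straightforward way to realize the "seamlessly integrated" extension point the text describes.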

Metric
We design the Metric class to evaluate different models for various tasks. Specifically, we use hits@k with k values of 1, 3, 10 and mean rank (MR) as the evaluation metrics. Hits@k measures the proportion of correct predictions among the top-k ranked results, while MR calculates the average rank of the correct answer. We also implement the BLEU-1 score to evaluate commonsense KG completion tasks, following Hwang et al. (2021).
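These metrics can be computed directly from the 1-based rank of each correct answer; the sketch below shows the standard definitions (with a simplified single-reference BLEU-1), not the Metric class's actual code.

```python
import math
from collections import Counter

def hits_at_k(ranks, k):
    """Fraction of queries whose correct answer is ranked in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mean_rank(ranks):
    """Average 1-based rank of the correct answer (MR)."""
    return sum(ranks) / len(ranks)

def bleu_1(candidate, reference):
    """Simplified single-reference BLEU-1: clipped unigram precision
    with a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand)
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision
```

For example, with ranks [1, 2, 15], hits@1 is 1/3, hits@3 and hits@10 are 2/3, and MR is 6.0.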

Bag of Tricks
All models in LambdaKG are based on PLMs, and we equip it with a bag of training tricks to improve their performance. In particular, we employ pluggable modules such as label smoothing and exponential moving average to assist model training. We implement early-stopping and fast-run modules, which prevent overfitting on small data through automatic verification mechanisms. Furthermore, we integrate an off-the-shelf top-k negative sampling strategy that enhances training by selecting the most informative negative samples.
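To make one of these pluggable tricks concrete, here is the standard exponential moving average (EMA) update over parameter values. The update rule is the textbook one; the class itself is an illustrative sketch, not LambdaKG's module.

```python
# Illustrative EMA over parameters stored as a {name: float} dict.
class EMA:
    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = dict(params)  # copy of the initial parameter values

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current
        for name, value in params.items():
            self.shadow[name] = (self.decay * self.shadow[name]
                                 + (1 - self.decay) * value)
```

At evaluation time the smoothed shadow values are typically used in place of the raw parameters, which tends to stabilize the final model.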

Unified KG Encoder
Since LambdaKG is based on PLMs, the most critical step is to convert structural triples into plain natural language that PLMs can understand. We introduce a unified KG encoder to represent graph structure and text semantics, supporting different types of PLM-based KGE methods. To encode the graph structure, we sample 1-hop neighbor entities and concatenate their tokens as input to provide implicit structure information. With such a unified KG encoder, LambdaKG can encode both heterogeneous graph structure and text-rich semantic information. For the discrimination-based method, the input is built on the plain text description:

x = [CLS] X_h [SEP] X_r [SEP] X_t [SEP], (1)

where X_h, X_r, and X_t refer to the text sequences of the head entity, relation, and tail entity, respectively. Referring to prompt learning methods like kNN-KGE (Zhang et al., 2022a), we represent entities and relations in the KG with special tokens (see §2.3) and obtain the input as:

x = [CLS] [Entity_h] [SEP] X_r [SEP] [MASK] [SEP], (2)

where [Entity_h] represents the special token of the head entity.
For the generation-based model, we leverage the tokens in X_h and X_r to optimize the model with the label X_t. When predicting the head entity, we add a special token [reverse] to the input sequence for reverse reasoning.
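The two linearizations above can be sketched as simple string builders. The token layouts follow the general patterns described in this section; the exact special-token placement used by LambdaKG may differ, so treat this as an illustration only.

```python
# Illustrative triple-to-text linearization for PLM inputs; the exact
# token layout in LambdaKG may differ from this sketch.

def disc_input(x_h, x_r, x_t):
    """Discrimination-based input: head, relation, and tail descriptions
    concatenated with separator tokens."""
    return f"[CLS] {x_h} [SEP] {x_r} [SEP] {x_t} [SEP]"

def gen_input(x_h, x_r, reverse=False):
    """Generation-based input: the model reads <X_h, X_r> and generates X_t;
    a [reverse] marker flags head-entity prediction."""
    prefix = "[reverse] " if reverse else ""
    return f"{prefix}{x_h} {x_r}"
```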

Model Hub
As shown in Figure 2 and Table 1, LambdaKG consists of a Model Hub which supports many representative PLM-based KGE methods, mainly following the two major paradigms of discrimination-based and generation-based methods.
Discrimination-based methods There are three kinds of models based on the discrimination method: the first (e.g., KG-BERT (Yao et al., 2019), PKGC (Lv et al., 2022)) utilizes a single encoder to encode KG triples with text descriptions; the second (e.g., StAR (Wang et al., 2021), SimKGC (Wang et al., 2022)) leverages siamese encoders (two-tower models) with PLMs to encode entities and relations separately. For the first kind, the score of each triple is expressed as:

Score(h, r, t) = TransformerEnc([CLS] X_h [SEP] X_r [SEP] X_t [SEP]), (3)

where TransformerEnc is the BERT model followed by a binary classifier. However, these models have to iterate over all entities and calculate scores to decide the correct one, which is computation-intensive, as shown in Table 1. In contrast, two-tower models like StAR (Wang et al., 2021) and SimKGC (Wang et al., 2022) usually encode ⟨h, r⟩ and t separately to obtain their embeddings, and then use a score function to predict the correct tail entity from the candidates, denoted by:

Score(⟨h, r⟩, t) = cos(e_⟨h,r⟩, e_t). (4)

The final kind of model, e.g., kNN-KGE (Zhang et al., 2022a), utilizes masked language modeling for KGE, which shares the same architecture as normal discriminative PLMs. Note that there are two modules in normal PLMs: a word embedding layer that embeds token ids into a semantic space, and an encoder that generates context-aware token embeddings. Here, we take the masked language model and treat entities and relations as special tokens in the word embedding layer. As shown in Figure 2, the model predicts the correct tail entity from the sequence of the head entity and relation tokens and their descriptions. For the entity/relation embedding, we freeze the encoder layer and tune only the entity embedding layer, optimizing the loss function:

L = - Σ_{e_j ∈ E} I(e_j = e_i) log p([MASK] = e_j | X_i; Θ), (5)

where Θ represents the parameters of the model, E is the entity set, and X_i and e_i are the description and the embedding of entity i, respectively.
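The two-tower scoring of Eq. (4) amounts to ranking candidate tail entities by cosine similarity with the ⟨h, r⟩ embedding. The pure-Python sketch below assumes embeddings are plain lists of floats; a real implementation would of course use batched tensor operations.

```python
import math

def cos(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_tails(e_hr, candidates):
    """Rank candidate tail entities best-first by Score(<h,r>, t) = cos(e_hr, e_t).
    candidates: {entity_name: embedding}."""
    return sorted(candidates, key=lambda t: cos(e_hr, candidates[t]), reverse=True)
```

Because each entity embedding is computed once and reused across queries, this scoring avoids re-encoding every ⟨h, r, t⟩ triple, which is exactly the efficiency advantage over single-encoder models noted above.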
Generation-based methods Generation-based models formulate KG completion and other KG-intensive tasks as sequence-to-sequence generation. Given a triple with the tail entity missing (h, r, ?), models are fed ⟨X_h, X_r⟩ and then output X_t. During training, generative models maximize the conditional probability:

p(X_t | X_h, X_r) = ∏_i p(x_i | x_<i, X_h, X_r), (6)

where x_i denotes the i-th token of X_t. To guarantee consistency between the decoded sequences and the schemas and tokens in the KG, GenKGC (Xie et al., 2022) proposes an entity-aware hierarchical decoder to constrain X_t. Besides, KGT5 (Saxena et al., 2022) proposes to pre-train generation-based PLMs with text descriptions for KG representation.
LLMs We further apply LLMs, namely GPT-3 and ChatGPT, to assess their effectiveness in KGE (KGC with link prediction). Generative LLMs allow the KGC task to be framed as input sentences containing head entities and relations, making it easier for the model to generate sentences with tail entities. A well-designed prompt can improve the performance of LLMs, and prior studies indicate that incorporating in-context learning can improve accuracy and ensure consistent output. Thus, we adopt a similar approach in which the prompt comprises three components: a task description with candidates, demonstrations, and test information. As shown in Figure 3, we employ information retrieval (BM25) to select the top-100 most relevant entities from the training set as candidates. Likewise, the prompt's demonstrations use the top-5 most similar instances, which help the model comprehend the task more effectively. Furthermore, taking inspiration from the Chain-of-Thought (CoT) method in reasoning tasks, we utilize natural language rationales to improve the model's capacity to reason and explain predictions, ultimately improving its overall performance on KGC tasks. Comparatively, the prompt used for ChatGPT uses only a few demonstrations and test data with these strategies.
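Assembling the three prompt components can be sketched as a simple template function. The template wording below is illustrative only and not the exact prompt used in the experiments; the candidates and demonstrations are assumed to come from BM25/top-k similarity retrieval as described above.

```python
# Illustrative three-part KGC prompt builder (task description with
# candidates, demonstrations, test information); not the exact prompt text.

def build_kgc_prompt(candidates, demonstrations, head, relation):
    parts = [
        "Predict the missing tail entity from the candidates below.",
        "Candidates: " + ", ".join(candidates),
    ]
    # demonstrations: retrieved (head, relation, tail) training triples
    for h, r, t in demonstrations:
        parts.append(f"Q: ({h}, {r}, ?)  A: {t}")
    # test information: the query triple with the tail missing
    parts.append(f"Q: ({head}, {relation}, ?)  A:")
    return "\n".join(parts)
```

The resulting string would then be sent to the LLM, which is expected to complete the final "A:" line with a candidate entity.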

Pluggable KGE for Downstream Tasks
We introduce the technical details of applying KGE to downstream tasks, as shown in Figure 2. For knowledge graph completion, we feed the model the textual information ⟨X_h, X_r⟩ of the head entity and the relation, then obtain the target tail entity via mask token prediction. For question answering, we feed the model the question written in natural language concatenated with a [MASK] token to obtain the special token of the target answer (entity). For recommendation, we take the user's interaction history as sequential input (Sun et al., 2019) with entity embeddings and then leverage mask token prediction to obtain recommended items. For the knowledge probing task, we adopt entity embeddings as additional knowledge, following PELT (Ye et al., 2022).

System Usage
The proposed system can be used in three scenarios. First, users can utilize LambdaKG to obtain PLM-based KGE for knowledge discovery. LitModel serves as the training class for the link prediction task and fits all models in the Model Hub. Users can choose suitable models in ModelModule and specific datasets in DataModule to train models and obtain the embeddings in the KGs. Moreover, users can apply LambdaKG's PLM-based KGE to downstream tasks. We provide various prompts to obtain the knowledge (entity) embeddings in KGs for downstream tasks. For different tasks, we design different base classes so that users can efficiently implement their own tasks. Finally, we provide an online interactive demo for PLM-based KGE at https://zjunlp.github.io/project/promptkg/demo.html.

Knowledge Graph Completion
For the KG completion task with small PLMs, we conduct link prediction experiments on two datasets, WN18RR (Dettmers et al., 2018) and FB15k-237 (Toutanova et al., 2015). From Table 2, we observe that the discrimination-based method SimKGC (Wang et al., 2022) (previous state-of-the-art) achieves higher performance than other baselines. Generation-based models like KGT5 (Saxena et al., 2022) and GenKGC (Xie et al., 2022) also yield comparable results and show potential for KG representation.
Small vs. Large LMs We adopt GPT-3/3.5 (text-davinci-001/003 and ChatGPT) for evaluation through the interfaces provided by OpenAI. The evaluation of ChatGPT is conducted on 224 instances, covering each relation in the test set. As shown in Figure 4(a), ChatGPT demonstrates better performance, while text-davinci-003 exhibits a slight gap. This experiment reaffirms the capability of LLMs to capture semantic similarities and regularities among entities, thereby allowing precise predictions of missing links in knowledge graphs.
We also conducted a detailed analysis of cases where one head entity and relation pair corresponds to one or multiple tail entities (1-1 and 1-n cases). Notably, the model performs significantly better in the 1-1 case than in the 1-n case, as illustrated in Figure 5. Two potential reasons explain this disparity: (1) In the 1-1 case, the model is less prone to deviations in language understanding; additionally, ChatGPT is trained on a larger corpus, enabling it to generate accurate responses through analysis and reasoning. (2) The presence of multiple correspondences challenges the model's capacity to generate informative and contextually relevant responses; moreover, current evaluation metrics fail to fully capture the intricacy of the responses needed to properly handle such questions.
We further conduct experiments on commonsense KG completion with ATOMIC2020 (Hwang et al., 2021). As suggested in that paper, we sample 5,000 test queries to evaluate the models (excluding ChatGPT). COMET (BART) is fine-tuned through supervised learning and uses greedy decoding to generate answers. For GPT-3 and ChatGPT, we provide each relation with 5 examples of heads and tails to construct prompts and evaluate them without any fine-tuning. The results in Figure 4(b) show the BLEU-1 scores on the sampled 5,000 queries, while for ChatGPT we sample 115 queries (5 for each relation) from the test set. The results indicate that GPT-3 exhibits limited performance in this evaluation. After analyzing several cases, we sample 115 queries (5 for each relation) as a benchmark and apply manual scoring to evaluate the models. Figure 4(c) depicts the accuracy scores of each model. Our study reveals that ChatGPT is capable of generating reasonable outputs, but they differ considerably from the ground truth, which accounts for the final results.

Question Answering
KGs are known to be helpful for the task of question answering. We apply LambdaKG to question answering and conduct experiments on the MetaQA dataset. Due to computational resource limits, we only evaluate the 1-hop inference performance. From Table 2, KGT5 in LambdaKG yields the best performance.

Recommendation
For the recommendation task, we conduct experiments on the well-established ML-20m dataset. The linkage between ML-20m and Freebase offered by KB4Rec (Zhao et al., 2019) is utilized to obtain textual descriptions of movies in ML-20m. With movie embeddings pre-trained on these descriptions, we conduct experiments on sequential recommendation following the settings of BERT4Rec (Sun et al., 2019). We observe that LambdaKG is effective for recommendation compared with BERT4Rec.

Knowledge Probing
Knowledge probing (Petroni et al., 2019) examines the ability of LMs (BERT, RoBERTa, etc.) to recall facts from their parameters. We conduct experiments on LAMA using pre-trained BERT (bert-base-uncased) and RoBERTa (roberta-base) models. To demonstrate that entity embeddings enhanced by KGs help LMs recall more factual knowledge from PLMs, we train a pluggable entity embedding module following PELT (Ye et al., 2022). As shown in Table 2, performance improves when we use the entity embedding module.

Conclusion and Future Work
We propose LambdaKG, a library that establishes a unified toolkit with well-defined modules and easy-to-use interfaces to support research on applying PLMs to KGs. In the future, we will continue to integrate more models and tasks (e.g., dialogue) into the proposed library to facilitate research progress on KGs.

Figure 1: The architecture and features of LambdaKG.

Figure 2: PLM-based KGEs in LambdaKG; these KGEs can be applied to KGC, QA, recommendation, and knowledge probing. Entity_t refers to the target tail entity, answer entity, recommended items, and target tail entity for the different tasks, which follow the pre-train (obtain the embedding) and fine-tune (task-specific tuning) paradigm.

Figure 3: LLM-based KGC. The prompt comprises three components, namely the task description with candidates, demonstrations, and test information.

Table 1: Comparison of different methods based on small PLMs. |L| is the length of the triple description. |L/2| can be seen as the length of entity tokens. |E| and |R| are the numbers of unique entities and relations in the graph, respectively.

Table 2: Hits@1 and MRR (%) results on KGC, question answering, recommendation and knowledge probing tasks. 3 refers to results taken from the original papers.