Aydar Bulatov
2024
Prompt Me One More Time: A Two-Step Knowledge Extraction Pipeline with Ontology-Based Verification
Alla Chepurova
|
Yuri Kuratov
|
Aydar Bulatov
|
Mikhail Burtsev
Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing
This study explores a method for extending real-world knowledge graphs (specifically, Wikidata) by extracting triplets from texts with the aid of Large Language Models (LLMs). We propose a two-step pipeline that includes the initial extraction of entity candidates, followed by their refinement and linkage to the canonical entities and relations of the knowledge graph. Finally, we utilize Wikidata relation constraints to select only verified triplets. We compare our approach to a model that was fine-tuned on a machine-generated dataset and demonstrate that it performs better on natural data. Our results suggest that LLM-based triplet extraction from texts, with subsequent verification, is a viable method for real-world applications.
2023
Better Together: Enhancing Generative Knowledge Graph Completion with Language Models and Neighborhood Information
Alla Chepurova
|
Aydar Bulatov
|
Yuri Kuratov
|
Mikhail Burtsev
Findings of the Association for Computational Linguistics: EMNLP 2023
Real-world Knowledge Graphs (KGs) often suffer from incompleteness, which limits their potential performance. Knowledge Graph Completion (KGC) techniques aim to address this issue. However, traditional KGC methods are computationally intensive and impractical for large-scale KGs, necessitating the learning of dense node embeddings and computing pairwise distances. Generative transformer-based language models (e.g., T5 and recent KGT5) offer a promising solution as they can predict the tail nodes directly. In this study, we propose to include node neighborhoods as additional information to improve KGC methods based on language models. We examine the effects of this imputation and show that, on both inductive and transductive Wikidata subsets, our method outperforms KGT5 and conventional KGC approaches. We also provide an extensive analysis of the impact of neighborhood on model prediction and show its importance. Furthermore, we point the way to significantly improve KGC through more effective neighborhood selection.