Mikhail Burtsev

2026

Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models
Alla Chepurova | Aydar Bulatov | Mikhail Burtsev | Yuri Kuratov
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, leaving their intrinsic quality underexplored.In this work, we propose Wikontic, a multi-stage pipeline that constructs KGs from open-domain texts by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication.The resulting KGs are compact, ontology-consistent, and well-connected; on MuSiQue, the correct answer entity appears in 96% of generated triplets.On HotpotQA, our triplets-only setup achieves 76.0 F1, and on MuSiQue 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that still require textual context. In addition, Wikontic attains state-of-the-art information-retention performance on the MINE-1 benchmark (86%), outperforming prior KG construction methods.Wikontic is also efficient at build time: KG construction uses less than 1,000 output tokens, about 3× fewer than AriGraph and <1/20 of GraphRAG.The proposed pipeline improves the quality of the generated KG and offers a scalable solution for leveraging structured knowledge in LLMs. Wikontic is available at https://github.com/screemix/Wikontic.

2025

pdf bib abs

Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
Yuri Kuratov | Mikhail Arkhipov | Aydar Bulatov | Mikhail Burtsev
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A range of recent works addresses the problem of compression of sequence of tokens into a shorter sequence of real-valued vectors to be used as inputs instead of token embeddings or key-value cache. These approaches are focused on reduction of the amount of compute in existing language models rather than minimization of number of bits needed to store text. Despite relying on powerful models as encoders, the maximum attainable lossless compression ratio is typically not higher than x10. This fact is highly intriguing because, in theory, the maximum information capacity of large real-valued vectors is far beyond the presented rates even for 16-bit precision and a modest vector size. In this work, we explore the limits of compression by replacing the encoder with a per-sample optimization procedure. We show that vectors with compression ratios up to x1500 exist, which highlights two orders of magnitude gap between existing and practically attainable solutions. Furthermore, we empirically show that the compression limits are determined not by the length of the input but by the amount of uncertainty to be reduced, namely, the cross-entropy loss on this sequence without any conditioning. The obtained limits highlight the substantial gap between the theoretical capacity of input embeddings and their practical utilization, suggesting significant room for optimization in model design.

2024

pdf bib abs

Prompt Me One More Time: A Two-Step Knowledge Extraction Pipeline with Ontology-Based Verification
Alla Chepurova | Yuri Kuratov | Aydar Bulatov | Mikhail Burtsev
Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing

This study explores a method for extending real-world knowledge graphs (specifically, Wikidata) by extracting triplets from texts with the aid of Large Language Models (LLMs). We propose a two-step pipeline that includes the initial extraction of entity candidates, followed by their refinement and linkage to the canonical entities and relations of the knowledge graph. Finally, we utilize Wikidata relation constraints to select only verified triplets. We compare our approach to a model that was fine-tuned on a machine-generated dataset and demonstrate that it performs better on natural data. Our results suggest that LLM-based triplet extraction from texts, with subsequent verification, is a viable method for real-world applications.

2023

pdf bib abs

Uncertainty Guided Global Memory Improves Multi-Hop Question Answering
Alsu Sagirova | Mikhail Burtsev
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Transformers have become the gold standard for many natural language processing tasks and, in particular, for multi-hop question answering (MHQA). This task includes processing a long document and reasoning over the multiple parts of it. The landscape of MHQA approaches can be classified into two primary categories. The first group focuses on extracting supporting evidence, thereby constraining the QA model’s context to predicted facts. Conversely, the second group relies on the attention mechanism of the long input encoding model to facilitate multi-hop reasoning. However, attention-based token representations lack explicit global contextual information to connect reasoning steps. To address these issues, we propose GEMFormer, a two-stage method that first collects relevant information over the entire document to the memory and then combines it with local context to solve the task. Our experimental results show that fine-tuning a pre-trained model with memory-augmented input, including the most certain global elements, improves the model’s performance on three MHQA datasets compared to the baseline. We also found that the global explicit memory contains information from supporting facts required for the correct answer.

pdf bib abs

Better Together: Enhancing Generative Knowledge Graph Completion with Language Models and Neighborhood Information
Alla Chepurova | Aydar Bulatov | Yuri Kuratov | Mikhail Burtsev
Findings of the Association for Computational Linguistics: EMNLP 2023

Real-world Knowledge Graphs (KGs) often suffer from incompleteness, which limits their potential performance. Knowledge Graph Completion (KGC) techniques aim to address this issue. However, traditional KGC methods are computationally intensive and impractical for large-scale KGs, necessitating the learning of dense node embeddings and computing pairwise distances. Generative transformer-based language models (e.g., T5 and recent KGT5) offer a promising solution as they can predict the tail nodes directly. In this study, we propose to include node neighborhoods as additional information to improve KGC methods based on language models. We examine the effects of this imputation and show that, on both inductive and transductive Wikidata subsets, our method outperforms KGT5 and conventional KGC approaches. We also provide an extensive analysis of the impact of neighborhood on model prediction and show its importance. Furthermore, we point the way to significantly improve KGC through more effective neighborhood selection.

pdf bib abs

Hybrid Uncertainty Quantification for Selective Text Classification in Ambiguous Tasks
Artem Vazhentsev | Gleb Kuzmin | Akim Tsvigun | Alexander Panchenko | Maxim Panov | Mikhail Burtsev | Artem Shelmanov
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Many text classification tasks are inherently ambiguous, which results in automatic systems having a high risk of making mistakes, in spite of using advanced machine learning models. For example, toxicity detection in user-generated content is a subjective task, and notions of toxicity can be annotated according to a variety of definitions that can be in conflict with one another. Instead of relying solely on automatic solutions, moderation of the most difficult and ambiguous cases can be delegated to human workers. Potential mistakes in automated classification can be identified by using uncertainty estimation (UE) techniques. Although UE is a rapidly growing field within natural language processing, we find that state-of-the-art UE methods estimate only epistemic uncertainty and show poor performance, or under-perform trivial methods for ambiguous tasks such as toxicity detection. We argue that in order to create robust uncertainty estimation methods for ambiguous tasks it is necessary to account also for aleatoric uncertainty. In this paper, we propose a new uncertainty estimation method that combines epistemic and aleatoric UE methods. We show that by using our hybrid method, we can outperform state-of-the-art UE methods for toxicity detection and other ambiguous text classification tasks.

pdf bib abs

An open-source DeepPavlov Dream Platform is specifically tailored for development of complex dialog systems like Generative AI Assistants. The stack prioritizes efficiency, modularity, scalability, and extensibility with the goal to make it easier to develop complex dialog systems from scratch. It supports modular approach to implementation of conversational agents enabling their development through the choice of NLP components and conversational skills from a rich library organized into the distributions of ready-for-use multi-skill AI assistant systems. In DeepPavlov Dream, multi-skill Generative AI Assistant consists of NLP components that extract features from user utterances, conversational skills that generate or retrieve a response, skill and response selectors that facilitate choice of relevant skills and the best response, as well as a conversational orchestrator that enables creation of multi-skill Generative AI Assistants scalable up to industrial grade AI assistants. The platform allows to integrate large language models into dialog pipeline, customize with prompt engineering, handle multiple prompts during the same dialog session and create simple multimodal assistants.

2022

pdf bib abs

Construction of human-curated annotated datasets for abstractive text summarization (ATS) is very time-consuming and expensive because creating each instance requires a human annotator to read a long document and compose a shorter summary that would preserve the key information relayed by the original document. Active Learning (AL) is a technique developed to reduce the amount of annotation required to achieve a certain level of machine learning model performance. In information extraction and text classification, AL can reduce the amount of labor up to multiple times. Despite its potential for aiding expensive annotation, as far as we know, there were no effective AL query strategies for ATS. This stems from the fact that many AL strategies rely on uncertainty estimation, while as we show in our work, uncertain instances are usually noisy, and selecting them can degrade the model performance compared to passive annotation. We address this problem by proposing the first effective query strategy for AL in ATS based on diversity principles. We show that given a certain annotation budget, using our strategy in AL annotation helps to improve the model performance in terms of ROUGE and consistency scores. Additionally, we analyze the effect of self-learning and show that it can additionally increase the performance of the model.

pdf bib abs

Uncertainty estimation (UE) of model predictions is a crucial step for a variety of tasks such as active learning, misclassification detection, adversarial attack detection, out-of-distribution detection, etc. Most of the works on modeling the uncertainty of deep neural networks evaluate these methods on image classification tasks. Little attention has been paid to UE in natural language processing. To fill this gap, we perform a vast empirical investigation of state-of-the-art UE methods for Transformer models on misclassification detection in named entity recognition and text classification tasks and propose two computationally efficient modifications, one of which approaches or even outperforms computationally intensive methods.

pdf bib abs

Attention Understands Semantic Relations
Anastasia Chizhikova | Sanzhar Murzakhmetov | Oleg Serikov | Tatiana Shavrina | Mikhail Burtsev
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Today, natural language processing heavily relies on pre-trained large language models. Even though such models are criticized for the poor interpretability, they still yield state-of-the-art solutions for a wide set of very different tasks. While lots of probing studies have been conducted to measure the models’ awareness of grammatical knowledge, semantic probing is less popular. In this work, we introduce the probing pipeline to study the representedness of semantic relations in transformer language models. We show that in this task, attention scores are nearly as expressive as the layers’ output activations, despite their lesser ability to represent surface cues. This supports the hypothesis that attention mechanisms are focusing not only on the syntactic relational information but also on the semantic one.

2021

pdf bib abs

Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions
Mohammad Aliannejadi | Julia Kiseleva | Aleksandr Chuklin | Jeff Dalton | Mikhail Burtsev
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Enabling open-domain dialogue systems to ask clarifying questions when appropriate is an important direction for improving the quality of the system response. Namely, for cases when a user request is not specific enough for a conversation system to provide an answer right away, it is desirable to ask a clarifying question to increase the chances of retrieving a satisfying answer. To address the problem of ‘asking clarifying questions in open-domain dialogues’: (1) we collect and release a new dataset focused on open-domain single- and multi-turn conversations, (2) we benchmark several state-of-the-art neural baselines, and (3) we propose a pipeline consisting of offline and online steps for evaluating the quality of clarifying questions in various dialogues. These contributions are suitable as a foundation for further research.

pdf bib abs

Discourse-Driven Integrated Dialogue Development Environment for Open-Domain Dialogue Systems
Denis Kuznetsov | Dmitry Evseev | Lidia Ostyakova | Oleg Serikov | Daniel Kornev | Mikhail Burtsev
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

Development environments for spoken dialogue systems are popular today because they enable rapid creation of the dialogue systems in times when usage of the voice AI Assistants is constantly growing. We describe a graphical Discourse-Driven Integrated Dialogue Development Environment (DD-IDDE) for spoken open-domain dialogue systems. The DD-IDDE allows dialogue architects to interactively define dialogue flows of their skills/chatbots with the aid of the discourse-driven recommendation system, enhance these flows in the Python-based DSL, deploy, and then further improve based on the skills/chatbots usage statistics. We show how these skills/chatbots can be specified through a graphical user interface within the VS Code Extension, and then run on top of the Dialog Flow Framework (DFF). An earlier version of this framework has been adopted in one of the Alexa Prize 4 socialbots while the updated version was specifically designed to power the described DD-IDDE solution.

2020

pdf bib

Proceedings of the 5th International Workshop on Search-Oriented Conversational AI (SCAI)
Jeff Dalton | Aleksandr Chuklin | Julia Kiseleva | Mikhail Burtsev
Proceedings of the 5th International Workshop on Search-Oriented Conversational AI (SCAI)

2018

pdf bib

Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI
Aleksandr Chuklin | Jeff Dalton | Julia Kiseleva | Alexey Borisov | Mikhail Burtsev
Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI

Adoption of messaging communication and voice assistants has grown rapidly in the last years. This creates a demand for tools that speed up prototyping of feature-rich dialogue systems. An open-source library DeepPavlov is tailored for development of conversational agents. The library prioritises efficiency, modularity, and extensibility with the goal to make it easier to develop dialogue systems from scratch and with limited data available. It supports modular as well as end-to-end approaches to implementation of conversational agents. Conversational agent consists of skills and every skill can be decomposed into components. Components are usually models which solve typical NLP tasks such as intent classification, named entity recognition or pre-trained word vectors. Sequence-to-sequence chit-chat skill, question answering skill or task-oriented skill can be assembled from components provided in the library.