Irina Piontkovskaya - ACL Anthology

Irina Piontkovskaya

2026

AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
Georgii Aparin | Tasnima Sadekova | Alexey Rukhovich | Assel Yermekova | Laida Kushnareva | Vadim Popov | Kristian Kuznetsov | Irina Piontkovskaya
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Sparse Autoencoders (SAEs) are powerful tools for interpreting neural representations, yet their use in audio remains underexplored. We train SAEs across all encoder layers of Whisper and HuBERT, provide an extensive evaluation of their stability, interpretability, and show their practical utility. Over 50% of the features remain consistent across random seeds, and reconstruction quality is preserved. SAE features capture general acoustic and semantic information as well as specific events, including environmental noises and paralinguistic sounds (e.g. laughter, whispering) and disentangle them effectively, requiring removal of only 19-27% of features to erase a concept. Feature steering reduces Whisper’s false speech detections by 70% with negligible WER increase, demonstrating real-world applicability. Finally, we find SAE features correlated with human EEG activity during speech perception, indicating alignment with human neural processing. The code and checkpoints are available at https://github.com/audiosae/audiosae_demo.

Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story
Pedashenko Vladislav | Laida Kushnareva | Yana Khassan Nibal | Eduard Tulchinskii | Kristian Kuznetsov | Vladislav Zharchinskii | Yury Maximov | Irina Piontkovskaya
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Intrinsic dimension (ID) is an important tool in modern LLM analysis, informing studies of training dynamics, scaling behavior, and dataset structure, yet its textual determinants remain underexplored. We provide the first comprehensive study grounding ID in interpretable text properties through cross-encoder analysis, linguistic features, and sparse autoencoders (SAEs). In this work, we establish three key findings. First, ID is complementary to entropy-based metrics: after controlling for length, the two are uncorrelated, with ID capturing geometric complexity orthogonal to prediction quality. Second, ID exhibits robust genre stratification: scientific prose shows low ID (∼ 8), encyclopedic content medium ID (∼ 9), and creative/opinion writing high ID (∼ 10.5) across all models tested. This reveals that contemporary LLMs find scientific text "representationally simple" while fiction requires additional degrees of freedom. Third, using SAEs, we identify causal features: scientific signals (formal tone, report templates, statistics) reduce ID; humanized signals (personalization, emotion, narrative) increase it. Steering experiments confirm these effects are causal. Thus, for contemporary models, scientific writing appears comparatively "easy", whereas fiction, opinion, and affect add representational degrees of freedom. Our multi-faceted analysis provides practical guidance for the proper use of ID and the sound interpretation of ID-based results.

2025

Quantifying Logical Consistency in Transformers via Query-Key Alignment
Eduard Tulchinskii | Laida Kushnareva | Anastasia Voznyuk | Andrei Andriiainen | Irina Piontkovskaya | Evgeny Burnaev | Serguei Barannikov
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) excel at many NLP tasks, yet their multi-step logical reasoning remains unreliable. Existing solutions such as Chain-of-Thought prompting generate intermediate steps but provide no internal check of their logical coherence. In this paper, we use the “QK-score”, a lightweight metric based on query–key alignments within transformer attention heads, to evaluate the logical reasoning capabilities of LLMs. Our method automatically identifies attention heads that play a key role in distinguishing valid from invalid logical inferences, enabling efficient inference-time evaluation via a single forward pass. It reveals latent reasoning structure in LLMs and provides a scalable mechanistic alternative to ablation-based analysis. Across three benchmarks: ProntoQA-OOD, PARARULE-Plus, and MultiLogicEval, with models ranging from 1.5B to 70B parameters, the selected heads predict logical validity up to 14% better than the model probabilities, and remain robust under distractors and increasing reasoning depth of d≤ 6.

ToolReflection: Improving Large Language Models for Real-World API Calls with Self-Generated Data
Gregory Polyakov | Ilseyar Alimova | Dmitry Abulkhanov | Ivan Sedykh | Andrey Bout | Sergey Nikolenko | Irina Piontkovskaya
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)

While open-source large language models (LLMs) have advanced in leveraging third-party tools, significant challenges remain in real-world API usage, where behavior is unpredictable or poorly specified. Existing benchmarks often fail to capture this complexity. We propose ToolReflection, a novel method that improves LLMs’ ability to self-correct API calls by utilizing real-time API feedback. We also introduce new datasets specifically designed to test model performance under realistic conditions. In ToolReflection, models undergo instruction tuning on a dataset augmented with self-generated errors and corrections. Our evaluation across ToolAlpaca, ToolBench benchmarks, and three newly developed datasets (GPT4Tools-OOD, GPT4Tools-OOD-Hard, and Multistep-100) demonstrates its effectiveness. ToolReflection boosts overall success rates by 25.4% on GPT4Tools-OOD, 56.2% on GPT4Tools-OOD-Hard, and 4% on Multistep-100, outperforming original models. On ToolAlpaca, we show a 14% improvement in the “Simulated” setting and 10.5% in the “Real-world” scenario. Our error analysis highlights ToolReflection significantly enhances recovery from incorrect tool calls, even with incomplete or erroneous API documentation. We have released the code, prompts, and data at https://github.com/polgrisha/ToolReflection.

CCT-Code: Cross-Consistency Training for Multilingual Clone Detection and Code Search
Nikita Sorokin | Tikhonov Anton | Dmitry Abulkhanov | Ivan Sedykh | Irina Piontkovskaya | Valentin Malykh
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

We consider the well-known and important tasks of clone detection and information retrieval for source code. The most standard setup is to search clones inside the same language code snippets. But it is also useful to find code snippets with identical behaviour in different programming languages. Nevertheless multi- and cross-lingual clone detection has been little studied in literature. We present a novel training procedure, cross-consistency training (CCT) leveraging cross-lingual similarity, that we apply to train language models on source code in various programming languages. We show that this training is effective both for encoder- and decoder-based models.The trained encoder-based CCT-LM model%and fine-tuned with CCT,achieves a new state of the art on POJ-104 (monolingual C++ clone detection benchmark) with 96.73% MAP and AdvTest (monolingual Python code search benchmark) with 47.18% MRR. The decoder-based CCT-LM model shows comparable performance in these tasks. In addition, we formulate the multi- and cross-lingual clone detection problem and present XCD, a new benchmark dataset produced from CodeForces submissions.

Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders
Kristian Kuznetsov | Laida Kushnareva | Anton Razzhigaev | Polina Druzhinina | Anastasia Voznyuk | Irina Piontkovskaya | Evgeny Burnaev | Serguei Barannikov
Findings of the Association for Computational Linguistics: ACL 2025

Artificial Text Detection (ATD) is becoming increasingly important with the rise of advanced Large Language Models (LLMs). Despite numerous efforts, no single algorithm performs consistently well across different types of unseen text or guarantees effective generalization to new LLMs. Interpretability plays a crucial role in achieving this goal. In this study, we enhance ATD interpretability by using Sparse Autoencoders (SAE) to extract features from Gemma-2-2B’s residual stream. We identify both interpretable and efficient features, analyzing their semantics and relevance through domain- and model-specific statistics, a steering approach, and manual or LLM-based interpretation of obtained features. Our methods offer valuable insights into how texts from various models differ from human-written content. We show that modern LLMs have a distinct writing style, especially in information-dense domains, even though they can produce human-like outputs with personalized prompts. The code for this paper is available at https://github.com/pyashy/SAE_ATD.

Fast Thinking with Structured Prompts: Enabling LLM Reasoning without Chain-of-Thought Generation
Kirill Morozov | Liubov Chubarova | Irina Piontkovskaya
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

The emergence of complex reasoning abilities in large language models (LLMs) has sparked great interest, and a variety of prompting techniques was proposed to coax them into emulating human thought processes. In this work, we introduce Think Node-by-Node, a graph-based reasoning framework inspired by mind maps, flowcharts, and other visual aids that help humans tackle complex problems. Rather than generating images directly, our approach leverages standard graph-building and rendering libraries, and requires no fine-tuning, only the model’s native coding capabilities. We further explore a “Fast Thinking” regime, in which a graph-reasoning example provided in the prompt, but the model generates the answers directly, without the full thought process reconstruction. Surprisingly, this approach leads to significant improvement upon baseline in general-knowledge tasks. Remarkably, Think Node-by-Node maintains strong performance even under a strict 25-token budget for answer generation. Across two instruction-tuned LLMs (0.5B and 7B parameters), our FastTNbN strategy outperforms baseline prompting techniques, improving accuracy by up to 10%, and exceeds the capabilities of other structured prompting methods under equivalent generation constraints.

2024

Robust AI-Generated Text Detection by Restricted Embeddings
Kristian Kuznetsov | Eduard Tulchinskii | Laida Kushnareva | German Magai | Serguei Barannikov | Sergey Nikolenko | Irina Piontkovskaya
Findings of the Association for Computational Linguistics: EMNLP 2024

Growing amount and quality of AI-generated texts makes detecting such content more difficult. In most real-world scenarios, the domain (style and topic) of generated data and the generator model are not known in advance. In this work, we focus on the robustness of classifier-based detectors of AI-generated text, namely their ability to transfer to unseen generators or semantic domains. We investigate the geometry of the embedding space of Transformer-based text encoders and show that clearing out harmful linear subspaces helps to train a robust classifier, ignoring domain-specific spurious features. We investigate several subspace decomposition and feature selection strategies and achieve significant improvements over state of the art methods in cross-domain and cross-generator transfer. Our best approaches for head-wise and coordinate-based subspace removal increase the mean out-of-distribution (OOD) classification score by up to 9% and 14% in particular setups for RoBERTa and BERT embeddings respectively. We release our code and data: https://github.com/SilverSolver/RobustATD

2023

Can BERT eat RuCoLA? Topological Data Analysis to Explain
Irina Proskurina | Ekaterina Artemova | Irina Piontkovskaya
Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023)

This paper investigates how Transformer language models (LMs) fine-tuned for acceptability classification capture linguistic features. Our approach is based on best practices of topological data analysis (TDA) in NLP: we construct directed attention graphs from attention matrices, derive topological features from them and feed them to linear classifiers. We introduce two novel features, chordality and the matching number, and show that TDA-based classifiers outperform fine-tuning baselines. We experiment with two datasets, CoLA and RuCoLA, in English and Russian, which are typologically different languages. On top of that, we propose several black-box introspection techniques aimed at detecting changes in the attention mode of the LM’s during fine-tuning, defining the LM’s prediction confidences, and associating individual heads with fine-grained grammar phenomena. Our results contribute to understanding the behaviour of monolingual LMs in the acceptability classification task, provide insights into the functional roles of attention heads, and highlight the advantages of TDA-based approaches for analyzing LMs.We release the code and the experimental results for further uptake.

GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding
Konstantin Yakovlev | Alexander Podolskiy | Andrey Bout | Sergey Nikolenko | Irina Piontkovskaya
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Grammatical error correction (GEC) is an important NLP task that is currently usually solved with autoregressive sequence-to-sequence models. However, approaches of this class are inherently slow due to one-by-one token generation, so non-autoregressive alternatives are needed. In this work, we propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network that outputs a self-attention weight matrix that can be used in beam search to find the best permutation of input tokens (with auxiliary <ins> tokens) and a decoder network based on a step-unrolled denoising autoencoder that fills in specific tokens. This allows us to find the token permutation after only one forward pass of the permutation network, avoiding autoregressive constructions. We show that the resulting network improves over previously known non-autoregressive methods for GEC and reaches the level of autoregressive methods that do not use language-specific synthetic data generation methods. Our results are supported by a comprehensive experimental validation on the ConLL-2014 and BEA datasets and an extensive ablation study that supports our architectural and algorithmic choices.

Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule
Andrey Bout | Alexander Podolskiy | Sergey Nikolenko | Irina Piontkovskaya
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Progress in neural grammatical error correction (GEC) is hindered by the lack of annotated training data. Sufficient amounts of high-quality manually annotated data are not available, so recent research has relied on generating synthetic data, pretraining on it, and then fine-tuning on real datasets; performance gains have been achieved either by ensembling or by using huge pretrained models such as XXL-T5 as the backbone. In this work, we explore an orthogonal direction: how to use available data more efficiently. First, we propose auxiliary tasks that exploit the alignment between the original and corrected sentences, such as predicting a sequence of corrections. We formulate each task as a sequence-to-sequence problem and perform multi-task training. Second, we discover that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance, so we set out to find the best training schedule. Together, these two ideas lead to significant improvements, producing results that improve state of the art with much smaller models; in particular, we outperform the best models based on T5-XXL (11B parameters) with a BART-based model (400M parameters).

2022

Acceptability Judgements via Examining the Topology of Attention Maps
Daniil Cherniavskii | Eduard Tulchinskii | Vladislav Mikhailov | Irina Proskurina | Laida Kushnareva | Ekaterina Artemova | Serguei Barannikov | Irina Piontkovskaya | Dmitri Piontkovski | Evgeny Burnaev
Findings of the Association for Computational Linguistics: EMNLP 2022

The role of the attention mechanism in encoding linguistic knowledge has received special interest in NLP. However, the ability of the attention heads to judge the grammatical acceptability of a sentence has been underexplored. This paper approaches the paradigm of acceptability judgments with topological data analysis (TDA), showing that the geometric properties of the attention graph can be efficiently exploited for two standard practices in linguistics: binary judgments and linguistic minimal pairs. Topological features enhance the BERT-based acceptability classifier scores by 8%-24% on CoLA in three languages (English, Italian, and Swedish). By revealing the topological discrepancy between attention maps of minimal pairs, we achieve the human-level performance on the BLiMP benchmark, outperforming nine statistical and Transformer LM baselines. At the same time, TDA provides the foundation for analyzing the linguistic functions of attention heads and interpreting the correspondence between the graph features and grammatical phenomena. We publicly release the code and other materials used in the experiments.

Template-based Approach to Zero-shot Intent Recognition
Dmitry Lamanov | Pavel Burnyshev | Ekaterina Artemova | Valentin Malykh | Andrey Bout | Irina Piontkovskaya
Proceedings of the 15th International Conference on Natural Language Generation

Ask Me Anything in Your Native Language
Nikita Sorokin | Dmitry Abulkhanov | Irina Piontkovskaya | Valentin Malykh
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Cross-lingual question answering is a thriving field in the modern world, helping people to search information on the web more efficiently. One of the important scenarios is to give an answer even there is no answer in the language a person asks a question with. We present a novel approach based on single encoder for query and passage for retrieval from multi-lingual collection, together with cross-lingual generative reader. It achieves a new state of the art in both retrieval and end-to-end tasks on the XOR TyDi dataset outperforming the previous results up to 10% on several languages. We find that our approach can be generalized to more than 20 languages in zero-shot approach and outperform all previous models by 12%.

2021

Multiple Teacher Distillation for Robust and Greener Models
Artur Ilichev | Nikita Sorokin | Irina Piontkovskaya | Valentin Malykh
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

The language models nowadays are in the center of natural language processing progress. These models are mostly of significant size. There are successful attempts to reduce them, but at least some of these attempts rely on randomness. We propose a novel distillation procedure leveraging on multiple teachers usage which alleviates random seed dependency and makes the models more robust. We show that this procedure applied to TinyBERT and DistilBERT models improves their worst case results up to 2% while keeping almost the same best-case ones. The latter fact keeps true with a constraint on computational time, which is important to lessen the carbon footprint. In addition, we present the results of an application of the proposed procedure to a computer vision model ResNet, which shows that the statement keeps true in this totally different domain.

InFoBERT: Zero-Shot Approach to Natural Language Understanding Using Contextualized Word Embedding
Pavel Burnyshev | Andrey Bout | Valentin Malykh | Irina Piontkovskaya
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Natural language understanding is an important task in modern dialogue systems. It becomes more important with the rapid extension of the dialogue systems’ functionality. In this work, we present an approach to zero-shot transfer learning for the tasks of intent classification and slot-filling based on pre-trained language models. We use deep contextualized models feeding them with utterances and natural language descriptions of user intents to get text embeddings. These embeddings then used by a small neural network to produce predictions for intent and slot probabilities. This architecture achieves new state-of-the-art results in two zero-shot scenarios. One is a single language new skill adaptation and another one is a cross-lingual adaptation.

Artificial Text Detection via Examining the Topology of Attention Maps
Laida Kushnareva | Daniil Cherniavskii | Vladislav Mikhailov | Ekaterina Artemova | Serguei Barannikov | Alexander Bernstein | Irina Piontkovskaya | Dmitri Piontkovski | Evgeny Burnaev
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The impressive capabilities of recent generative models to create texts that are challenging to distinguish from the human-written ones can be misused for generating fake news, product reviews, and even abusive content. Despite the prominent performance of existing methods for artificial text detection, they still lack interpretability and robustness towards unseen models. To this end, we propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA) which is currently understudied in the field of NLP. We empirically show that the features derived from the BERT model outperform count- and neural-based baselines up to 10% on three common datasets, and tend to be the most robust towards unseen GPT-style generation models as opposed to existing methods. The probing analysis of the features reveals their sensitivity to the surface and syntactic properties. The results demonstrate that TDA is a promising line with respect to NLP tasks, specifically the ones that incorporate surface and structural information.

Single Example Can Improve Zero-Shot Data Generation
Pavel Burnyshev | Valentin Malykh | Andrey Bout | Ekaterina Artemova | Irina Piontkovskaya
Proceedings of the 14th International Conference on Natural Language Generation

Sub-tasks of intent classification, such as robustness to distribution shift, adaptation to specific user groups and personalization, out-of-domain detection, require extensive and flexible datasets for experiments and evaluation. As collecting such datasets is time- and labor-consuming, we propose to use text generation methods to gather datasets. The generator should be trained to generate utterances that belong to the given intent. We explore two approaches to the generation of task-oriented utterances: in the zero-shot approach, the model is trained to generate utterances from seen intents and is further used to generate utterances for intents unseen during training. In the one-shot approach, the model is presented with a single utterance from a test intent. We perform a thorough automatic, and human evaluation of the intrinsic properties of two-generation approaches. The attributes of the generated data are close to original test sets, collected via crowd-sourcing.

2020

SumTitles: a Summarization Dataset with Low Extractiveness
Valentin Malykh | Konstantin Chernis | Ekaterina Artemova | Irina Piontkovskaya
Proceedings of the 28th International Conference on Computational Linguistics

The existing dialogue summarization corpora are significantly extractive. We introduce a methodology for dataset extractiveness evaluation and present a new low-extractive corpus of movie dialogues for abstractive text summarization along with baseline evaluation. The corpus contains 153k dialogues and consists of three parts: 1) automatically aligned subtitles, 2) automatically aligned scenes from scripts, and 3) manually aligned scenes from scripts. We also present an alignment algorithm which we use to construct the corpus.

Venues