Anna Borisiuk

2026

Anatomy of Unlearning: The Dual Impact of Fact Salience and Model Fine-Tuning
Anna Borisiuk | Andrey Savchenko | Alexander Panchenko | Elena Tutubalina
Findings of the Association for Computational Linguistics: ACL 2026

Machine Unlearning (MU) enables Large Language Models (LLMs) to remove unsafe or outdated information. However, existing work assumes that all facts are equally forgettable and largely ignores whether the forgotten knowledge originates from pretraining or supervised fine-tuning (SFT). In this paper, we introduce DUAL (Dual Unlearning Evaluation across Training Stages), a benchmark of 28.6k Wikidata-derived triplets annotated with fact popularity using Wikipedia link counts and LLM-based salience scores. Our experiments show that pretrained and SFT models respond differently to unlearning. An SFT step on the forget data yields smoother forgetting, more stable tuning, and 10–50% higher retention, while direct unlearning on pretrained models remains unstable and prone to relearning or catastrophic forgetting.

The Silence of the Facts: Popularity as a Barrier to Machine Unlearning
Anna Borisiuk | Andrey Savchenko | Alexander Panchenko | Elena Tutubalina
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Machine Unlearning is a valuable capability for LLMs, enabling the removal of unsafe, outdated, or private information, and controllable knowledge removal is essential for reliable NLP systems. Existing unlearning methods, however, are often evaluated under the assumption that all facts are equally challenging to forget. In this paper, we investigate whether fact popularity influences the efficiency of LLM unlearning. To answer this question, we build **UNLamb**, a benchmark of 11.6k question-answer pairs derived from real-world knowledge in Wikidata, explicitly partitioned into rare and popular facts. Using this benchmark, we evaluate four state-of-the-art unlearning methods across three validation sets and two LLMs of different sizes. We show that larger models struggle more to forget popular entities, often damaging related knowledge in the process. In contrast, rare facts are much easier to remove without side effects.
