FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge

Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung


Abstract
Various studies have attempted to remove sensitive or private knowledge from a language model to prevent its unauthorized exposure. However, prior studies have overlooked the inherent complexity and interconnectedness of knowledge, which requires careful examination. To resolve this problem, we first define a new concept called superficial unlearning, which refers to the phenomenon where an unlearning method either fails to erase the interconnected knowledge it should remove or unintentionally erases irrelevant knowledge. Based on the definition, we introduce a novel benchmark, FaithUn, to analyze and evaluate the faithfulness of unlearning in real-world knowledge QA settings. Furthermore, we propose a novel unlearning method, KLUE, which updates only knowledge-related neurons to achieve faithful unlearning. KLUE leverages a regularized explainability method to localize contextual knowledge neurons, updating only these neurons using carefully selected unforgotten samples. Experimental results demonstrate that existing unlearning methods fail to ensure faithful unlearning, while our method shows significant effectiveness in real-world QA unlearning.
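The abstract's core mechanism, localize the neurons tied to the targeted knowledge via an attribution score and then restrict the unlearning update to those neurons while preserving behavior on retained samples, can be illustrated with a minimal sketch. The sketch below is not the authors' KLUE implementation: the toy model, the gradient-times-activation attribution rule, the top-k neuron selection, and the gradient-ascent-plus-retain-loss objective are all illustrative assumptions standing in for the paper's regularized explainability method and its carefully selected unforgotten samples.

```python
# Minimal sketch (assumptions noted above, not the paper's KLUE method):
# 1) score hidden neurons on a "forget" batch, 2) update only the top-scoring
# neurons with a forget/retain objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a language model: token ids -> vocab logits.
model = nn.Sequential(
    nn.Embedding(100, 32), nn.Flatten(1),
    nn.Linear(32 * 4, 64), nn.ReLU(), nn.Linear(64, 100),
)

def loss_fn(x, y):
    return F.cross_entropy(model(x), y)

# Hypothetical data: 4-token sequences with target token ids.
forget_x, forget_y = torch.randint(0, 100, (8, 4)), torch.randint(0, 100, (8,))
retain_x, retain_y = torch.randint(0, 100, (8, 4)), torch.randint(0, 100, (8,))

# 1) Localize neurons: |activation * d(loss)/d(activation)| on the hidden
#    layer, averaged over the forget batch (an assumed attribution rule).
hidden_linear = model[2]
acts = {}
def hook(_, __, out):
    out.retain_grad()
    acts["h"] = out
handle = hidden_linear.register_forward_hook(hook)
loss_fn(forget_x, forget_y).backward()
handle.remove()
attribution = (acts["h"] * acts["h"].grad).abs().mean(dim=0)  # one score per neuron
topk = attribution.topk(k=8).indices  # treated here as the "knowledge neurons"
model.zero_grad()

# 2) Update only the selected neurons: mask gradients so that only weight rows
#    producing those hidden units change; everything else stays frozen.
mask = torch.zeros(64)
mask[topk] = 1.0

for _ in range(10):
    model.zero_grad()
    # Gradient ascent on the forget set, regularized by the retain set
    # (standing in for the abstract's "unforgotten samples").
    loss = -loss_fn(forget_x, forget_y) + loss_fn(retain_x, retain_y)
    loss.backward()
    with torch.no_grad():
        hidden_linear.weight -= 0.05 * hidden_linear.weight.grad * mask[:, None]
        hidden_linear.bias -= 0.05 * hidden_linear.bias.grad * mask
```

In this toy setup the masked update is what distinguishes neuron-level unlearning from full fine-tuning: parameters unrelated to the selected neurons never move, which is the property the abstract targets when it contrasts faithful unlearning with unintentionally erasing irrelevant knowledge.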
Anthology ID:
2025.emnlp-main.657
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
13010–13025
URL:
https://aclanthology.org/2025.emnlp-main.657/
Cite (ACL):
Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, and Kyomin Jung. 2025. FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 13010–13025, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge (Yang et al., EMNLP 2025)
PDF:
https://aclanthology.org/2025.emnlp-main.657.pdf
Checklist:
2025.emnlp-main.657.checklist.pdf