Machine Unlearning of Personally Identifiable Information in Large Language Models

Dan Parii; Thomas van Osch; Chang Sun

doi:10.18653/v1/2025.nllp-1.6


Abstract

Pretrained LLMs are trained on massive web-scale datasets, which often contain personally identifiable information (PII), raising serious legal and ethical concerns. A key research challenge is how to effectively unlearn PII without degrading the model’s utility or leaving implicit knowledge that can be exploited. This study proposes UnlearnPII, a benchmark designed to evaluate the effectiveness of PII unlearning methods, addressing limitations in existing metrics that overlook implicit knowledge and assess all tokens equally. Our benchmark focuses on detecting PII leakage and on testing model robustness through obfuscated prompts and jailbreak attacks across different domains, while measuring utility and retention quality. To advance practical solutions, we propose a new PII unlearning method, PERMU_tok. By applying token-level noise, it achieves (1) simplified integration into existing workflows and (2) improved retention and output quality, while maintaining unlearning effectiveness. The code is open-source and publicly available.

Anthology ID: 2025.nllp-1.6
Volume: Proceedings of the Natural Legal Language Processing Workshop 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venues: NLLP | WS
Publisher: Association for Computational Linguistics
Pages: 54–67
URL: https://aclanthology.org/2025.nllp-1.6/
DOI: 10.18653/v1/2025.nllp-1.6
Cite (ACL): Dan Parii, Thomas van Osch, and Chang Sun. 2025. Machine Unlearning of Personally Identifiable Information in Large Language Models. In Proceedings of the Natural Legal Language Processing Workshop 2025, pages 54–67, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Machine Unlearning of Personally Identifiable Information in Large Language Models (Parii et al., NLLP 2025)
PDF: https://aclanthology.org/2025.nllp-1.6.pdf
