Preserving Privacy Through Dememorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models

Aly Kassem, Omar Mahmoud, Sherif Saad


Abstract
Large language models (LLMs) are trained on vast amounts of data, including sensitive information that poses a risk to personal privacy if exposed. LLMs have shown the ability to memorize and reproduce portions of their training data when prompted by adversaries. Prior research has focused on addressing this memorization issue and preventing verbatim replication through techniques like knowledge unlearning and data pre-processing. However, these methods have limitations regarding the number of protected samples, the range of privacy types covered, and the potentially lower quality of the resulting generative models. To tackle this challenge more effectively, we propose “DeMem,” a novel unlearning approach that utilizes an efficient reinforcement learning feedback loop via proximal policy optimization. By fine-tuning the language model with a negative similarity score as a reward signal, we incentivize the LLM to learn a paraphrasing policy that unlearns the pre-training data. Our experiments demonstrate that DeMem surpasses strong baselines and state-of-the-art methods in its ability to generalize and to strike a balance between preserving privacy and maintaining LLM performance.
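The core idea in the abstract — rewarding the model for diverging from memorized training text — can be illustrated with a toy reward function. The token-overlap similarity below (via `difflib.SequenceMatcher`) is a simple stand-in for whatever similarity metric the paper actually uses; it is not the authors' implementation, only a sketch of the "negative similarity as reward" signal that would be fed to a PPO fine-tuning loop.

```python
from difflib import SequenceMatcher

def dememorization_reward(generated: str, memorized: str) -> float:
    """Illustrative reward: negated similarity between the model's
    generation and the memorized training sample. Verbatim replication
    yields the lowest reward (-1.0); a full paraphrase approaches 0."""
    similarity = SequenceMatcher(
        None, generated.split(), memorized.split()
    ).ratio()
    return -similarity

# A verbatim reproduction is penalized more than a paraphrase.
memorized = "the quick brown fox jumps over the lazy dog"
paraphrase = "a fast brown fox leaps over a sleepy dog"

verbatim_reward = dememorization_reward(memorized, memorized)
paraphrase_reward = dememorization_reward(paraphrase, memorized)
```

In an actual RL feedback loop, this scalar would score each sampled completion, so PPO updates push the policy toward paraphrases rather than verbatim recall of the pre-training data.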
Anthology ID:
2023.emnlp-main.265
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4360–4379
URL:
https://aclanthology.org/2023.emnlp-main.265
DOI:
10.18653/v1/2023.emnlp-main.265
Cite (ACL):
Aly Kassem, Omar Mahmoud, and Sherif Saad. 2023. Preserving Privacy Through Dememorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 4360–4379, Singapore. Association for Computational Linguistics.
Cite (Informal):
Preserving Privacy Through Dememorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models (Kassem et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.265.pdf
Video:
https://aclanthology.org/2023.emnlp-main.265.mp4