CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP

Tianyu Yang; Lisen Dai; Xiangqi Wang; Minhao Cheng; Yapeng Tian; Xiangliang Zhang

doi:10.18653/v1/2025.acl-long.1469

Abstract

Machine unlearning (MU) has gained significant attention as a means to remove the influence of specific data from a trained model without requiring full retraining. While progress has been made in unimodal domains such as text and image classification, unlearning in multimodal models remains relatively underexplored. In this work, we address the unique challenges of unlearning in CLIP, a prominent multimodal model that aligns visual and textual representations. We introduce CLIPErase, a novel approach that disentangles and selectively forgets both visual and textual associations without compromising the model's performance on the data it should retain. CLIPErase consists of three key modules: a Forgetting Module that disrupts the associations in the forget set, a Retention Module that preserves performance on the retain set, and a Consistency Module that maintains consistency with the original model. Extensive experiments on CIFAR-100, Flickr30K, and Conceptual 12M across five CLIP downstream tasks, as well as an evaluation on diffusion models, demonstrate that CLIPErase effectively removes designated associations from multimodal samples in downstream tasks while preserving the model’s performance on the retain set after unlearning.
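The abstract names three modules but not their loss functions. The following is a minimal, hypothetical sketch of how three such objectives could be combined into one training loss over CLIP-style image and text embeddings; it is not the paper's formulation. Assumed choices: the forgetting term is the mean cosine similarity of matched forget pairs (minimizing it pushes each forget image away from its caption), the retention term is a standard symmetric CLIP-style contrastive (InfoNCE) loss on the retain set, and the consistency term is the mean squared difference between the current and a frozen original model's retain-set similarity matrices. The function names, the weights `lam_r` and `lam_c`, and the temperature are illustrative. This pure-Python version only computes loss values; a real implementation would use differentiable tensors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two nonzero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def similarity_matrix(images: List[Vector], texts: List[Vector]) -> List[List[float]]:
    """S[i][j] = cos(image_i, text_j), the image-text alignment CLIP learns."""
    return [[cosine(im, tx) for tx in texts] for im in images]


def forgetting_loss(images: List[Vector], texts: List[Vector]) -> float:
    """Assumed form: mean similarity of matched forget pairs (to be minimized)."""
    return sum(cosine(im, tx) for im, tx in zip(images, texts)) / len(images)


def retention_loss(images: List[Vector], texts: List[Vector],
                   temperature: float = 0.07) -> float:
    """Symmetric CLIP-style contrastive (InfoNCE) loss on the retain set."""
    s = similarity_matrix(images, texts)
    n = len(s)
    total = 0.0
    for i in range(n):
        row = [s[i][j] / temperature for j in range(n)]  # image i vs. all texts
        col = [s[j][i] / temperature for j in range(n)]  # text i vs. all images
        total += -row[i] + math.log(sum(math.exp(v) for v in row))
        total += -col[i] + math.log(sum(math.exp(v) for v in col))
    return total / (2 * n)


def consistency_loss(cur_images: List[Vector], cur_texts: List[Vector],
                     orig_images: List[Vector], orig_texts: List[Vector]) -> float:
    """Assumed form: MSE between current and original retain-set similarities."""
    cur = similarity_matrix(cur_images, cur_texts)
    orig = similarity_matrix(orig_images, orig_texts)
    count = len(cur) * len(cur[0])
    return sum((c - o) ** 2
               for rc, ro in zip(cur, orig)
               for c, o in zip(rc, ro)) / count


def total_loss(forget, retain, retain_orig,
               lam_r: float = 1.0, lam_c: float = 1.0) -> float:
    """Hypothetical weighted sum of the forgetting, retention, and consistency terms."""
    return (forgetting_loss(*forget)
            + lam_r * retention_loss(*retain)
            + lam_c * consistency_loss(*retain, *retain_orig))


if __name__ == "__main__":
    forget = ([[1.0, 0.0]], [[1.0, 0.0]])  # one image-caption pair to forget
    retain = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    print(total_loss(forget, retain, retain))
```

In this toy run the forget pair is still perfectly aligned, so the forgetting term contributes about 1.0 while the well-aligned, unchanged retain set contributes almost nothing; minimizing the total would therefore mainly move the forget pair apart. The balance between forgetting and preservation is controlled by the assumed weights `lam_r` and `lam_c`.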

Anthology ID:: 2025.acl-long.1469
Volume:: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 30438–30452
URL:: https://aclanthology.org/2025.acl-long.1469/
DOI:: 10.18653/v1/2025.acl-long.1469
Cite (ACL):: Tianyu Yang, Lisen Dai, Xiangqi Wang, Minhao Cheng, Yapeng Tian, and Xiangliang Zhang. 2025. CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 30438–30452, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP (Yang et al., ACL 2025)
PDF:: https://aclanthology.org/2025.acl-long.1469.pdf