PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization

Yidan Wang; Yanan Cao; Yubing Ren; Fang Fang; Zheng Lin; Binxing Fang

doi:10.18653/v1/2025.acl-long.475

Abstract

Large Language Models (LLMs) excel in various domains but pose inherent privacy risks. Existing methods to evaluate privacy leakage in LLMs often use memorized prefixes or simple instructions to extract data, both of which well-aligned models can easily block. Meanwhile, jailbreak attacks bypass LLM safety mechanisms to generate harmful content, but their role in privacy scenarios remains underexplored. In this paper, we examine the effectiveness of jailbreak attacks in extracting sensitive information, bridging privacy leakage and jailbreak attacks in LLMs. Moreover, we propose PIG, a novel framework targeting Personally Identifiable Information (PII) and addressing the limitations of current jailbreak methods. Specifically, PIG identifies PII entities and their types in privacy queries, uses in-context learning to build a privacy context, and iteratively updates it with three gradient-based strategies to elicit target PII. We evaluate PIG and existing jailbreak methods using two privacy-related datasets. Experiments on four white-box and two black-box LLMs show that PIG outperforms baseline methods and achieves state-of-the-art (SoTA) results. The results underscore significant privacy risks in LLMs, emphasizing the need for stronger safeguards.

Anthology ID: 2025.acl-long.475
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 9645–9660
URL: https://aclanthology.org/2025.acl-long.475/
DOI: 10.18653/v1/2025.acl-long.475
Cite (ACL): Yidan Wang, Yanan Cao, Yubing Ren, Fang Fang, Zheng Lin, and Binxing Fang. 2025. PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9645–9660, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization (Wang et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.475.pdf
