Prompt Leakage effect and mitigation strategies for multi-turn LLM Applications

Divyansh Agarwal; Alexander Richard Fabbri; Ben Risher; Philippe Laban; Shafiq Joty; Chien-Sheng Wu

doi:10.18653/v1/2024.emnlp-industry.94

Prompt Leakage effect and mitigation strategies for multi-turn LLM Applications

Divyansh Agarwal, Alexander Fabbri, Ben Risher, Philippe Laban, Shafiq Joty, Chien-Sheng Wu

Abstract

Prompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property, and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threats and mitigation strategies is lacking, especially for multi-turn LLM interactions. In this paper, we systematically investigate LLM vulnerabilities against prompt leakage for 10 closed- and open-source LLMs, across four domains. We design a unique threat model which leverages the LLM sycophancy effect and elevates the average attack success rate (ASR) from 17.7% to 86.2% in a multi-turn setting. Our standardized setup further allows dissecting leakage of specific prompt contents such as task instructions and knowledge documents. We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to defend against leakage attempts. We present different combination of defenses against our threat model, including a cost analysis. Our study highlights key takeaways for building secure LLM applications and provides directions for research in multi-turn LLM interactions.

Anthology ID:: 2024.emnlp-industry.94
Volume:: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:: November
Year:: 2024
Address:: Miami, Florida, US
Editors:: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue:: EMNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 1255–1275
Language:
URL:: https://aclanthology.org/2024.emnlp-industry.94/
DOI:: 10.18653/v1/2024.emnlp-industry.94
Bibkey:
Cite (ACL):: Divyansh Agarwal, Alexander Fabbri, Ben Risher, Philippe Laban, Shafiq Joty, and Chien-Sheng Wu. 2024. Prompt Leakage effect and mitigation strategies for multi-turn LLM Applications. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1255–1275, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal):: Prompt Leakage effect and mitigation strategies for multi-turn LLM Applications (Agarwal et al., EMNLP 2024)
Copy Citation:
PDF:: https://aclanthology.org/2024.emnlp-industry.94.pdf

PDF Cite Search Fix data