Michael Backes

2024

pdf bib abs
Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models
Junjie Chu | Zeyang Sha | Michael Backes | Yang Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Significant advancements have recently been made in large language models, represented by GPT models.Users frequently have multi-round private conversations with cloud-hosted GPT models for task optimization.Yet, this operational paradigm introduces additional attack surfaces, particularly in custom GPTs and hijacked chat sessions.In this paper, we introduce a straightforward yet potent Conversation Reconstruction Attack.This attack targets the contents of previous conversations between GPT models and benign users, i.e., the benign users’ input contents during their interaction with GPT models.The adversary could induce GPT models to leak such contents by querying them with designed malicious prompts.Our comprehensive examination of privacy risks during the interactions with GPT models under this attack reveals GPT-4’s considerable resilience.We present two advanced attacks targeting improved reconstruction of past conversations, demonstrating significant privacy leakage across all models under these advanced techniques.Evaluating various defense mechanisms, we find them ineffective against these attacks.Our findings highlight the ease with which privacy can be compromised in interactions with GPT models, urging the community to safeguard against potential abuses of these models’ capabilities.

pdf bib
ModSCAN: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities
Yukun Jiang | Zheng Li | Xinyue Shen | Yugeng Liu | Michael Backes | Yang Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

pdf bib abs
The Death and Life of Great Prompts: Analyzing the Evolution of LLM Prompts from the Structural Perspective
Yihan Ma | Xinyue Shen | Yixin Wu | Boyang Zhang | Michael Backes | Yang Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Effective utilization of large language models (LLMs), such as ChatGPT, relies on the quality of input prompts. This paper explores prompt engineering, specifically focusing on the disparity between experimentally designed prompts and real-world “in-the-wild” prompts. We analyze 10,538 in-the-wild prompts collected from various platforms and develop a framework that decomposes the prompts into eight key components. Our analysis shows that and Requirement are the most prevalent two components. Roles specified in the prompts, along with their capabilities, have become increasingly varied over time, signifying a broader range of application scenarios for LLMs. However, from the response of GPT-4, there is a marginal improvement with a specified role, whereas leveraging less prevalent components such as Capability and Demonstration can result in a more satisfying response. Overall, our work sheds light on the essential components of in-the-wild prompts and the effectiveness of these components on the broader landscape of LLM prompt engineering, providing valuable guidelines for the LLM community to optimize high-quality prompts.

pdf bib abs
Composite Backdoor Attacks Against Large Language Models
Hai Huang | Zhengyu Zhao | Michael Backes | Yun Shen | Yang Zhang
Findings of the Association for Computational Linguistics: NAACL 2024

Large language models (LLMs) have demonstrated superior performance compared to previous methods on various tasks, and often serve as the foundation models for many researches and services. However, the untrustworthy third-party LLMs may covertly introduce vulnerabilities for downstream tasks. In this paper, we explore the vulnerability of LLMs through the lens of backdoor attacks. Different from existing backdoor attacks against LLMs, ours scatters multiple trigger keys in different prompt components. Such a Composite Backdoor Attack (CBA) is shown to be stealthier than implanting the same multiple trigger keys in only a single component. CBA ensures that the backdoor is activated only when all trigger keys appear. Our experiments demonstrate that CBA is effective in both natural language processing (NLP) and multimodal tasks. For instance, with 3% poisoning samples against the LLaMA-7B model on the Emotion dataset, our attack achieves a 100% Attack Success Rate (ASR) with a False Triggered Rate (FTR) below 2.06% and negligible model accuracy degradation. Our work highlights the necessity of increased security research on the trustworthiness of foundation LLMs.

Co-authors

Venues

emnlp3
findings1

Fix data