Ruixiang Tang


2024

pdf bib
Secure Your Model: An Effective Key Prompt Protection Mechanism for Large Language Models
Ruixiang Tang | Yu-Neng Chuang | Xuanting Cai | Mengnan Du | Xia Hu
Findings of the Association for Computational Linguistics: NAACL 2024

Large language models (LLMs) have notably revolutionized many domains within natural language processing due to their exceptional performance. Their security has become increasingly vital. This study is centered on protecting LLMs against unauthorized access and potential theft. We propose a simple yet effective protective measure wherein a unique key prompt is embedded within the LLM. This mechanism enables the model to respond only when presented with the correct key prompt; otherwise, LLMs will refuse to react to any input instructions. This key prompt protection offers a robust solution to prevent the unauthorized use of LLMs, as the model becomes unusable without the correct key. We evaluated the proposed protection on multiple LLMs and NLP tasks. Results demonstrate that our method can successfully protect the LLM without significantly impacting the model’s original function. Moreover, we demonstrate potential attacks that attempt to bypass the protection mechanism will adversely affect the model’s performance, further emphasizing the effectiveness of the proposed protection method.

2023

pdf bib
Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning
Ruixiang Tang | Dehan Kong | Longtao Huang | Hui Xue
Findings of the Association for Computational Linguistics: ACL 2023

Large language models (LLMs) have recently shown great potential for in-context learning, where LLMs learn a new task simply by conditioning on a few input-label pairs (prompts). Despite their potential, our understanding of the factors influencing end-task performance and the robustness of in-context learning remains limited. This paper aims to bridge this knowledge gap by investigating the reliance of LLMs on shortcuts or spurious correlations within prompts. Through comprehensive experiments on classification and extraction tasks, we reveal that LLMs are “lazy learners” that tend to exploit such shortcuts. Additionally, we uncover a surprising finding that larger models are more likely to utilize shortcuts in prompts during inference. Our findings provide a new perspective on evaluating robustness in in-context learning and pose new challenges for detecting and mitigating the use of shortcuts in prompts.

pdf bib
Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks
Ruixiang Tang | Gord Lueck | Rodolfo Quispe | Huseyin Inan | Janardhan Kulkarni | Xia Hu
Findings of the Association for Computational Linguistics: EMNLP 2023

Large language models have revolutionized the field of NLP by achieving state-of-the-art performance on various tasks. However, there is a concern that these models may disclose information in the training data. In this study, we focus on the summarization task and investigate the membership inference (MI) attack: given a sample and black-box access to a model’s API, it is possible to determine if the sample was part of the training data. We exploit text similarity and the model’s resistance to document modifications as potential MI signals and evaluate their effectiveness on widely used datasets. Our results demonstrate that summarization models are at risk of exposing data membership, even in cases where the reference summary is not available. Furthermore, we discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.