Litian Zhang
2024
BaitAttack: Alleviating Intention Shift in Jailbreak Attacks via Adaptive Bait Crafting
Rui Pu | Chaozhuo Li | Rui Ha | Litian Zhang | Lirong Qiu | Xi Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Jailbreak attacks enable malicious queries to evade detection by LLMs. Existing attacks focus on meticulously constructing prompts to disguise harmful intentions. However, incorporating sophisticated disguising prompts can introduce the challenge of "intention shift": the additional semantics within the prompt distract the LLM, causing its responses to deviate significantly from the original harmful intention. In this paper, we propose a novel component, the "bait", to alleviate the effects of intention shift. A bait comprises an initial response to the harmful query, prompting the LLM to rectify or supplement the knowledge within it. By furnishing rich semantics relevant to the query, the bait helps LLMs stay focused on the original intention. To conceal the harmful content within the bait, we further propose a novel attack paradigm, BaitAttack, which adaptively generates the components necessary to persuade targeted LLMs that they are engaging with a legitimate inquiry in a safe context. Evaluated on a popular dataset, our proposal demonstrates state-of-the-art attack performance and an exceptional capability for mitigating intention shift. The implementation of BaitAttack is available at: https://anonymous.4open.science/r/BaitAttack-D1F5.
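The abstract describes the paradigm only at a high level; the sketch below illustrates how the two stages (bait crafting, then adaptive scenario wrapping) might fit together. Every name, prompt template, and helper here (query_model, craft_bait, build_attack_prompt) is a hypothetical assumption for illustration, not the paper's actual code, which is available at the repository linked above.

```python
# Illustrative sketch of the BaitAttack paradigm as summarized in the abstract.
# All function names and prompt templates are assumptions, not the authors' code.

def query_model(prompt: str) -> str:
    """Stand-in for a call to an LLM endpoint; wire this to a real client."""
    raise NotImplementedError("Connect to an actual LLM API here.")

def craft_bait(harmful_query: str) -> str:
    """Stage 1 (assumed): obtain an initial draft response to the query.

    The bait carries rich, query-relevant semantics so the target model
    stays anchored on the original intention rather than drifting toward
    the disguising context (the "intention shift" problem).
    """
    return query_model(f"Write a brief initial draft answer to: {harmful_query}")

def build_attack_prompt(harmful_query: str, bait: str) -> str:
    """Stage 2 (assumed): adaptively generate a benign scenario that frames
    the exchange as a legitimate inquiry in a safe context, then ask the
    target model to rectify or supplement the bait."""
    scenario = query_model(
        f"Invent a plausible, legitimate professional context for discussing: {harmful_query}"
    )
    return (
        f"{scenario}\n\n"
        f"Here is a draft answer:\n{bait}\n\n"
        "Please rectify any mistakes in the draft and supplement missing details."
    )

def bait_attack(harmful_query: str) -> str:
    """Full pipeline: craft the bait, wrap it, and query the target model."""
    bait = craft_bait(harmful_query)
    return query_model(build_attack_prompt(harmful_query, bait))
```

The key design point the abstract emphasizes is that the bait, not the disguising scenario, supplies most of the query-relevant semantics, which is what keeps the target model's response from deviating toward the cover story.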
MDS: A Fine-Grained Dataset for Multi-Modal Dialogue Summarization
Zhipeng Liu | Xiaoming Zhang | Litian Zhang | Zelong Yu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
With the explosion of diverse dialogue scenarios, summarizing dialogues into short messages has drawn much attention recently. In multi-modal dialogue, people tend to use tone and body language to convey their intentions. Traditional dialogue summarization has predominantly focused on textual content, an approach that may overlook the visual and audio information essential for understanding multi-modal interactions. To advance the field of multi-modal dialogue summarization, we develop a new multi-modal dialogue summarization dataset (MDS), which aims to enhance the variety and scope of data available for this research area. MDS provides a demanding testbed for multi-modal dialogue summarization. We conduct a comparative analysis of various summarization techniques on MDS and find that existing methods tend to produce redundant and incoherent summaries. All of the models generate unfaithful facts to some degree, suggesting directions for future research. MDS is available at https://github.com/R00kkie/MDS.