Chain of Attack: Hide Your Intention through Multi-Turn Interrogation

Xikang Yang; Biyu Zhou; Xuehai Tang; Jizhong Han; Songlin Hu

doi:10.18653/v1/2025.findings-acl.514

Abstract

The latent knowledge of large language models (LLMs) contains harmful or unethical content, which introduces significant security risks as the models are widely deployed. Conducting jailbreak attacks on LLMs can proactively identify vulnerabilities and so help strengthen their security measures. However, previous jailbreak attacks focus primarily on single-turn dialogue scenarios, leaving vulnerabilities in multi-turn dialogue contexts inadequately explored. This paper investigates the resilience of black-box LLMs in multi-turn jailbreak attack scenarios from a novel interrogation perspective. We propose an optimal interrogation principle to conceal the jailbreak intent and introduce a multi-turn attack chain generation strategy called CoA. By employing two effective interrogation strategies tailored for LLMs, coupled with an interrogation history record management mechanism, CoA significantly optimizes the attack process. Our approach enables the iterative generation of attack chains, offering a powerful tool for LLM red team testing. Experimental results demonstrate that LLMs exhibit insufficient resistance under multi-turn interrogation and that our method achieves a higher attack success rate (ASR: 83% vs. 64%). This work offers new insights into improving the safety of LLMs.

Anthology ID: 2025.findings-acl.514
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9881–9901
URL: https://aclanthology.org/2025.findings-acl.514/
DOI: 10.18653/v1/2025.findings-acl.514
Cite (ACL): Xikang Yang, Biyu Zhou, Xuehai Tang, Jizhong Han, and Songlin Hu. 2025. Chain of Attack: Hide Your Intention through Multi-Turn Interrogation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 9881–9901, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Chain of Attack: Hide Your Intention through Multi-Turn Interrogation (Yang et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.514.pdf