M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs

Junwoo Ha; Hyunjun Kim; Sangyoon Yu; Haon Park; Ashkan Yousefpour; Yuna Park; Suhyun Kim

doi:10.18653/v1/2025.acl-long.805

Abstract

We introduce a novel framework for consolidating multi-turn adversarial “jailbreak” prompts into single-turn queries, significantly reducing the manual overhead required for adversarial testing of large language models (LLMs). While multi-turn human jailbreaks have been shown to yield high attack success rates (ASRs), they demand considerable human effort and time. Our proposed Multi-turn-to-Single-turn (M2S) methods (Hyphenize, Numberize, and Pythonize) systematically reformat multi-turn dialogues into structured single-turn prompts. Despite eliminating iterative back-and-forth interactions, these reformatted prompts preserve and often enhance adversarial potency: in extensive evaluations on the Multi-turn Human Jailbreak (MHJ) dataset, M2S methods yield ASRs ranging from 70.6% to 95.9% across various state-of-the-art LLMs. Remarkably, our single-turn prompts outperform the original multi-turn attacks by up to 17.5% in absolute ASR, while reducing token usage by more than half on average. Further analyses reveal that embedding malicious requests in enumerated or code-like structures exploits “contextual blindness,” undermining both native guardrails and external input-output safeguards. By consolidating multi-turn conversations into efficient single-turn prompts, our M2S framework provides a powerful tool for large-scale red-teaming and exposes critical vulnerabilities in contemporary LLM defenses. All code, data, and conversion prompts are available for reproducibility and further investigations: https://github.com/Junuha/M2S_DATA

Anthology ID: 2025.acl-long.805
Original: 2025.acl-long.805v1
Version 2: 2025.acl-long.805v2
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 16489–16507
URL: https://aclanthology.org/2025.acl-long.805/
DOI: 10.18653/v1/2025.acl-long.805
Cite (ACL): Junwoo Ha, Hyunjun Kim, Sangyoon Yu, Haon Park, Ashkan Yousefpour, Yuna Park, and Suhyun Kim. 2025. M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16489–16507, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs (Ha et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.805.pdf
