MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?

Muntasir Wahed, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Nirav Diwan, Gang Wang, Dilek Hakkani-Tür, Ismini Lourentzou


Abstract
Recent advancements in Large Language Models (LLMs) have significantly enhanced their code generation capabilities. However, their robustness against adversarial misuse, particularly through multi-turn malicious coding prompts, remains underexplored. In this work, we introduce code decomposition attacks, where a malicious coding task is broken down into a series of seemingly benign subtasks across multiple conversational turns to evade safety filters. To facilitate systematic evaluation, we introduce MOCHA, a large-scale benchmark designed to evaluate the robustness of code LLMs against both single-turn and multi-turn malicious prompts. Empirical results across open- and closed-source models reveal persistent vulnerabilities, especially under multi-turn scenarios. Fine-tuning on MOCHA improves rejection rates while preserving coding ability, and importantly, enhances robustness on external adversarial datasets with up to 32.4% increase in rejection rates without any additional supervision.
Anthology ID:
2025.findings-emnlp.1249
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
22922–22948
Language:
URL:
https://aclanthology.org/2025.findings-emnlp.1249/
DOI:
Bibkey:
Cite (ACL):
Muntasir Wahed, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Nirav Diwan, Gang Wang, Dilek Hakkani-Tür, and Ismini Lourentzou. 2025. MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 22922–22948, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts? (Wahed et al., Findings 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.findings-emnlp.1249.pdf
Checklist:
 2025.findings-emnlp.1249.checklist.pdf