Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain

Gang Cheng; Haibo Jin; Wenbin Zhang; Haohan Wang; Jun Zhuang

Abstract

Large Language Models (LLMs) are increasingly deployed in finance, where unsafe behavior can lead to serious regulatory risks. However, most red-teaming research focuses on overtly harmful content and overlooks attacks that appear legitimate on the surface yet induce regulatory-violating responses. We address this gap by introducing a controllable black-box multi-turn risk-concealed red-teaming framework (CoRT) that progressively conceals surface-level risk while exploiting regulatory-violating behaviors. CoRT contains two key components: (i) a Risk Concealment Attacker (RCA) that generates multi-turn prompts via iterative refinement, and (ii) a Risk Concealment Controller (RCC) that predicts a turn-level Risk Concealment Score (RCS) to steer RCA’s follow-up style. We also build a domain-specific benchmark, FinRisk-Bench, with 522 instructions spanning six financial risk categories. Experiments on nine widely used LLMs show that CoRT (RCA) achieves 93.19% average attack success rate (ASR), and CoRT (RCA+RCC) further improves the average ASR to 95.00%. Our code and FinRisk-Bench are available at https://github.com/gcheng128/CoRT.

Anthology ID: 2026.acl-long.1903
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 41005–41020
URL: https://aclanthology.org/2026.acl-long.1903/
Cite (ACL): Gang Cheng, Haibo Jin, Wenbin Zhang, Haohan Wang, and Jun Zhuang. 2026. Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 41005–41020, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain (Cheng et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1903.pdf
Checklist: 2026.acl-long.1903.checklist.pdf
