English as Defense Proxy: Mitigating Multilingual Jailbreak via Eliciting English Safety Knowledge

Zekai Zhang, Yiduo Guo, Jiuheng Lin, Shanghaoran Quan, Huishuai Zhang, Dongyan Zhao


Abstract
Large language models (LLMs) excel in many tasks, but their safety guarantees vary by language, e.g., responses in English tend to be safer than those in low-resource languages. This inconsistency creates a vulnerability, since an attacker can circumvent safety measures by using a less-supported language as an intermediary, even without fluency in that language. Traditional solutions rely on multilingual safety alignment, which demands vast, per-language datasets and introduces significant trade-offs between usefulness and safety (the so-called “alignment tax”). To overcome these limitations, we introduce English as Defense Proxy (E-Proxy), a unified approach that leverages English, usually the advantage language of LLMs, as a universal safety anchor. During multilingual training, E-Proxy uses English jailbreak prompts to extract the model’s existing safety knowledge, then applies simple language-mapping prompts (e.g., “Please answer in target language”) to transfer that knowledge across languages. Our analysis shows that formulating prompts in a high-resource language preserves the model’s utility, while enforcing responses in the target language significantly enhances safety. We evaluate E-Proxy on extensive benchmarks of both attack resistance and task performance. On the MultiJail benchmark, E-Proxy blocks over 99 % of jailbreak attempts while retaining 95 % of average task performance, all with a simply constructed multilingual alignment data.
Anthology ID:
2025.findings-emnlp.62
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1185–1196
Language:
URL:
https://aclanthology.org/2025.findings-emnlp.62/
DOI:
Bibkey:
Cite (ACL):
Zekai Zhang, Yiduo Guo, Jiuheng Lin, Shanghaoran Quan, Huishuai Zhang, and Dongyan Zhao. 2025. English as Defense Proxy: Mitigating Multilingual Jailbreak via Eliciting English Safety Knowledge. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 1185–1196, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
English as Defense Proxy: Mitigating Multilingual Jailbreak via Eliciting English Safety Knowledge (Zhang et al., Findings 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.findings-emnlp.62.pdf
Checklist:
 2025.findings-emnlp.62.checklist.pdf