@inproceedings{emani-r-2025-matrka,
title = "M{\={a}}tṛk{\={a}}: Multilingual Jailbreak Evaluation of Open-Source Large Language Models",
author = "Emani, Murali and
R, Kashyap Manjusha",
editor = "Bhattacharya, Arnab and
Goyal, Pawan and
Ghosh, Saptarshi and
Ghosh, Kripabandhu",
booktitle = "Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)",
month = dec,
year = "2025",
address = "Mumbai, India",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.bhasha-1.10/",
pages = "117--121",
ISBN = "979-8-89176-313-5",
abstract = "Artificial Intelligence (AI) and Large Language Models (LLMs) are increasingly integrated into high-stakes applications, yet their susceptibility to adversarial prompts poses significant security risks. In this work, we introduce M{\={a}}tṛk{\={a}}, a framework for systematically evaluating jailbreak vulnerabilities in open-source multilingual LLMs. Using the open-source dataset across nine sensitive categories, we constructed adversarial prompt sets that combine translation, mixed-language encoding, homoglyph signatures, numeric enforcement, and structural variations. Experiments were conducted on state-of-the-art open-source models from Llama, Qwen, GPT-OSS, Mistral, and Gemma families. Our findings highlight transferability of jailbreaks across multiple languages with varying success rates depending on attack design. We provide empirical insights, a novel taxonomy of multilingual jailbreak strategies, and recommendations for enhancing robustness in safety-critical environments."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="emani-r-2025-matrka">
<titleInfo>
<title>Mātṛkā: Multilingual Jailbreak Evaluation of Open-Source Large Language Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Murali</namePart>
<namePart type="family">Emani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kashyap</namePart>
<namePart type="given">Manjusha</namePart>
<namePart type="family">R</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Arnab</namePart>
<namePart type="family">Bhattacharya</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Pawan</namePart>
<namePart type="family">Goyal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Saptarshi</namePart>
<namePart type="family">Ghosh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kripabandhu</namePart>
<namePart type="family">Ghosh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Mumbai, India</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-313-5</identifier>
</relatedItem>
<abstract>Artificial Intelligence (AI) and Large Language Models (LLMs) are increasingly integrated into high-stakes applications, yet their susceptibility to adversarial prompts poses significant security risks. In this work, we introduce Mātṛkā, a framework for systematically evaluating jailbreak vulnerabilities in open-source multilingual LLMs. Using the open-source dataset across nine sensitive categories, we constructed adversarial prompt sets that combine translation, mixed-language encoding, homoglyph signatures, numeric enforcement, and structural variations. Experiments were conducted on state-of-the-art open-source models from Llama, Qwen, GPT-OSS, Mistral, and Gemma families. Our findings highlight transferability of jailbreaks across multiple languages with varying success rates depending on attack design. We provide empirical insights, a novel taxonomy of multilingual jailbreak strategies, and recommendations for enhancing robustness in safety-critical environments.</abstract>
<identifier type="citekey">emani-r-2025-matrka</identifier>
<location>
<url>https://aclanthology.org/2025.bhasha-1.10/</url>
</location>
<part>
<date>2025-12</date>
<extent unit="page">
<start>117</start>
<end>121</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Mātṛkā: Multilingual Jailbreak Evaluation of Open-Source Large Language Models
%A Emani, Murali
%A R, Kashyap Manjusha
%Y Bhattacharya, Arnab
%Y Goyal, Pawan
%Y Ghosh, Saptarshi
%Y Ghosh, Kripabandhu
%S Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)
%D 2025
%8 December
%I Association for Computational Linguistics
%C Mumbai, India
%@ 979-8-89176-313-5
%F emani-r-2025-matrka
%X Artificial Intelligence (AI) and Large Language Models (LLMs) are increasingly integrated into high-stakes applications, yet their susceptibility to adversarial prompts poses significant security risks. In this work, we introduce Mātṛkā, a framework for systematically evaluating jailbreak vulnerabilities in open-source multilingual LLMs. Using an open-source dataset spanning nine sensitive categories, we constructed adversarial prompt sets that combine translation, mixed-language encoding, homoglyph signatures, numeric enforcement, and structural variations. Experiments were conducted on state-of-the-art open-source models from the Llama, Qwen, GPT-OSS, Mistral, and Gemma families. Our findings highlight the transferability of jailbreaks across multiple languages, with varying success rates depending on attack design. We provide empirical insights, a novel taxonomy of multilingual jailbreak strategies, and recommendations for enhancing robustness in safety-critical environments.
%U https://aclanthology.org/2025.bhasha-1.10/
%P 117-121
Markdown (Informal)
[Mātṛkā: Multilingual Jailbreak Evaluation of Open-Source Large Language Models](https://aclanthology.org/2025.bhasha-1.10/) (Emani & R, BHASHA 2025)
ACL
Murali Emani and Kashyap Manjusha R. 2025. Mātṛkā: Multilingual Jailbreak Evaluation of Open-Source Large Language Models. In Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025), pages 117–121, Mumbai, India. Association for Computational Linguistics.
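
The abstract names several prompt-mutation strategies (translation, mixed-language encoding, homoglyph signatures, numeric enforcement, structural variations) but this record contains no code. Below is a minimal, hypothetical Python sketch of how two of those strategies, homoglyph substitution and mixed-language (code-mixed) rewriting, might be mechanized; the character map, glossary, function names, and the benign placeholder prompt are illustrative assumptions, not the authors' Mātṛkā implementation.

```python
# Illustrative sketch only: NOT the Matrka authors' code. The homoglyph map
# and glossary are assumptions chosen to demonstrate the strategy classes
# named in the paper's abstract.

# Map selected Latin letters to visually confusable Cyrillic homoglyphs.
# The mutated string renders almost identically to a human reader but no
# longer matches ASCII keyword filters byte-for-byte.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "c": "\u0441",  # Cyrillic small es
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
}


def homoglyph_variant(prompt: str) -> str:
    """Swap mapped Latin letters for look-alike Cyrillic ones."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in prompt)


# Toy English-to-Hindi glossary standing in for a real translation step;
# a full pipeline would call a machine-translation system instead.
GLOSSARY = {
    "describe": "वर्णन कीजिए",
    "process": "प्रक्रिया",
    "detail": "विस्तार",
}


def mixed_language_variant(prompt: str) -> str:
    """Replace glossary words with translations, yielding code-mixed text."""
    return " ".join(GLOSSARY.get(word.lower(), word) for word in prompt.split())


if __name__ == "__main__":
    base = "Describe the process in detail"  # benign stand-in for a probe prompt
    for mutate in (homoglyph_variant, mixed_language_variant):
        print(mutate.__name__, "->", mutate(base))
```

In an evaluation harness of the kind the abstract describes, such mutators would be applied to seed prompts from each sensitive category before querying the target models, with refusal versus compliance then scored per language and per strategy to obtain attack success rates.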