From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking

Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei


Abstract
The rapid development of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has exposed vulnerabilities to various adversarial attacks. This paper provides a comprehensive overview of jailbreaking research targeting both LLMs and MLLMs, highlighting recent advancements in evaluation benchmarks, attack techniques, and defense strategies. Compared to the more advanced state of unimodal jailbreaking, the multimodal domain remains underexplored. We summarize the limitations and potential research directions of multimodal jailbreaking, aiming to inspire future research and further enhance the robustness and security of MLLMs.
Anthology ID:
2024.emnlp-main.973
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
17568–17582
URL:
https://aclanthology.org/2024.emnlp-main.973
DOI:
10.18653/v1/2024.emnlp-main.973
Cite (ACL):
Siyuan Wang, Zhuohan Long, Zhihao Fan, and Zhongyu Wei. 2024. From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17568–17582, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking (Wang et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.973.pdf