Improving Adversarial Robustness in Vision-Language Models with Architecture and Prompt Design

Rishika Bhagwatkar, Shravan Nayak, Pouya Bashivan, Irina Rish


Abstract
Vision-Language Models (VLMs) have seen a significant increase in both research interest and real-world applications across domains such as healthcare, autonomous systems, and security. Their growing prevalence, however, demands higher reliability and safety, including robustness to adversarial attacks. We systematically examine how adversarial robustness can be incorporated through model design choices, exploring the effects of different vision encoders, vision-encoder input resolutions, and the size and type of the language model. Additionally, we introduce novel, cost-effective approaches to enhancing robustness through prompt engineering: by simply suggesting the possibility of adversarial perturbations or rephrasing the question, we demonstrate substantial improvements in model robustness against strong image-based attacks such as Auto-PGD. Our findings provide guidelines for developing more robust VLMs, particularly for deployment in safety-critical environments where reliability and security are paramount, and for ensuring VLMs can be utilized safely and effectively across a wide range of applications.
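For intuition about the attack family the abstract evaluates against, the following is a minimal sketch of a plain PGD attack (the simpler ancestor of Auto-PGD) under an L-infinity budget. The toy logistic model here is a hypothetical stand-in for a full VLM vision encoder, and the hyperparameters (`eps`, `alpha`, `steps`) are illustrative, not the paper's settings:

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD on a toy logistic model p = sigmoid(w @ x + b).

    Iteratively ascends the loss of the true label y while projecting
    the perturbed input back into the eps-ball around the clean input.
    """
    x_adv = x.copy()
    for _ in range(steps):
        z = w @ x_adv + b
        p = 1.0 / (1.0 + np.exp(-z))
        # analytic gradient of the logistic loss w.r.t. the input (y in {0, 1})
        grad = (p - y) * w
        # signed gradient-ascent step, then L-infinity projection
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

def logistic_loss(x, y, w, b):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.0
x = rng.normal(size=8)
y = 1

x_adv = pgd_attack(x, y, w, b)
print(np.max(np.abs(x_adv - x)))                       # perturbation size (<= eps)
print(logistic_loss(x, y, w, b),
      logistic_loss(x_adv, y, w, b))                   # clean vs. adversarial loss
```

Auto-PGD additionally adapts the step size and restarts from the best iterate, but the core loop — signed gradient ascent followed by projection onto the perturbation budget — is the same.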
Anthology ID:
2024.findings-emnlp.990
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
17003–17020
URL:
https://aclanthology.org/2024.findings-emnlp.990
Cite (ACL):
Rishika Bhagwatkar, Shravan Nayak, Pouya Bashivan, and Irina Rish. 2024. Improving Adversarial Robustness in Vision-Language Models with Architecture and Prompt Design. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 17003–17020, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Improving Adversarial Robustness in Vision-Language Models with Architecture and Prompt Design (Bhagwatkar et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.990.pdf