Irina Rish

2024

pdf bib abs
Improving Adversarial Robustness in Vision-Language Models with Architecture and Prompt Design
Rishika Bhagwatkar | Shravan Nayak | Pouya Bashivan | Irina Rish
Findings of the Association for Computational Linguistics: EMNLP 2024

Vision-Language Models (VLMs) have seen a significant increase in both research interest and real-world applications across various domains, including healthcare, autonomous systems, and security. However, their growing prevalence demands higher reliability and safety including robustness to adversarial attacks. We systematically examine the possibility of incorporating adversarial robustness through various model design choices. We explore the effects of different vision encoders, the resolutions of vision encoders, and the size and type of language models. Additionally, we introduce novel, cost-effective approaches to enhance robustness through prompt engineering. By simply suggesting the possibility of adversarial perturbations or rephrasing questions, we demonstrate substantial improvements in model robustness against strong image-based attacks such as Auto-PGD. Our findings provide important guidelines for developing more robust VLMs, particularly for deployment in safety-critical environments where reliability and security are paramount. These insights are crucial for advancing the field of VLMs, ensuring they can be safely and effectively utilized in a wide range of applications.

Co-authors

Venues

findings1

Fix data