Do VLMs Have a Moral Backbone? A Study on the Fragile Morality of Vision-Language Models

Zhining Liu; Tianyi Wang; Xiao Lin; Penghao Ouyang; Gaotang Li; Ze Yang; Hui Liu; Sumit Keswani; Vishwa Pardeshi; Huijun Zhao; Wei Fan; Hanghang Tong

Abstract

Despite substantial efforts toward improving the moral alignment of Vision-Language Models (VLMs), it remains unclear whether their ethical judgments are stable in realistic settings. This work studies moral robustness in VLMs, defined as the ability to preserve moral judgments under textual and visual perturbations that do not alter the underlying moral context. We systematically probe VLMs with a diverse set of model-agnostic multimodal perturbations and find that their moral stances are highly fragile, frequently flipping under simple manipulations. Our analysis reveals systematic vulnerabilities across perturbation types, moral domains, and model scales, including a sycophancy trade-off where stronger instruction-following models are more susceptible to persuasion. We further show that lightweight inference-time interventions can partially restore moral stability. These results demonstrate that moral alignment alone is insufficient and that moral robustness is a necessary criterion for the responsible deployment of VLMs.
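To make the notion of a judgment "flipping" under perturbation concrete, the following is a minimal sketch of one way a flip rate could be computed. It is an illustration only, not the paper's metric or code: the function name `flip_rate`, the string labels, and the example data are all hypothetical.

```python
from typing import Sequence


def flip_rate(original: Sequence[str], perturbed: Sequence[str]) -> float:
    """Fraction of items whose judgment differs after a perturbation.

    Assumes both sequences hold one judgment label per scenario, in the
    same order, and that the perturbation left the moral context unchanged.
    """
    if len(original) != len(perturbed):
        raise ValueError("judgment lists must be the same length")
    if not original:
        raise ValueError("need at least one judgment pair")
    # Count positions where the label changed between the two runs.
    flips = sum(o != p for o, p in zip(original, perturbed))
    return flips / len(original)


# Hypothetical labels, made up for illustration (not data from the paper).
before = ["wrong", "wrong", "acceptable", "wrong"]
after = ["wrong", "acceptable", "acceptable", "acceptable"]
rate = flip_rate(before, after)
print(rate)
```

Under this sketch, a lower flip rate would correspond to higher moral robustness; how the paper itself scores robustness is not stated in the abstract.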

Anthology ID: 2026.findings-acl.2079
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 41890–41909
URL: https://aclanthology.org/2026.findings-acl.2079/
Cite (ACL): Zhining Liu, Tianyi Wang, Xiao Lin, Penghao Ouyang, Gaotang Li, Ze Yang, Hui Liu, Sumit Keswani, Vishwa Pardeshi, Huijun Zhao, Wei Fan, and Hanghang Tong. 2026. Do VLMs Have a Moral Backbone? A Study on the Fragile Morality of Vision-Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 41890–41909, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Do VLMs Have a Moral Backbone? A Study on the Fragile Morality of Vision-Language Models (Liu et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.2079.pdf
Checklist: 2026.findings-acl.2079.checklist.pdf
