A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models

Ashutosh Sathe, Prachi Jain, Sunayana Sitaram


Abstract
Vision-language models (VLMs) have gained widespread adoption in both industry and academia. In this study, we propose a unified framework for systematically evaluating gender, race, and age biases in VLMs with respect to professions. Our evaluation covers all supported inference modes of recent VLMs: image-to-text, text-to-text, text-to-image, and image-to-image. We create a synthetic, high-quality dataset of text and images that intentionally obscures gender, race, and age distinctions across various professions. The dataset includes action-based descriptions of each profession and serves as a benchmark for evaluating societal biases in VLMs. In our benchmarking of popular VLMs, we observe that different input-output modalities result in distinct bias magnitudes and directions. We hope our work will help guide future progress in improving VLMs to learn socially unbiased representations. We will release our data and code.
Anthology ID:
2024.findings-emnlp.66
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1208–1249
URL:
https://aclanthology.org/2024.findings-emnlp.66
Cite (ACL):
Ashutosh Sathe, Prachi Jain, and Sunayana Sitaram. 2024. A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1208–1249, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models (Sathe et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.66.pdf