Uncovering Cultural Representation Disparities in Vision-Language Models

Ram Mohan Rao Kadiyala; Siddhant Gupta; Jebish Purbey; Srishti Yadav; Suman Debnath; Alejandro R. Salamanca; Desmond Elliott

doi:10.18653/v1/2025.findings-ijcnlp.131

Abstract

Vision-Language Models (VLMs) have demonstrated impressive capabilities across a range of tasks, yet concerns about their potential biases persist. This work investigates cultural biases in state-of-the-art VLMs by evaluating their performance on an image-based country identification task. Using the geographically diverse Country211 (CITATION) dataset, we probe VLMs with open-ended questions and multiple-choice questions (MCQs), including challenging multilingual and adversarial task settings. Our analysis aims to uncover disparities in model accuracy across countries and question formats, providing insights into how training data distribution and evaluation methodologies may influence cultural biases in VLMs. The findings highlight significant variations in performance, suggesting that while VLMs possess considerable visual understanding, they inherit biases from their pre-training data and scale, which affect their ability to generalize uniformly across diverse global contexts.

Anthology ID: 2025.findings-ijcnlp.131
Volume: Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month: December
Year: 2025
Address: Mumbai, India
Editors: Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venue: Findings
Publisher: The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Pages: 2087–2117
URL: https://aclanthology.org/2025.findings-ijcnlp.131/
DOI: 10.18653/v1/2025.findings-ijcnlp.131
Cite (ACL): Ram Mohan Rao Kadiyala, Siddhant Gupta, Jebish Purbey, Srishti Yadav, Suman Debnath, Alejandro R. Salamanca, and Desmond Elliott. 2025. Uncovering Cultural Representation Disparities in Vision-Language Models. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 2087–2117, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal): Uncovering Cultural Representation Disparities in Vision-Language Models (Kadiyala et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-ijcnlp.131.pdf