Revealing Gender Bias in Language Models through Fashion Image Captioning

Maria Villalba-Oses, Victoria Muñoz-Garcia, Juan Pablo Consuegra-Ayala


Abstract
Image captioning bridges computer vision and natural language processing but remains vulnerable to social biases. This study evaluates gender bias in ChatGPT, Copilot, and Grok by analyzing their descriptions of fashion-related images prompted without gender cues. We introduce a methodology combining gender annotation, stereotype classification, and a manually curated dataset. Results show that GPT-4o and Grok frequently assign gender and reinforce stereotypes, while Copilot more often generates neutral captions. Grok shows the lowest error rate but consistently assigns gender, even when cues are ambiguous. These findings highlight the need for bias-aware captioning approaches in multimodal systems.
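The abstract describes the evaluation only at a high level. As a rough illustration of the kind of gender-assignment tally such a study might involve, here is a minimal, hypothetical Python sketch; the keyword lists, labels, and toy captions are assumptions for illustration and are not taken from the paper's actual annotation scheme or dataset.

```python
from collections import Counter

# Hypothetical keyword lists; the paper's annotation is manual and more
# nuanced than simple keyword matching.
FEMALE_TERMS = {"woman", "women", "she", "her", "female", "lady"}
MALE_TERMS = {"man", "men", "he", "his", "male", "gentleman"}

def assign_gender_label(caption: str) -> str:
    """Classify a caption as 'female', 'male', 'mixed', or 'neutral'
    based on the gendered terms it contains."""
    tokens = {t.strip(".,;:!?").lower() for t in caption.split()}
    has_female = bool(tokens & FEMALE_TERMS)
    has_male = bool(tokens & MALE_TERMS)
    if has_female and has_male:
        return "mixed"
    if has_female:
        return "female"
    if has_male:
        return "male"
    return "neutral"

# Toy captions standing in for model outputs on gender-neutral prompts.
captions_by_model = {
    "ChatGPT": ["A woman wearing a red summer dress.",
                "A person in a tailored navy suit."],
    "Copilot": ["A person wearing a red summer dress.",
                "Someone in a tailored navy suit."],
    "Grok":    ["A woman in a red summer dress.",
                "A man in a tailored navy suit."],
}

for model, captions in captions_by_model.items():
    counts = Counter(assign_gender_label(c) for c in captions)
    print(model, dict(counts))
```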
Anthology ID: 2025.ranlp-1.155
Volume: Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue: RANLP
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 1333–1340
URL: https://aclanthology.org/2025.ranlp-1.155/
Cite (ACL): Maria Villalba-Oses, Victoria Muñoz-Garcia, and Juan Pablo Consuegra-Ayala. 2025. Revealing Gender Bias in Language Models through Fashion Image Captioning. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 1333–1340, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): Revealing Gender Bias in Language Models through Fashion Image Captioning (Villalba-Oses et al., RANLP 2025)
PDF: https://aclanthology.org/2025.ranlp-1.155.pdf