Victoria Muñoz-Garcia


2025

Image captioning bridges computer vision and natural language processing, but it remains vulnerable to social biases. This study evaluates gender bias in ChatGPT (GPT-4o), Copilot, and Grok by analyzing their descriptions of fashion-related images, elicited with prompts that contain no gender cues. We introduce a methodology that combines gender annotation and stereotype classification, applied to a manually curated dataset. Results show that ChatGPT and Grok frequently assign gender and reinforce stereotypes, while Copilot more often generates neutral captions. Grok shows the lowest error rate but consistently assigns gender, even when visual cues are ambiguous. These findings highlight the need for bias-aware captioning approaches in multimodal systems.
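
To make the notion of a caption "assigning gender" concrete, the following is a minimal sketch of how captions could be labeled as gendered or neutral and how a gender-assignment rate could be computed. It is an illustration only, not the annotation procedure used in this study: the word lists `FEMININE` and `MASCULINE`, the function names `annotate_gender` and `gendered_rate`, and the keyword-matching heuristic are all assumptions introduced here.

```python
import re
from collections import Counter

# Hypothetical lexicons for illustration; the study's actual annotation
# scheme is not specified in the abstract and may be manual.
FEMININE = {"woman", "women", "girl", "she", "her", "hers", "female", "lady"}
MASCULINE = {"man", "men", "boy", "he", "him", "his", "male", "gentleman"}


def annotate_gender(caption: str) -> str:
    """Label a caption as 'feminine', 'masculine', 'mixed', or 'neutral'."""
    # Tokenize on whole words so that "woman" does not match "man".
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    has_feminine = bool(tokens & FEMININE)
    has_masculine = bool(tokens & MASCULINE)
    if has_feminine and has_masculine:
        return "mixed"
    if has_feminine:
        return "feminine"
    if has_masculine:
        return "masculine"
    return "neutral"


def gendered_rate(captions: list[str]) -> float:
    """Fraction of captions that contain at least one gendered term."""
    labels = Counter(annotate_gender(c) for c in captions)
    total = sum(labels.values())
    if total == 0:
        return 0.0
    return 1 - labels["neutral"] / total
```

Under these assumptions, a system that tends toward neutral captions would score a lower `gendered_rate` over the same image set than one that assigns gender by default. A keyword heuristic of this kind cannot tell whether an assigned gender is correct or stereotyped; those judgments require the annotation and classification steps described above.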