Whose Voice, Whose Avatar? Gender Matching Bias in Multimodal AI Teammates

Kyusik Kim; Jaehoon Choi; Hyunwoo Yoo; Bongwon Suh

Abstract

Multimodal Large Language Models (MLLMs) are increasingly deployed as social agents, yet their ability to integrate conflicting identity cues remains underexplored. We audit gender bias in ten recent MLLMs using a counterfactual cooperative gaming task that pairs synthetic voices with avatars of varying gender presentation and visual fidelity. Our analysis reveals distinct bias patterns that can occur independently: closed-source models (e.g., Gemini 2.5/3) exhibit a near-deterministic “voice-matching” bias that enforces binary alignment between voice and appearance, whereas open-weight models (e.g., Qwen-2.5-Omni-7B) show limited responsiveness to vocal cues and instead exhibit context-driven stereotypes, such as preferring male avatars in combat scenarios. We further find that reducing visual realism attenuates matching tendencies in some models. These findings demonstrate that multimodal fairness is not monolithic; models may appear unbiased on one dimension while enforcing strict identity congruence or role-based stereotypes on another. Code and data are available at https://github.com/halfhoon/whose-voice-whose-avatar.

Anthology ID: 2026.findings-acl.2057
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 41344–41367
URL: https://aclanthology.org/2026.findings-acl.2057/
Cite (ACL): Kyusik Kim, Jaehoon Choi, Hyunwoo Yoo, and Bongwon Suh. 2026. Whose Voice, Whose Avatar? Gender Matching Bias in Multimodal AI Teammates. In Findings of the Association for Computational Linguistics: ACL 2026, pages 41344–41367, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Whose Voice, Whose Avatar? Gender Matching Bias in Multimodal AI Teammates (Kim et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.2057.pdf
Checklist: 2026.findings-acl.2057.checklist.pdf
