B-APO: Bias-Targeted Adversarial Preference Optimization for Debiasing Multimodal Large Language Models

Pinlong Zhao, Zike Ding, Zengshu Ye, Zhou Zhaoting


Abstract
Multimodal Large Language Models (MLLMs) often suffer from modality bias, where the model relies disproportionately on one modality while neglecting critical information from the others. Existing debiasing methods based on modality masking create biased responses by removing an entire modality, which yields an extreme and static training environment. However, real-world multimodal bias often emerges under subtle perturbations (e.g., mild occlusion, noisy instructions), where both modalities are present but the model is tempted to rely on spurious shortcuts. We propose B-APO (Bias-Targeted Adversarial Preference Optimization), which casts debiasing as a bias-targeted min-max game: we generate hard negatives by applying small adversarial perturbations in the latent space that maximally induce reliance on language or vision priors, and then perform preference alignment to enlarge the margin between clean and adversarial responses. This encourages the model to anchor on true cross-modal evidence even under the most adversarial conditions. Extensive experiments on bias and hallucination benchmarks demonstrate that B-APO achieves superior debiasing performance while maintaining general capabilities.
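The min-max structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the loss or the attack, so the sketch assumes a DPO-style log-sigmoid preference loss for the outer step and a single sign-gradient (FGSM-like) step with finite differences for the inner step. The names `toy_bias_score`, `sign_step_perturbation`, `preference_loss`, `epsilon`, and `beta` are all hypothetical.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(logp_clean, logp_adv, ref_clean, ref_adv, beta=0.1):
    # Assumed DPO-style objective: the loss shrinks as the policy prefers the
    # clean response over the adversarial one more strongly than the
    # reference model does.
    margin = (logp_clean - ref_clean) - (logp_adv - ref_adv)
    return -log_sigmoid(beta * margin)


def sign_step_perturbation(bias_score, z, epsilon, h=1e-4):
    # Inner maximization (assumed): one sign-gradient step inside an
    # L-infinity ball of radius epsilon, with the gradient estimated by
    # central finite differences.
    delta = []
    for i in range(len(z)):
        z_plus = list(z)
        z_plus[i] += h
        z_minus = list(z)
        z_minus[i] -= h
        grad_i = (bias_score(z_plus) - bias_score(z_minus)) / (2 * h)
        sign = (grad_i > 0) - (grad_i < 0)
        delta.append(epsilon * sign)
    return delta


def toy_bias_score(z):
    # Hypothetical stand-in for "how strongly the model leans on a prior".
    return 2.0 * z[0] - 1.0 * z[1]


z = [0.5, 0.5]  # toy latent representation
delta = sign_step_perturbation(toy_bias_score, z, epsilon=0.05)
z_adv = [zi + di for zi, di in zip(z, delta)]  # latent for the hard negative
```

In a real setting the bias score would come from the model itself, and the response generated from `z_adv` would serve as the rejected sample in `preference_loss`, with the clean response as the preferred one.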
Anthology ID:
2026.findings-acl.1843
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
36979–36999
URL:
https://aclanthology.org/2026.findings-acl.1843/
Cite (ACL):
Pinlong Zhao, Zike Ding, Zengshu Ye, and Zhou Zhaoting. 2026. B-APO: Bias-Targeted Adversarial Preference Optimization for Debiasing Multimodal Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 36979–36999, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
B-APO: Bias-Targeted Adversarial Preference Optimization for Debiasing Multimodal Large Language Models (Zhao et al., Findings 2026)
PDF:
https://aclanthology.org/2026.findings-acl.1843.pdf
Checklist:
 2026.findings-acl.1843.checklist.pdf