Zike Ding


2026

Multimodal Large Language Models (MLLMs) often suffer from modality bias: the model relies disproportionately on one modality while neglecting critical information from the others. Existing debiasing methods based on modality masking construct biased responses by removing an entire modality, which yields an extreme and static training environment. In practice, however, multimodal bias often emerges under subtle perturbations (e.g., mild occlusion, noisy instructions), where both modalities are present but the model is tempted to rely on spurious shortcuts. We propose B-APO (Bias-Targeted Adversarial Preference Optimization), which casts debiasing as a bias-targeted min-max game: we generate hard negatives by applying small adversarial perturbations in the latent space that maximally induce reliance on language or vision priors, and then perform preference alignment to enlarge the margin between clean and adversarial responses. This encourages the model to anchor on true cross-modal evidence even under worst-case latent perturbations. Extensive experiments on bias and hallucination benchmarks demonstrate that B-APO achieves superior debiasing performance while maintaining general capabilities.
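
For orientation, the min-max game described above could be written roughly as follows. This is only a sketch under stated assumptions, not necessarily the formulation used in B-APO: the bias score $\mathcal{B}_{\theta}$, the latent representation $h$, the norm budget $\epsilon$, the temperature $\beta$, the reference policy $\pi_{\mathrm{ref}}$, and the choice of a DPO-style preference loss are all illustrative placeholders that the abstract does not specify. Here $x$ is the multimodal input, $y^{+}$ is the clean response, and $y^{-}$ is the response produced under the adversarial perturbation $\delta^{\star}$.

```latex
% Requires amsmath and amssymb. All symbols are illustrative assumptions.
\begin{align}
  % Inner step: find the small latent perturbation that most induces prior reliance
  \delta^{\star} &= \arg\max_{\|\delta\|_{p} \le \epsilon}
      \; \mathcal{B}_{\theta}(x,\, h + \delta), \\
  % Hard negative: the response generated from the perturbed latent
  y^{-} &\sim \pi_{\theta}(\,\cdot \mid x;\, h + \delta^{\star}), \\
  % Outer step: a DPO-style loss that widens the clean-vs-adversarial margin
  \min_{\theta} \; \mathcal{L}(\theta) &=
      -\,\mathbb{E}_{(x,\, y^{+},\, y^{-})}\!\left[
        \log \sigma\!\left(
          \beta \log \frac{\pi_{\theta}(y^{+} \mid x)}{\pi_{\mathrm{ref}}(y^{+} \mid x)}
          - \beta \log \frac{\pi_{\theta}(y^{-} \mid x)}{\pi_{\mathrm{ref}}(y^{-} \mid x)}
        \right)
      \right].
\end{align}
```

Under these assumptions, the inner maximization plays the role of the adversary that searches for the most bias-inducing perturbation within the budget $\epsilon$, while minimizing $\mathcal{L}(\theta)$ increases the log-likelihood-ratio margin between the clean response and the hard negative.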