@article{zhu-etal-2025-patchwise,
title = "Patchwise Cooperative Game-based Interpretability Method for Large Vision-language Models",
author = "Zhu, Yao and
Zhang, Yunjian and
Wang, Zizhe and
Yan, Xiu and
Sun, Peng and
Ji, Xiangyang",
journal = "Transactions of the Association for Computational Linguistics",
volume = "13",
year = "2025",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/2025.tacl-1.34/",
doi = "10.1162/tacl_a_00756",
pages = "744--759",
abstract = "Amidst the rapid advancement of artificial intelligence, research on large vision-language models (LVLMs) has emerged as a pivotal area. However, understanding their internal mechanisms remains challenging due to the limitations of existing interpretability methods, especially regarding faithfulness and plausibility. To address this, we first construct a human response interpretability dataset that evaluates the plausibility of model explanations by comparing the attention regions between the model and humans when answering the same questions. We then propose a patchwise cooperative game-based interpretability method for LVLMs, which employs Shapley values to quantify the impact of individual image patches on generation likelihood and enhances computational efficiency through a single input approximation approach. Experimental results demonstrate our method{'}s faithfulness, plausibility, and robustness. Our method provides researchers with deeper insights into model behavior, allowing for an examination of the specific image regions each layer relies on during response generation, ultimately enhancing model reliability. Our code is available at https://github.com/ZY123-GOOD/Patchwise{\_}Cooperative."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="zhu-etal-2025-patchwise">
    <titleInfo>
      <title>Patchwise Cooperative Game-based Interpretability Method for Large Vision-language Models</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Yao</namePart>
      <namePart type="family">Zhu</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Yunjian</namePart>
      <namePart type="family">Zhang</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Zizhe</namePart>
      <namePart type="family">Wang</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Xiu</namePart>
      <namePart type="family">Yan</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Peng</namePart>
      <namePart type="family">Sun</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Xiangyang</namePart>
      <namePart type="family">Ji</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <genre authority="bibutilsgt">journal article</genre>
    <relatedItem type="host">
      <titleInfo>
        <title>Transactions of the Association for Computational Linguistics</title>
      </titleInfo>
      <originInfo>
        <issuance>continuing</issuance>
        <publisher>MIT Press</publisher>
        <place>
          <placeTerm type="text">Cambridge, MA</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">periodical</genre>
      <genre authority="bibutilsgt">academic journal</genre>
    </relatedItem>
    <abstract>Amidst the rapid advancement of artificial intelligence, research on large vision-language models (LVLMs) has emerged as a pivotal area. However, understanding their internal mechanisms remains challenging due to the limitations of existing interpretability methods, especially regarding faithfulness and plausibility. To address this, we first construct a human response interpretability dataset that evaluates the plausibility of model explanations by comparing the attention regions between the model and humans when answering the same questions. We then propose a patchwise cooperative game-based interpretability method for LVLMs, which employs Shapley values to quantify the impact of individual image patches on generation likelihood and enhances computational efficiency through a single input approximation approach. Experimental results demonstrate our method’s faithfulness, plausibility, and robustness. Our method provides researchers with deeper insights into model behavior, allowing for an examination of the specific image regions each layer relies on during response generation, ultimately enhancing model reliability. Our code is available at https://github.com/ZY123-GOOD/Patchwise_Cooperative.</abstract>
    <identifier type="citekey">zhu-etal-2025-patchwise</identifier>
    <identifier type="doi">10.1162/tacl_a_00756</identifier>
    <location>
      <url>https://aclanthology.org/2025.tacl-1.34/</url>
    </location>
    <part>
      <date>2025</date>
      <detail type="volume"><number>13</number></detail>
      <extent unit="page">
        <start>744</start>
        <end>759</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Journal Article
%T Patchwise Cooperative Game-based Interpretability Method for Large Vision-language Models
%A Zhu, Yao
%A Zhang, Yunjian
%A Wang, Zizhe
%A Yan, Xiu
%A Sun, Peng
%A Ji, Xiangyang
%J Transactions of the Association for Computational Linguistics
%D 2025
%V 13
%I MIT Press
%C Cambridge, MA
%F zhu-etal-2025-patchwise
%X Amidst the rapid advancement of artificial intelligence, research on large vision-language models (LVLMs) has emerged as a pivotal area. However, understanding their internal mechanisms remains challenging due to the limitations of existing interpretability methods, especially regarding faithfulness and plausibility. To address this, we first construct a human response interpretability dataset that evaluates the plausibility of model explanations by comparing the attention regions between the model and humans when answering the same questions. We then propose a patchwise cooperative game-based interpretability method for LVLMs, which employs Shapley values to quantify the impact of individual image patches on generation likelihood and enhances computational efficiency through a single input approximation approach. Experimental results demonstrate our method’s faithfulness, plausibility, and robustness. Our method provides researchers with deeper insights into model behavior, allowing for an examination of the specific image regions each layer relies on during response generation, ultimately enhancing model reliability. Our code is available at https://github.com/ZY123-GOOD/Patchwise_Cooperative.
%R 10.1162/tacl_a_00756
%U https://aclanthology.org/2025.tacl-1.34/
%U https://doi.org/10.1162/tacl_a_00756
%P 744-759
Markdown (Informal)
[Patchwise Cooperative Game-based Interpretability Method for Large Vision-language Models](https://aclanthology.org/2025.tacl-1.34/) (Zhu et al., TACL 2025)
ACL
Yao Zhu, Yunjian Zhang, Zizhe Wang, Xiu Yan, Peng Sun, and Xiangyang Ji. 2025. Patchwise Cooperative Game-based Interpretability Method for Large Vision-language Models. Transactions of the Association for Computational Linguistics, 13:744–759.
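For readers who want a concrete picture of the patchwise cooperative-game idea the abstract describes, below is a minimal, hypothetical sketch: permutation-sampled Shapley values over image patches, where absent patches are masked with a baseline and the payoff is a scalar score (in the paper, the model's generation likelihood). This is a generic Monte Carlo illustration only, not the authors' implementation or their single-input approximation; all names here (`patch_shapley`, `score_fn`, `baseline`) are invented for the sketch. See https://github.com/ZY123-GOOD/Patchwise_Cooperative for the actual code.

```python
# Hedged sketch of patchwise Shapley attribution via permutation sampling.
# NOT the paper's method: the paper uses a single-input approximation for
# efficiency, while this sketch uses plain Monte Carlo over permutations.
import numpy as np

def patch_shapley(score_fn, patches, baseline, n_samples=200, rng=None):
    """Estimate a Shapley value per image patch.

    score_fn: maps an array of patches to a scalar payoff, e.g. the
              model's likelihood of generating a fixed answer (hypothetical).
    patches:  array of shape (n_patches, ...) holding the real patches.
    baseline: array broadcastable to one patch, used to mask absent patches.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(patches)
    values = np.zeros(n)
    for _ in range(n_samples):
        order = rng.permutation(n)
        masked = np.broadcast_to(baseline, patches.shape).copy()
        prev = score_fn(masked)  # payoff with every patch masked
        for i in order:
            masked[i] = patches[i]   # reveal patch i in random order
            cur = score_fn(masked)
            values[i] += cur - prev  # marginal contribution of patch i
            prev = cur
    return values / n_samples

# Toy usage: 16 "patches" of 8x8 pixels, payoff = mean brightness.
if __name__ == "__main__":
    patches = np.random.default_rng(1).random((16, 8, 8))
    phi = patch_shapley(lambda x: float(x.mean()), patches,
                        baseline=np.zeros((8, 8)))
    print(phi.round(4))
```

By construction, the estimated values sum to score(all patches) minus score(all masked), so they distribute the model's output change across patches; exact Shapley computation needs all 2^n coalitions, which is why sampling (or, in the paper, a single-input approximation) is used.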