mllm-shap: A Shapley Value Explainability Platform for Text-Audio Multimodal Large Language Models

Jakub Muszyński; Paweł Pozorski; Maria Ganzha

Abstract

We present mllm-shap, an open-source Python platform for researchers and ML practitioners that extends Shapley value (SV) explainability from text-only large language models to multimodal LLMs (MLLMs) that jointly process text and audio. Building on the token-level SV framework introduced by TokenSHAP, mllm-shap addresses three challenges absent in the text-only setting: (1) modality-aware coalition masking that handles the coexistence of text tokens and dense audio encoder frames within a single input, (2) multi-turn conversation tracking with per-token role and modality metadata, and (3) audio token grouping via phonetic alignment that reduces the coalition space by 10–50 times. The platform ships as a pip-installable package implementing five SV estimation strategies – including a Complementary Contributions estimator with Neyman-optimal allocation that outperforms Monte Carlo baselines – together with an interactive web GUI for real-time attribution visualization. To our knowledge, mllm-shap is the first publicly available framework for complete, reproducible SV-based explainability of text-audio MLLMs. The package is MIT-licensed with full source code on GitHub and a demonstration video included as supplementary material.
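To make the idea of coalition-based attribution over grouped audio tokens concrete, the following is a minimal sketch and not the mllm-shap API. It uses plain permutation-sampling Monte Carlo rather than the Complementary Contributions estimator named in the abstract. The function name `shapley_permutation_estimate`, the `groups` layout, the `weights`, and the additive `toy_value` function are all hypothetical stand-ins for a real model call on a masked input.

```python
import random
from typing import Callable, FrozenSet, List


def shapley_permutation_estimate(
    n_players: int,
    value: Callable[[FrozenSet[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled player orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n_players
    players = list(range(n_players))
    for _ in range(n_permutations):
        rng.shuffle(players)
        coalition = set()
        prev = value(frozenset(coalition))
        for p in players:
            coalition.add(p)
            cur = value(frozenset(coalition))
            phi[p] += cur - prev  # marginal contribution of p in this ordering
            prev = cur
    return [x / n_permutations for x in phi]


# Hypothetical toy input: two text tokens and six audio frames. The frames are
# grouped into two units, standing in for a phonetic-alignment step.
groups = [
    ("text", [0]),
    ("text", [1]),
    ("audio", [2, 3, 4]),
    ("audio", [5, 6, 7]),
]
weights = [0.5, 0.1, 0.2, 0.2, 0.2, 0.05, 0.05, 0.05]


def toy_value(coalition: FrozenSet[int]) -> float:
    """Assumed additive 'model': score is the summed weight of unmasked positions."""
    return sum(weights[i] for g in coalition for i in groups[g][1])


phi = shapley_permutation_estimate(len(groups), toy_value)
for (modality, positions), score in zip(groups, phi):
    print(modality, positions, round(score, 3))
```

In this toy setting, treating each group as one player means the estimator works with 4 players instead of 8 raw positions, so the number of possible coalitions drops from 2**8 to 2**4. A real value function would instead query the model with the masked input and compare its output with the unmasked response.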

Anthology ID: 2026.acl-demo.38
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Greg Durrett, Ping Jian
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 387–396
URL: https://aclanthology.org/2026.acl-demo.38/
Cite (ACL): Jakub Muszyński, Paweł Pozorski, and Maria Ganzha. 2026. mllm-shap: A Shapley Value Explainability Platform for Text-Audio Multimodal Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 387–396, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): mllm-shap: A Shapley Value Explainability Platform for Text-Audio Multimodal Large Language Models (Muszyński et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-demo.38.pdf
