Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models

Shitian Zhao, Renrui Zhang, Xu Luo, Yan Wang, Shanghang Zhang, Peng Gao


Abstract
Model fusion has long been an important topic, especially now that large language models (LLMs) and multi-modal language models (MLMs) with different architectures, parameter sizes, and training pipelines are released continually. In this work, we propose likelihood composition, a post-hoc framework for fusing heterogeneous off-the-shelf models. The basic idea is to compose the likelihood distributions of multiple models on a multi-choice visual question answering (VQA) task. Here the core concept, likelihood, is the log-probability of a candidate answer. Likelihood composition introduces a set of basic operations: debias, highlight, majority-vote, and ensemble. Combining (composing) these basic operations yields mixed composition methods, which we call mix-composition. Through comprehensive experiments on 9 VQA datasets and 10 MLMs, we demonstrate the effectiveness of mix-composition compared with simple ensemble or majority-vote methods. Within this framework, new basic composition methods can be proposed and combined into new mixed composition methods. We hope that likelihood composition offers a new perspective on fusing heterogeneous models and inspires further exploration under this framework.
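To make the two baseline operations named in the abstract concrete, here is a minimal sketch of ensemble and majority-vote over per-candidate log-probabilities. The abstract does not give formulas, so this assumes that ensemble means averaging each candidate's log-probability across models and that majority-vote means each model votes for its own top candidate. The function names and the numbers in `LIKELIHOODS` are hypothetical and are not taken from the paper. Debias and highlight are omitted because the abstract does not define them.

```python
from collections import Counter

# Hypothetical log-probabilities: one row per model, one column per candidate answer.
# These values are made up for illustration only.
LIKELIHOODS = [
    [-0.2, -2.0, -3.0, -4.0],  # model 1 prefers candidate 0, strongly
    [-1.2, -1.0, -3.0, -4.0],  # model 2 prefers candidate 1, narrowly
    [-1.3, -1.1, -2.5, -3.5],  # model 3 prefers candidate 1, narrowly
]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def ensemble(likelihoods):
    """Assumed ensemble: average each candidate's log-probability over models."""
    n_models = len(likelihoods)
    n_candidates = len(likelihoods[0])
    averaged = [
        sum(model[c] for model in likelihoods) / n_models
        for c in range(n_candidates)
    ]
    return argmax(averaged)


def majority_vote(likelihoods):
    """Assumed majority-vote: each model votes for its top candidate."""
    votes = [argmax(model) for model in likelihoods]
    # most_common breaks ties in favour of the vote seen first.
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    print("ensemble picks candidate", ensemble(LIKELIHOODS))
    print("majority-vote picks candidate", majority_vote(LIKELIHOODS))
```

With these made-up numbers the two operations disagree: averaging lets one confident model outweigh two lukewarm ones, while voting ignores confidence entirely. That difference is the kind of behaviour a composition of operations could exploit.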
Anthology ID:
2024.findings-emnlp.594
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10152–10163
URL:
https://aclanthology.org/2024.findings-emnlp.594
Cite (ACL):
Shitian Zhao, Renrui Zhang, Xu Luo, Yan Wang, Shanghang Zhang, and Peng Gao. 2024. Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10152–10163, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models (Zhao et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.594.pdf