MM-LLMs: Recent Advances in MultiModal Large Language Models

Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su, Chenhui Chu, Dong Yu


Abstract
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also enable a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research on MM-LLMs. First, we outline general design formulations for model architecture and the training pipeline. Next, we introduce a taxonomy encompassing 126 MM-LLMs, each characterized by its specific formulations. We then review the performance of selected MM-LLMs on mainstream benchmarks and summarize key training recipes for enhancing the potency of MM-LLMs. Finally, we explore promising directions for MM-LLMs and maintain a [real-time tracking website](https://mm-llms.github.io/) for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.
Anthology ID: 2024.findings-acl.738
Volume: Findings of the Association for Computational Linguistics ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand and virtual meeting
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 12401–12430
URL: https://aclanthology.org/2024.findings-acl.738
Cite (ACL):
Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su, Chenhui Chu, and Dong Yu. 2024. MM-LLMs: Recent Advances in MultiModal Large Language Models. In Findings of the Association for Computational Linguistics ACL 2024, pages 12401–12430, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
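
For convenience, here is a BibTeX entry assembled from the metadata above. All fields are taken directly from this page; the citation key follows the Anthology's usual author–year pattern but is an assumption, since no official Bibkey is listed here.

```bibtex
% NOTE: the citation key below is assumed; this page does not list an official Bibkey.
@inproceedings{zhang-etal-2024-mm-llms,
    title     = "{MM}-{LLM}s: Recent Advances in {M}ulti{M}odal Large Language Models",
    author    = "Zhang, Duzhen and Yu, Yahan and Dong, Jiahua and Li, Chenxing and Su, Dan and Chu, Chenhui and Yu, Dong",
    editor    = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month     = aug,
    year      = "2024",
    address   = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url       = "https://aclanthology.org/2024.findings-acl.738",
    pages     = "12401--12430",
}
```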
Cite (Informal):
MM-LLMs: Recent Advances in MultiModal Large Language Models (Zhang et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.738.pdf