@inproceedings{eichenberg-etal-2022-magma,
    title = "{MAGMA} {--} Multimodal Augmentation of Generative Models through Adapter-based Finetuning",
    author = "Eichenberg, Constantin and
      Black, Sidney and
      Weinbach, Samuel and
      Parcalabescu, Letitia and
      Frank, Anette",
    editor = "Goldberg, Yoav and
      Kozareva, Zornitsa and
      Zhang, Yue",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.179",
    doi = "10.18653/v1/2022.findings-emnlp.179",
    pages = "2416--2428",
    abstract = "Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling. However, prevailing VL approaches are limited by the requirement for labeled data and the use of complex multi-step pretraining objectives. We present MAGMA - a simple method for augmenting generative language models with additional modalities using adapter-based finetuning. Building on Frozen, we train a series of VL models that autoregressively generate text from arbitrary combinations of visual and textual input. The pretraining is entirely end-to-end using a single language modeling objective, simplifying optimization compared to previous approaches. Importantly, the language model weights remain unchanged during training, allowing for transfer of encyclopedic knowledge and in-context learning abilities from language pretraining. MAGMA outperforms Frozen on open-ended generative tasks, achieving state of the art results on the OKVQA benchmark and competitive results on a range of other popular VL benchmarks, while pretraining on 0.2 {\%} of the number of samples used to train SimVLM.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="eichenberg-etal-2022-magma">
    <titleInfo>
      <title>MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Constantin</namePart>
      <namePart type="family">Eichenberg</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Sidney</namePart>
      <namePart type="family">Black</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Samuel</namePart>
      <namePart type="family">Weinbach</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Letitia</namePart>
      <namePart type="family">Parcalabescu</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Anette</namePart>
      <namePart type="family">Frank</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2022-12</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Findings of the Association for Computational Linguistics: EMNLP 2022</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Yoav</namePart>
        <namePart type="family">Goldberg</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Zornitsa</namePart>
        <namePart type="family">Kozareva</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Yue</namePart>
        <namePart type="family">Zhang</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Abu Dhabi, United Arab Emirates</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling. However, prevailing VL approaches are limited by the requirement for labeled data and the use of complex multi-step pretraining objectives. We present MAGMA - a simple method for augmenting generative language models with additional modalities using adapter-based finetuning. Building on Frozen, we train a series of VL models that autoregressively generate text from arbitrary combinations of visual and textual input. The pretraining is entirely end-to-end using a single language modeling objective, simplifying optimization compared to previous approaches. Importantly, the language model weights remain unchanged during training, allowing for transfer of encyclopedic knowledge and in-context learning abilities from language pretraining. MAGMA outperforms Frozen on open-ended generative tasks, achieving state of the art results on the OKVQA benchmark and competitive results on a range of other popular VL benchmarks, while pretraining on 0.2 % of the number of samples used to train SimVLM.</abstract>
    <identifier type="citekey">eichenberg-etal-2022-magma</identifier>
    <identifier type="doi">10.18653/v1/2022.findings-emnlp.179</identifier>
    <location>
      <url>https://aclanthology.org/2022.findings-emnlp.179</url>
    </location>
    <part>
      <date>2022-12</date>
      <extent unit="page">
        <start>2416</start>
        <end>2428</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning
%A Eichenberg, Constantin
%A Black, Sidney
%A Weinbach, Samuel
%A Parcalabescu, Letitia
%A Frank, Anette
%Y Goldberg, Yoav
%Y Kozareva, Zornitsa
%Y Zhang, Yue
%S Findings of the Association for Computational Linguistics: EMNLP 2022
%D 2022
%8 December
%I Association for Computational Linguistics
%C Abu Dhabi, United Arab Emirates
%F eichenberg-etal-2022-magma
%X Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling. However, prevailing VL approaches are limited by the requirement for labeled data and the use of complex multi-step pretraining objectives. We present MAGMA - a simple method for augmenting generative language models with additional modalities using adapter-based finetuning. Building on Frozen, we train a series of VL models that autoregressively generate text from arbitrary combinations of visual and textual input. The pretraining is entirely end-to-end using a single language modeling objective, simplifying optimization compared to previous approaches. Importantly, the language model weights remain unchanged during training, allowing for transfer of encyclopedic knowledge and in-context learning abilities from language pretraining. MAGMA outperforms Frozen on open-ended generative tasks, achieving state of the art results on the OKVQA benchmark and competitive results on a range of other popular VL benchmarks, while pretraining on 0.2 % of the number of samples used to train SimVLM.
%R 10.18653/v1/2022.findings-emnlp.179
%U https://aclanthology.org/2022.findings-emnlp.179
%U https://doi.org/10.18653/v1/2022.findings-emnlp.179
%P 2416-2428
Markdown (Informal)
[MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning](https://aclanthology.org/2022.findings-emnlp.179) (Eichenberg et al., Findings 2022)
ACL
Constantin Eichenberg, Sidney Black, Samuel Weinbach, Letitia Parcalabescu, and Anette Frank. 2022. MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2416–2428, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
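
The abstract above describes MAGMA's core recipe: keep a pretrained autoregressive language model frozen and train only lightweight adapter modules (together with a visual prefix encoder) under a single language-modeling objective. The following is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation; names such as BottleneckAdapter and bottleneck_dim are illustrative assumptions.

# Minimal sketch of adapter-based finetuning with a frozen language model,
# in the spirit of the abstract above. All names are illustrative assumptions,
# not MAGMA's actual code.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Small residual bottleneck inserted after a frozen transformer sublayer."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual form: the frozen model's behavior is roughly preserved at
        # initialization, and only the adapter weights are updated by training.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


def trainable_parameters(language_model: nn.Module, adapters: nn.Module):
    """Freeze every language-model weight and return only the adapter
    parameters, so optimizing the usual next-token cross-entropy loss leaves
    the LM itself unchanged (a visual prefix encoder, omitted here, would be
    trained alongside the adapters in the same way)."""
    for p in language_model.parameters():
        p.requires_grad = False
    return list(adapters.parameters())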