MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models

Masoud Monajatipoor; Liunian Harold Li; Mozhdeh Rouhsedaghat; Lin Yang; Kai-Wei Chang

doi:10.18653/v1/2023.acl-short.43

Abstract

Large-scale language models have shown the ability to adapt to a new task by conditioning on a few demonstrations (i.e., in-context learning). However, most large-scale pre-trained vision-language (VL) models do not possess this in-context learning ability. How can we enable in-context learning for VL models? In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the VL domain? Specifically, we first meta-train a language model to perform in-context learning on NLP tasks (as in MetaICL); then we transfer this model to VL tasks by attaching a visual encoder. Our experiments suggest that in-context learning ability can indeed be transferred across modalities: our model considerably improves in-context learning on VL tasks and can even substantially compensate for a much smaller model size. On VQA, OK-VQA, and GQA, our method outperforms the baseline model while having about 20 times fewer parameters.
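The two-stage recipe described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_metaicl_episode`, `visual_prefix`, `assemble_vl_prompt`, and `toy_embed_text` are hypothetical names. We assume MetaICL-style episodes of k demonstrations plus one query drawn from the same task, and we assume the attached visual encoder's features are linearly projected into a few prefix embeddings placed before each demonstration's text. The abstract itself does not specify how the visual encoder is connected to the language model.

```python
import random
from functools import partial
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def build_metaicl_episode(
    task_examples: Sequence[Tuple[str, str]], k: int, rng: random.Random
) -> Tuple[str, str]:
    """Sample k demonstrations plus one query from a single text task.

    Returns (context, target); during meta-training the LM would be trained
    to generate `target` conditioned on `context` (assumed MetaICL-style setup).
    """
    sampled = rng.sample(list(task_examples), k + 1)
    demos, (query_x, query_y) = sampled[:k], sampled[k]
    lines = [f"{x} {y}" for x, y in demos] + [query_x]
    return "\n".join(lines), query_y


def visual_prefix(features: Vector, weight: List[Vector], n_prefix: int) -> List[Vector]:
    """Hypothetical linear map from image features to n_prefix pseudo-token embeddings.

    `weight` has n_prefix * d_model rows, each of length len(features).
    """
    flat = [sum(w * f for w, f in zip(row, features)) for row in weight]
    if len(flat) % n_prefix:
        raise ValueError("number of weight rows must be a multiple of n_prefix")
    d_model = len(flat) // n_prefix
    return [flat[i * d_model:(i + 1) * d_model] for i in range(n_prefix)]


def toy_embed_text(text: str, d_model: int = 2) -> List[Vector]:
    """Hypothetical stand-in for the LM's token embeddings: one vector per word."""
    return [[float(len(tok))] * d_model for tok in text.split()]


def assemble_vl_prompt(
    demos: Sequence[Tuple[Vector, str, str]],
    query_image: Vector,
    query_question: str,
    prefix_fn: Callable[[Vector], List[Vector]],
    embed_text: Callable[[str], List[Vector]],
) -> List[Vector]:
    """Build [prefix; question answer] per demonstration, then [prefix; question]."""
    sequence: List[Vector] = []
    for image, question, answer in demos:
        sequence.extend(prefix_fn(image))
        sequence.extend(embed_text(f"{question} {answer}"))
    sequence.extend(prefix_fn(query_image))
    sequence.extend(embed_text(query_question))
    return sequence


if __name__ == "__main__":
    rng = random.Random(0)
    sentiment = [("great movie", "positive"), ("dull plot", "negative"), ("loved it", "positive")]
    context, target = build_metaicl_episode(sentiment, k=2, rng=rng)
    print(context, "->", target)

    # 4 weight rows and n_prefix=2 give two prefix vectors of size d_model=2.
    weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
    prefix_fn = partial(visual_prefix, weight=weight, n_prefix=2)
    prompt = assemble_vl_prompt(
        [([0.1, 0.2, 0.3], "what color?", "red")],
        [0.4, 0.5, 0.6],
        "what color?",
        prefix_fn,
        toy_embed_text,
    )
    print(len(prompt))  # 2 prefix + 3 words + 2 prefix + 2 words = 9 vectors
```

Representing each image as prefix embeddings keeps the language model's input format unchanged, so the demonstration-then-query structure learned during text-only meta-training can, in principle, be reused for VL prompts.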

Anthology ID: 2023.acl-short.43
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 495–508
URL: https://aclanthology.org/2023.acl-short.43/
DOI: 10.18653/v1/2023.acl-short.43
Cite (ACL): Masoud Monajatipoor, Liunian Harold Li, Mozhdeh Rouhsedaghat, Lin Yang, and Kai-Wei Chang. 2023. MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 495–508, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models (Monajatipoor et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-short.43.pdf
Video: https://aclanthology.org/2023.acl-short.43.mp4
