Prompt Tuning for Unified Multimodal Pretrained Models

Hao Yang, Junyang Lin, An Yang, Peng Wang, Chang Zhou


Abstract
Prompt tuning has become a new paradigm for model tuning and has demonstrated success in natural language pretraining and even vision pretraining. Parameter-efficient prompt tuning methods, which optimize soft embeddings while keeping the pretrained model frozen, offer low computation costs and nearly lossless performance. In this work, we explore the transfer of prompt tuning to multimodal pretrained models. Specifically, we apply prompt tuning to a unified sequence-to-sequence pretrained model by adding a sequence of learnable embeddings to each layer and tuning the model on downstream tasks with only these embeddings being optimized. Experimental results on a series of multimodal understanding and generation tasks demonstrate that our method, OFA-PT, achieves performance comparable to finetuning, and it significantly outperforms the unified multimodal pretrained model equipped with other parameter-efficient tuning methods, e.g., Adapter and BitFit. Moreover, in comparison with finetuned models, the prompt-tuned models demonstrate improved robustness against adversarial attacks. We further find that experimental factors, including prompt length, prompt depth, and reparameterization, have great impact on model performance, and we empirically provide recommendations for the setup of prompt tuning.
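To make the described setup concrete, below is a minimal PyTorch sketch of deep prompt tuning: learnable prompt embeddings are prepended to the hidden states at every transformer layer while all pretrained weights stay frozen, so only the prompts receive gradients. The class and parameter names (DeepPromptTuner, prompt_length=64) and the toy nn.TransformerEncoderLayer stack are illustrative assumptions, not OFA-PT's actual implementation.

```python
import torch
import torch.nn as nn

class DeepPromptTuner(nn.Module):
    """Sketch of deep prompt tuning: per-layer learnable prompts, frozen backbone."""

    def __init__(self, pretrained_layers, hidden_size, prompt_length=64):
        super().__init__()
        self.layers = pretrained_layers           # pretrained transformer layers (nn.ModuleList)
        for p in self.layers.parameters():
            p.requires_grad = False               # freeze the backbone
        # one prompt matrix per layer: (prompt_length, hidden_size)
        self.prompts = nn.ParameterList([
            nn.Parameter(torch.randn(prompt_length, hidden_size) * 0.02)
            for _ in range(len(pretrained_layers))
        ])

    def forward(self, hidden):                    # hidden: (batch, seq_len, hidden_size)
        bsz = hidden.size(0)
        for layer, prompt in zip(self.layers, self.prompts):
            # prepend this layer's prompt tokens to the sequence
            p = prompt.unsqueeze(0).expand(bsz, -1, -1)
            hidden = layer(torch.cat([p, hidden], dim=1))
            # drop the prompt positions so the sequence length stays constant
            hidden = hidden[:, prompt.size(0):, :]
        return hidden

# Usage with a toy 12-layer encoder; only the prompt parameters are optimized.
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    for _ in range(12)
)
model = DeepPromptTuner(layers, hidden_size=512, prompt_length=64)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
out = model(torch.randn(2, 20, 512))              # (2, 20, 512)
```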
Anthology ID:
2023.findings-acl.27
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
402–416
URL:
https://aclanthology.org/2023.findings-acl.27
DOI:
10.18653/v1/2023.findings-acl.27
Cite (ACL):
Hao Yang, Junyang Lin, An Yang, Peng Wang, and Chang Zhou. 2023. Prompt Tuning for Unified Multimodal Pretrained Models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 402–416, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Prompt Tuning for Unified Multimodal Pretrained Models (Yang et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.27.pdf