Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation

Zixian Guo, Yuxiang Wei, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng Zuo


Abstract
Parameter-efficient fine-tuning (PEFT) methods have provided an effective way for adapting large vision-language models to specific tasks or scenarios. Typically, they learn a very small scale of parameters for pre-trained models in a white-box formulation, which assumes model architectures to be known and parameters to be accessible. However, large models are often not open-source due to considerations of preventing abuse or commercial factors, hence posing a barrier to the deployment of white-box PEFT methods. To alleviate the dependence on model accessibility, we introduce collaborative black-box tuning (CBBT) for both textual prompt optimization and output feature adaptation for black-box models. Specifically, considering that the backpropagation gradients are blocked, we approximate the gradients of textual prompts by analyzing the predictions with perturbed prompts. Secondly, a lightweight adapter is deployed over the output feature of the inaccessible model, further facilitating the model adaptation process. Empowered with these designs, our CBBT is extensively evaluated on eleven downstream benchmarks and achieves remarkable improvements compared to existing black-box VL adaptation methods. Our code will be made publicly available.
Anthology ID:
2023.findings-emnlp.356
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
5356–5368
Language:
URL:
https://aclanthology.org/2023.findings-emnlp.356
DOI:
10.18653/v1/2023.findings-emnlp.356
Bibkey:
Cite (ACL):
Zixian Guo, Yuxiang Wei, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, and Wangmeng Zuo. 2023. Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5356–5368, Singapore. Association for Computational Linguistics.
Cite (Informal):
Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation (Guo et al., Findings 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.findings-emnlp.356.pdf