CocaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval

Jiapeng Wang; Chengyu Wang; Xiaodan Wang; Jun Huang; Lianwen Jin

doi:10.18653/v1/2023.acl-industry.8

CocaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval

Jiapeng Wang, Chengyu Wang, Xiaodan Wang, Jun Huang, Lianwen Jin

Abstract

Large-scale pre-trained text-image models with dual-encoder architectures (such as CLIP) are typically adopted for various vision-language applications, including text-image retrieval. However, these models are still less practical on edge devices or for real-time situations, due to the substantial indexing and inference time and the large consumption of computational resources. Although knowledge distillation techniques have been widely utilized for uni-modal model compression, how to expand them to the situation when the numbers of modalities and teachers/students are doubled has been rarely studied. In this paper, we conduct comprehensive experiments on this topic and propose the fully-Connected knowledge interaction graph (Coca) technique for cross-modal pre-training distillation. Based on our findings, the resulting CocaCLIP achieves SOTA performances on the widely-used Flickr30K and MSCOCO benchmarks under the lightweight setting. An industry application of our method on an e-commercial platform further demonstrates the significant effectiveness of CocaCLIP.

Anthology ID:: 2023.acl-industry.8
Volume:: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Month:: July
Year:: 2023
Address:: Toronto, Canada
Editors:: Sunayana Sitaram, Beata Beigman Klebanov, Jason D Williams
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 71–80
Language:
URL:: https://aclanthology.org/2023.acl-industry.8
DOI:: 10.18653/v1/2023.acl-industry.8
Bibkey:
Cite (ACL):: Jiapeng Wang, Chengyu Wang, Xiaodan Wang, Jun Huang, and Lianwen Jin. 2023. CocaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 71–80, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):: CocaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval (Wang et al., ACL 2023)
Copy Citation:
PDF:: https://aclanthology.org/2023.acl-industry.8.pdf

PDF Cite Search