Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval

Haoliang Liu; Tan Yu; Ping Li

doi:10.18653/v1/2021.emnlp-main.772

Abstract

By exploiting cross-modal attention, cross-BERT methods have achieved state-of-the-art accuracy in cross-modal retrieval. Nevertheless, the heavy text-image interactions in a cross-BERT model make it prohibitively slow for large-scale retrieval. Late-interaction methods trade off retrieval accuracy against efficiency by exploiting cross-modal interactions only in the late stage, attaining a satisfactory retrieval speed. In this work, we propose an inflating and shrinking approach to further boost both the efficiency and the accuracy of late-interaction methods. The inflating operation plugs several codes into the input of the encoder to exploit the text-image interactions more thoroughly, yielding higher retrieval accuracy. The shrinking operation then gradually reduces the text-image interactions through knowledge distillation, yielding higher efficiency. Through an inflating operation followed by a shrinking operation, both the efficiency and the accuracy of a late-interaction model are boosted. Systematic experiments on public benchmarks demonstrate the effectiveness of our inflating and shrinking approach.
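The abstract describes the method only at a high level. As a rough illustration, and not the authors' implementation, the sketch below assumes a ColBERT-style "MaxSim" late-interaction score over token embeddings. It treats "shrinking" as keeping only the first k image-side embeddings and uses a KL-divergence loss to distill the ranking of a full-interaction teacher into a reduced-interaction student. The names `late_interaction_score`, `shrink`, and `distillation_loss`, as well as the choice of MaxSim and of KL divergence, are hypothetical assumptions made for this example.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(text_tokens: List[Vector], image_tokens: List[Vector]) -> float:
    """Hypothetical MaxSim-style late interaction (assumption, not the paper's formula):
    each text token is matched to its most similar image token, and the maxima are summed."""
    return sum(max(dot(t, v) for v in image_tokens) for t in text_tokens)


def shrink(image_tokens: List[Vector], k: int) -> List[Vector]:
    """Hypothetical shrinking step: keep only the first k image-side embeddings,
    so the student computes fewer text-image interaction terms."""
    return image_tokens[:k]


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over candidate scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def distillation_loss(teacher_scores: List[float], student_scores: List[float],
                      temperature: float = 1.0) -> float:
    """KL(teacher || student) between score distributions over the same candidates
    (an assumed distillation objective for illustration only)."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For example, with text tokens `[[1.0, 0.0], [0.0, 1.0]]` and image tokens `[[1.0, 0.0], [0.5, 0.5]]`, the full score is 1.0 + 0.5 = 1.5. After `shrink(image, 1)` it drops to 1.0 + 0.0 = 1.0. The distillation loss would then push the student's scores over a set of candidate images toward the teacher's distribution.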

Anthology ID: 2021.emnlp-main.772
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 9796–9809
URL: https://aclanthology.org/2021.emnlp-main.772
DOI: 10.18653/v1/2021.emnlp-main.772
Cite (ACL): Haoliang Liu, Tan Yu, and Ping Li. 2021. Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9796–9809, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval (Liu et al., EMNLP 2021)
PDF: https://aclanthology.org/2021.emnlp-main.772.pdf
Software: 2021.emnlp-main.772.Software.txt
Video: https://aclanthology.org/2021.emnlp-main.772.mp4
Data: Visual Genome
