Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval

Haoliang Liu, Tan Yu, Ping Li


Abstract
By exploiting cross-modal attention, cross-BERT methods have achieved state-of-the-art accuracy in cross-modal retrieval. Nevertheless, the heavy text-image interactions in the cross-BERT model are prohibitively slow for large-scale retrieval. Late-interaction methods trade off retrieval accuracy and efficiency by exploiting cross-modal interaction only in the late stage, attaining a satisfactory retrieval speed. In this work, we propose an inflating and shrinking approach to further boost the efficiency and accuracy of late-interaction methods. The inflating operation plugs several codes into the input of the encoder to exploit the text-image interactions more thoroughly for higher retrieval accuracy. The shrinking operation then gradually reduces the text-image interactions through knowledge distillation for higher efficiency. Through an inflating operation followed by a shrinking operation, both the efficiency and the accuracy of a late-interaction model are boosted. Systematic experiments on public benchmarks demonstrate the effectiveness of our inflating and shrinking approach.
Anthology ID:
2021.emnlp-main.772
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9796–9809
URL:
https://aclanthology.org/2021.emnlp-main.772
DOI:
10.18653/v1/2021.emnlp-main.772
Cite (ACL):
Haoliang Liu, Tan Yu, and Ping Li. 2021. Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9796–9809, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval (Liu et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.772.pdf
Software:
 2021.emnlp-main.772.Software.txt
Video:
 https://aclanthology.org/2021.emnlp-main.772.mp4
Data:
Visual Genome