Multi-Vector Attention Models for Deep Re-ranking

Giulio Zhou, Jacob Devlin


Abstract
Large-scale document retrieval systems often use two styles of neural network model that sit at opposite ends of the joint-computation vs. accuracy spectrum. The first style is dual encoder (or two-tower) models, where the query and document representations are computed completely independently and combined with a simple dot product. The second style is cross-attention models, where the query and document features are concatenated in the input layer and all computation operates on the joint query-document representation. Dual encoder models are typically used for retrieval and deep re-ranking, while cross-attention models are typically used for shallow re-ranking. In this paper, we present a lightweight architecture that explores this joint cost vs. accuracy trade-off based on multi-vector attention (MVA). We thoroughly evaluate our method on the MS-MARCO passage retrieval dataset and show how to efficiently trade off retrieval accuracy against joint computation and offline document storage cost. We show that a highly compressed document representation and inexpensive joint computation can be achieved through a combination of learned pooling tokens and aggressive downprojection. Our code and model checkpoints are open-source and available on GitHub.
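To make the trade-off concrete, the sketch below contrasts a dual-encoder dot-product score with an attention-based multi-vector score in the spirit of MVA. It is a minimal illustration, not the paper's implementation: the function names, dimensions, and the specific aggregation (each query vector attends over a small set of pooled, down-projected document vectors) are assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_encoder_score(q_vec, d_vec):
    """Dual-encoder scoring: one vector per side, a single dot product.
    q_vec, d_vec: (d,) pooled query and document embeddings."""
    return float(q_vec @ d_vec)

def mva_score(query_vecs, doc_vecs):
    """Illustrative multi-vector attention score (hypothetical sketch).
    query_vecs: (m, d) per-token query vectors.
    doc_vecs:   (k, d) document vectors, e.g. k learned pooling tokens
                after down-projection (k and d are small by design).
    Each query vector attends over the k document vectors; the attended
    readouts are scored against the query vectors and summed."""
    attn = softmax(query_vecs @ doc_vecs.T, axis=-1)  # (m, k) attention weights
    readout = attn @ doc_vecs                         # (m, d) attended doc vectors
    return float(np.sum(query_vecs * readout))
```

The joint cost of `mva_score` is O(m·k·d) per query-document pair, versus O(d) for the dual encoder, while storage per document grows from one vector to k down-projected vectors — the knobs k and d are what the abstract's compression/accuracy trade-off turns on.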
Anthology ID:
2021.emnlp-main.443
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5452–5456
URL:
https://aclanthology.org/2021.emnlp-main.443
DOI:
10.18653/v1/2021.emnlp-main.443
Cite (ACL):
Giulio Zhou and Jacob Devlin. 2021. Multi-Vector Attention Models for Deep Re-ranking. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5452–5456, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Multi-Vector Attention Models for Deep Re-ranking (Zhou & Devlin, EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.443.pdf
Video:
https://aclanthology.org/2021.emnlp-main.443.mp4
Data
MS MARCO