Extremely efficient online query encoding for dense retrieval

Nachshon Cohen, Yaron Fairstein, Guy Kushilevitz


Abstract
Existing dense retrieval systems use the same model architecture to encode both passages and queries, even though queries are much shorter and simpler than passages. As a result, query encoding, which is performed online, has high latency that can hurt user experience. We show that combining a standard large passage encoder with a small, efficient query encoder can significantly reduce latency with only a small decrease in quality. We propose pretraining and training procedures for several small query encoder architectures. Using a small transformer architecture, we decrease latency by up to ∼12×, while MRR@10 on the MS MARCO dev set decreases only from 38.2 to 36.2. If this is still not enough to meet the desired latency requirements, we propose an efficient RNN as the query encoder, which processes the query prefix incrementally so that only the last word must be processed after the query is issued. This shortens latency by ∼38× with only a minor drop in quality, reaching an MRR@10 of 35.5.
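The RNN variant relies on the fact that a recurrent encoder's state after reading the query prefix can be computed while the user is still typing, leaving only the final word for online inference. The following is a minimal sketch of that idea, not the paper's implementation. It assumes a hypothetical toy Elman-style RNN (`ToyRNNQueryEncoder`) with untrained random weights and pseudo-random word-level embeddings, plus a hypothetical `IncrementalQuerySession` wrapper and `retrieve` helper. The passage vectors stand in for outputs of a large passage encoder precomputed offline, as is usual in dense retrieval. The paper's actual architecture, tokenization, and training are not reproduced.

```python
import math
import random

DIM = 8  # hypothetical hidden/embedding size; real encoders are far larger


class ToyRNNQueryEncoder:
    """Hypothetical Elman-style RNN over whole words (untrained, illustration only)."""

    def __init__(self, dim=DIM, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Random, untrained matrices standing in for learned parameters.
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
        self.w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
        self.steps = 0  # number of recurrence steps computed so far

    def initial_state(self):
        return [0.0] * self.dim

    def embed(self, word):
        # Deterministic pseudo-random vector per word (stand-in for an embedding table).
        rng = random.Random(word)
        return [rng.uniform(-1.0, 1.0) for _ in range(self.dim)]

    def step(self, state, word):
        # One recurrence step: h_new = tanh(W_in x + W_rec h).
        self.steps += 1
        x = self.embed(word)
        return [
            math.tanh(
                sum(self.w_in[i][j] * x[j] for j in range(self.dim))
                + sum(self.w_rec[i][j] * state[j] for j in range(self.dim))
            )
            for i in range(self.dim)
        ]


class IncrementalQuerySession:
    """Hypothetical wrapper: encodes completed words while the user is still typing."""

    def __init__(self, encoder):
        self.encoder = encoder
        self.prefix_state = encoder.initial_state()

    def word_completed(self, word):
        # Runs before submission (e.g. when the user types a space), off the online path.
        self.prefix_state = self.encoder.step(self.prefix_state, word)

    def submit(self, last_word):
        # The only work left after the query is issued: one step for the last word.
        return self.encoder.step(self.prefix_state, last_word)


def retrieve(query_vec, passage_vecs, k=2):
    # Score precomputed passage vectors by inner product and return the top-k ids.
    scores = {
        pid: sum(q * p for q, p in zip(query_vec, vec))
        for pid, vec in passage_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    encoder = ToyRNNQueryEncoder()
    session = IncrementalQuerySession(encoder)
    for w in ["what", "is", "dense"]:  # typed before the user submits
        session.word_completed(w)
    query_vec = session.submit("retrieval")  # single online step
    # Hypothetical stand-in for an offline index built by a large passage encoder.
    rng = random.Random(42)
    index = {f"p{i}": [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for i in range(5)}
    print(retrieve(query_vec, index, k=3))
```

Because the prefix state is already cached when the user submits, the online cost in this sketch is one recurrence step regardless of query length. A bidirectional transformer query encoder, by contrast, typically has to be run over the whole query once it is complete.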
Anthology ID: 2024.findings-naacl.4
Volume: Findings of the Association for Computational Linguistics: NAACL 2024
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 43–50
URL: https://aclanthology.org/2024.findings-naacl.4
Cite (ACL): Nachshon Cohen, Yaron Fairstein, and Guy Kushilevitz. 2024. Extremely efficient online query encoding for dense retrieval. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 43–50, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Extremely efficient online query encoding for dense retrieval (Cohen et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-naacl.4.pdf