Boosted Dense Retriever

Patrick Lewis, Barlas Oguz, Wenhan Xiong, Fabio Petroni, Scott Yih, Sebastian Riedel


Abstract
We propose DrBoost, a dense retrieval ensemble inspired by boosting. DrBoost is trained in stages: each component model is learned sequentially and specialized by focusing only on retrieval mistakes made by the current ensemble. The final representation is the concatenation of the output vectors of all the component models, making it a drop-in replacement for standard dense retrievers at test time. DrBoost enjoys several advantages compared to standard dense retrieval models. It produces representations which are 4x more compact, while delivering comparable retrieval results. It also performs surprisingly well under approximate search with coarse quantization, reducing latency and bandwidth needs by another 4x. In practice, this can make the difference between serving indices from disk versus from memory, paving the way for much cheaper deployments.
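The drop-in property mentioned in the abstract can be illustrated with a small sketch. It assumes inner-product scoring, which the abstract does not state explicitly, and uses hypothetical values for the number of component models and their output dimension. Random vectors stand in for the component encoders' outputs. Under these assumptions, scoring the concatenated vectors gives the same result as summing the per-component scores, so an index built over the concatenation can be searched like any single dense retriever's index.

```python
import random

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
NUM_COMPONENTS = 4
COMPONENT_DIM = 32


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def concat(parts):
    """Concatenate a list of vectors into one flat vector."""
    return [x for part in parts for x in part]


def random_vector(dim, rng):
    """Stand-in for the output of one component encoder."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)

# One output vector per component model, for a query and for a passage.
query_parts = [random_vector(COMPONENT_DIM, rng) for _ in range(NUM_COMPONENTS)]
passage_parts = [random_vector(COMPONENT_DIM, rng) for _ in range(NUM_COMPONENTS)]

# Final representations: the concatenation of all component outputs.
query_vec = concat(query_parts)
passage_vec = concat(passage_parts)

# Scoring the concatenation equals summing the per-component scores.
ensemble_score = dot(query_vec, passage_vec)
sum_of_scores = sum(dot(q, p) for q, p in zip(query_parts, passage_parts))

print(len(query_vec), abs(ensemble_score - sum_of_scores) < 1e-9)
```

The sketch covers only the test-time representation; the staged training on the current ensemble's retrieval mistakes is not modeled here.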
Anthology ID:
2022.naacl-main.226
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3102–3117
URL:
https://aclanthology.org/2022.naacl-main.226
DOI:
10.18653/v1/2022.naacl-main.226
Cite (ACL):
Patrick Lewis, Barlas Oguz, Wenhan Xiong, Fabio Petroni, Scott Yih, and Sebastian Riedel. 2022. Boosted Dense Retriever. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3102–3117, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Boosted Dense Retriever (Lewis et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.226.pdf
Data:
BEIR, MS MARCO, Natural Questions