Hybrid Hierarchical Retrieval for Open-Domain Question Answering

Manoj Ghuhan Arivazhagan; Lan Liu; Peng Qi; Xinchi Chen; William Yang Wang; Zhiheng Huang

doi:10.18653/v1/2023.findings-acl.679

Abstract

Retrieval accuracy is crucial to the performance of open-domain question answering (ODQA) systems. Recent work has demonstrated that dense hierarchical retrieval (DHR), which first retrieves document candidates and then retrieves relevant passages from the refined document set, can significantly outperform the single-stage dense passage retriever (DPR). While effective, this approach requires document structure information to learn document representations and is hard to adapt to other domains that lack this information. Additionally, dense retrievers tend to generalize poorly to out-of-domain data compared with sparse retrievers such as BM25. In this paper, we propose Hybrid Hierarchical Retrieval (HHR) to address these limitations. Instead of relying solely on dense retrievers, we can apply a sparse retriever, a dense retriever, or a combination of the two in both the document and passage retrieval stages. We perform extensive experiments on ODQA benchmarks and observe that our framework not only brings in-domain gains, but also generalizes better to the zero-shot TriviaQA and Web Questions datasets, with an average improvement of 4.69% in recall@100 over DHR. We also offer practical insights into trading off retrieval accuracy, latency, and storage cost. The code is available on GitHub.
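
The two-stage idea in the abstract (rank documents first, then rank passages drawn only from the surviving documents, with a sparse and a dense score available at each stage) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names are hypothetical, the sparse score is plain query-term counting rather than BM25, the "dense" score uses hashed character trigrams rather than a learned encoder, and the linear interpolation controlled by `weight` is just one possible way to combine the two scores.

```python
import math
import re
import zlib
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def sparse_score(query, text):
    # Toy lexical score: occurrences of query terms in the text (stand-in for BM25).
    counts = Counter(tokenize(text))
    return float(sum(counts[t] for t in set(tokenize(query))))

def embed(text, dim=64):
    # Stand-in for a learned dense encoder: hashed character trigrams, L2-normalised.
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - 2):
        vec[zlib.crc32(s[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def dense_score(query, text):
    return sum(q * t for q, t in zip(embed(query), embed(text)))

def normalize(scores):
    # Min-max scaling so sparse and dense scores are comparable before mixing.
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(query, texts, k, weight=0.5):
    # weight=1.0 is sparse only, weight=0.0 is dense only (assumed combination rule).
    if not texts:
        return []
    sparse = normalize([sparse_score(query, t) for t in texts])
    dense = normalize([dense_score(query, t) for t in texts])
    combined = [weight * s + (1.0 - weight) * d for s, d in zip(sparse, dense)]
    order = sorted(range(len(texts)), key=lambda i: combined[i], reverse=True)
    return order[:k]

def hierarchical_retrieve(query, corpus, top_docs, top_passages, weight=0.5):
    # Stage 1: rank whole documents (title plus all passages).
    titles = list(corpus)
    doc_texts = [t + " " + " ".join(corpus[t]) for t in titles]
    kept = [titles[i] for i in hybrid_rank(query, doc_texts, top_docs, weight)]
    # Stage 2: rank only the passages that belong to the retained documents.
    candidates = [(t, p) for t in kept for p in corpus[t]]
    passage_texts = [p for _, p in candidates]
    best = hybrid_rank(query, passage_texts, top_passages, weight)
    return [candidates[i] for i in best]

# Hypothetical toy corpus: document title -> list of passages.
corpus = {
    "France": ["Paris is the capital of France.", "France is in Western Europe."],
    "Japan": ["Tokyo is the capital of Japan.", "Japan is an island country."],
    "Python": ["Python is a programming language."],
}

for title, passage in hierarchical_retrieve("capital of France", corpus, 2, 2):
    print(title, "|", passage)
```

In this sketch, setting `weight` to 1.0 or 0.0 reduces either stage to a purely sparse or purely dense ranker. A real system would replace the toy scorers with an inverted index and a trained encoder, and could use different settings per stage.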

Anthology ID: 2023.findings-acl.679
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 10680–10689
URL: https://aclanthology.org/2023.findings-acl.679
DOI: 10.18653/v1/2023.findings-acl.679
Cite (ACL): Manoj Ghuhan Arivazhagan, Lan Liu, Peng Qi, Xinchi Chen, William Yang Wang, and Zhiheng Huang. 2023. Hybrid Hierarchical Retrieval for Open-Domain Question Answering. In Findings of the Association for Computational Linguistics: ACL 2023, pages 10680–10689, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Hybrid Hierarchical Retrieval for Open-Domain Question Answering (Arivazhagan et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.679.pdf
Video: https://aclanthology.org/2023.findings-acl.679.mp4
