Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering

Sohee Yang, Minjoon Seo


Abstract
In open-domain question answering (QA), the retrieve-and-read mechanism has the inherent benefits of interpretability and the ease of adding, removing, or editing knowledge compared to the parametric approaches of closed-book QA models. However, it is also known to suffer from a large storage footprint due to its document corpus and index. Here, we discuss several orthogonal strategies to drastically reduce the footprint of a retrieve-and-read open-domain QA system by up to 160x. Our results indicate that retrieve-and-read can be a viable option even in a highly constrained serving environment such as edge devices, as we show that it can achieve better accuracy than a purely parametric model with a comparable Docker-level system size.
Anthology ID:
2021.naacl-main.468
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5856–5865
URL:
https://aclanthology.org/2021.naacl-main.468
DOI:
10.18653/v1/2021.naacl-main.468
Cite (ACL):
Sohee Yang and Minjoon Seo. 2021. Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5856–5865, Online. Association for Computational Linguistics.
Cite (Informal):
Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering (Yang & Seo, NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.468.pdf
Video:
https://aclanthology.org/2021.naacl-main.468.mp4
Code:
clovaai/minimal-rnr-qa
Data:
Natural Questions, TriviaQA