SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval

Tiancheng Zhao, Xiaopeng Lu, Kyusong Lee


Abstract
We introduce SPARTA, a novel neural retrieval method that shows great promise in performance, generalization, and interpretability for open-domain question answering. Unlike many neural ranking methods that use dense vector nearest neighbor search, SPARTA learns a sparse representation that can be efficiently implemented as an inverted index. The resulting representation enables scalable neural retrieval that does not require expensive approximate vector search and leads to better performance than its dense counterpart. We validate our approach on 4 open-domain question answering (OpenQA) tasks and 11 retrieval question answering (ReQA) tasks. SPARTA achieves new state-of-the-art results across a variety of open-domain question answering tasks on both English and Chinese datasets, including open SQuAD and CMRC. Analysis also confirms that the proposed method creates a human-interpretable representation and allows flexible control over the trade-off between performance and efficiency.
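To make the abstract's indexing claim concrete, below is a minimal sketch of sparse-representation retrieval served from an inverted index. It is an illustration under stated assumptions, not the paper's implementation: the toy vocabulary, the random static embeddings (standing in for the paper's learned BERT-based passage encoder), and the helper names passage_term_weights and search are all hypothetical. Only the overall structure follows the paper: each passage is precomputed into sparse (term, weight) entries, and a query is scored online by summing the indexed weights of its tokens.

```python
# Sketch only: SPARTA-style sparse retrieval over an inverted index.
# Random embeddings replace the paper's learned passage encoder.
from collections import defaultdict
import math
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["what", "is", "a", "sparta", "sparse", "retrieval", "dense",
         "index", "question", "answering", "neural", "method"]
DIM = 16
# Hypothetical embedding table; in the paper, term and passage-token
# representations come from trained encoders.
term_emb = {w: rng.normal(size=DIM) for w in VOCAB}

def passage_term_weights(passage_tokens, candidate_terms):
    """Sparse weight of each candidate term w.r.t. one passage:
    log(1 + relu(max_j <e_term, h_j>)), with h_j approximated here
    by the same static embeddings. Zero-weight terms are dropped,
    which is what makes the representation sparse."""
    H = np.stack([term_emb[t] for t in passage_tokens])
    weights = {}
    for term in candidate_terms:
        s = float(np.max(H @ term_emb[term]))
        s = math.log1p(max(s, 0.0))
        if s > 0.0:
            weights[term] = s
    return weights

# Offline: build the inverted index, term -> [(passage_id, weight), ...].
passages = {
    0: ["sparta", "is", "a", "sparse", "retrieval", "method"],
    1: ["dense", "retrieval", "is", "a", "neural", "method"],
}
inverted_index = defaultdict(list)
for pid, toks in passages.items():
    for term, w in passage_term_weights(toks, VOCAB).items():
        inverted_index[term].append((pid, w))

def search(query_tokens, top_k=2):
    """Online scoring: sum the indexed weights of the query's tokens,
    so no dense nearest-neighbor search is needed at query time."""
    scores = defaultdict(float)
    for term in query_tokens:
        for pid, w in inverted_index.get(term, []):
            scores[pid] += w
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

print(search(["what", "is", "sparta"]))
```

Because all the expensive encoding happens offline and queries reduce to term lookups plus additions, the same index machinery used for classic lexical retrieval can serve the learned sparse representation.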
Anthology ID:
2021.naacl-main.47
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
565–575
URL:
https://aclanthology.org/2021.naacl-main.47
DOI:
10.18653/v1/2021.naacl-main.47
Cite (ACL):
Tiancheng Zhao, Xiaopeng Lu, and Kyusong Lee. 2021. SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 565–575, Online. Association for Computational Linguistics.
Cite (Informal):
SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval (Zhao et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.47.pdf
Video:
https://aclanthology.org/2021.naacl-main.47.mp4
Data:
CMRC, DRCD, DROP, DuoRC, Natural Questions, RACE, SQuAD