Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering

Minghan Li, Ming Li, Kun Xiong, Jimmy Lin


Abstract
Multi-task dense retrieval models can be used to retrieve documents from a common corpus (e.g., Wikipedia) for different open-domain question-answering (QA) tasks. However, Karpukhin et al. (2020) show that jointly learning different QA tasks with one dense model is not always beneficial due to corpus inconsistency. For example, SQuAD focuses on only a small set of Wikipedia articles, while datasets like NQ and TriviaQA cover more entries, and joint training on their union can cause performance degradation. To solve this problem, we propose to train individual dense passage retrievers (DPR) for different tasks and aggregate their predictions at test time, using uncertainty estimates as weights that indicate how likely a given query is to fall within each expert's area of expertise. Our method reaches state-of-the-art performance on five benchmark QA datasets, with up to 10% improvement in top-100 accuracy on SQuAD compared to a joint-training multi-task DPR. We also show that our method handles corpus inconsistency better than the joint-training DPR on a mixed subset of different QA datasets. Code and data are available at https://github.com/alexlimh/DPR_MUF.
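The test-time aggregation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how uncertainty is estimated or how scores are combined, so the softmax over negated uncertainties, the per-expert score normalization, and the names `fuse_expert_scores` and `top_k` are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_expert_scores(
    expert_scores: Dict[str, Dict[str, float]],
    expert_uncertainty: Dict[str, float],
) -> Dict[str, float]:
    """Combine per-expert passage scores for one query.

    expert_scores: expert name -> {passage id -> relevance score}
    expert_uncertainty: expert name -> uncertainty for this query
        (lower means the expert is more confident).
    """
    names = sorted(expert_scores)
    # Assumption: expert weights are a softmax over negated uncertainties,
    # so a more confident expert contributes more to the fused ranking.
    weights = dict(zip(names, softmax([-expert_uncertainty[n] for n in names])))

    fused: Dict[str, float] = {}
    for name in names:
        passage_ids = list(expert_scores[name])
        # Assumption: each expert's raw scores are normalized with a softmax
        # so that scores from different experts are on a comparable scale.
        probs = softmax([expert_scores[name][p] for p in passage_ids])
        for pid, prob in zip(passage_ids, probs):
            fused[pid] = fused.get(pid, 0.0) + weights[name] * prob
    return fused


def top_k(fused: Dict[str, float], k: int) -> List[str]:
    """Return the k passage ids with the highest fused score."""
    return sorted(fused, key=fused.get, reverse=True)[:k]


# Hypothetical toy data: two task-specific experts scoring three passages.
scores = {
    "nq": {"p1": 2.0, "p2": 0.0},
    "squad": {"p2": 2.0, "p3": 0.0},
}
uncertainty = {"nq": 0.1, "squad": 3.0}  # the "nq" expert is more confident
fused = fuse_expert_scores(scores, uncertainty)
ranking = top_k(fused, 2)
```

In this toy case the confident expert dominates the fused ranking, which is the intended effect of weighting by uncertainty; a real system would take the scores from the trained retrievers and the uncertainties from whichever estimator the paper uses.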
Anthology ID:
2021.findings-emnlp.26
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
274–287
URL:
https://aclanthology.org/2021.findings-emnlp.26
DOI:
10.18653/v1/2021.findings-emnlp.26
Cite (ACL):
Minghan Li, Ming Li, Kun Xiong, and Jimmy Lin. 2021. Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 274–287, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering (Li et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.26.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.26.mp4
Code:
alexlimh/DPR_MUF
Data:
Natural Questions, SQuAD, TriviaQA