Relation-Guided Pre-Training for Open-Domain Question Answering

Ziniu Hu, Yizhou Sun, Kai-Wei Chang


Abstract
Answering complex open-domain questions requires understanding the latent relations between the entities involved. However, we found that existing QA datasets are extremely imbalanced across relation types, which hurts generalization on questions with long-tail relations. To remedy this problem, we propose a Relation-Guided Pre-Training (RGPT-QA) framework. We first generate a relational QA dataset covering a wide range of relations from both Wikidata triplets and Wikipedia hyperlinks. We then pre-train a QA model to infer the latent relations from the question and to conduct extractive QA to get the target answer entity. We demonstrate that by pre-training with the proposed RGPT-QA technique, the popular open-domain QA model Dense Passage Retriever (DPR) achieves 2.2%, 2.4%, and 6.3% absolute improvement in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions, respectively. In particular, we show that RGPT-QA improves significantly on questions with long-tail relations.
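As a rough illustration of the dataset-generation idea in the abstract, the sketch below turns one (subject, relation, object) triplet into a relation-labeled extractive QA example. It is a minimal sketch under stated assumptions, not the authors' pipeline: the `RelationalQAExample` class, the `build_example` helper, the `[MASK]` token, and the sample sentence and passage are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RelationalQAExample:
    question: str      # source sentence with the answer entity masked out
    relation: str      # relation label the model is asked to infer
    context: str       # passage used for extractive QA
    answer: str        # target answer entity
    answer_start: int  # character offset of the answer in the context


def build_example(
    triplet: Tuple[str, str, str],
    sentence: str,
    context: str,
    mask: str = "[MASK]",
) -> Optional[RelationalQAExample]:
    """Turn one (subject, relation, object) triplet into a QA example.

    Hypothetical helper: returns None when the object entity is missing
    from either the sentence or the context passage.
    """
    _subject, relation, obj = triplet
    start = context.find(obj)
    if obj not in sentence or start < 0:
        return None
    # Mask the first mention of the answer so the sentence acts as a question.
    question = sentence.replace(obj, mask, 1)
    return RelationalQAExample(question, relation, context, obj, start)


# Illustrative input only; not taken from the paper's data.
example = build_example(
    ("Marie Curie", "place of birth", "Warsaw"),
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
)
```

Each generated example carries both a relation label and an answer span, so a model could in principle be trained on relation inference and span extraction together, which is the pairing the abstract describes.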
Anthology ID:
2021.findings-emnlp.292
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
3431–3448
URL:
https://aclanthology.org/2021.findings-emnlp.292
DOI:
10.18653/v1/2021.findings-emnlp.292
Cite (ACL):
Ziniu Hu, Yizhou Sun, and Kai-Wei Chang. 2021. Relation-Guided Pre-Training for Open-Domain Question Answering. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3431–3448, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Relation-Guided Pre-Training for Open-Domain Question Answering (Hu et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.292.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.292.mp4
Data
Natural Questions