Unsupervised Multi-hop Question Answering by Question Generation

Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang


Abstract
Obtaining training data for multi-hop question answering (QA) is time-consuming and resource-intensive. We explore the possibility of training a well-performing multi-hop QA model without referencing any human-labeled multi-hop question-answer pairs, i.e., unsupervised multi-hop QA. We propose MQA-QG, an unsupervised framework that can generate human-like multi-hop training data from both homogeneous and heterogeneous data sources. MQA-QG generates questions by first selecting or generating relevant information from each data source and then integrating these pieces of information to form a multi-hop question. Using only generated training data, we can train a competent multi-hop QA model that achieves 61% and 83% of the supervised learning performance on the HybridQA and HotpotQA datasets, respectively. We also show that pretraining the QA system with the generated data greatly reduces the demand for human-annotated training data. Our code is publicly available at https://github.com/teacherpeterpan/Unsupervised-Multi-hop-QA.
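The two-stage idea described in the abstract (select information from each data source, then integrate it into a single question) can be illustrated with a toy sketch. The `Fact` type, the `select_bridge` and `compose_question` helpers, and the example facts are hypothetical and chosen purely for illustration; this is not the MQA-QG implementation, which is available in the repository linked above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fact:
    """Toy (subject, relation, object) statement taken from one data source."""
    subject: str
    relation: str
    obj: str


def select_bridge(first: List[Fact], second: List[Fact]) -> Optional[Tuple[Fact, Fact]]:
    """Stage 1 (hypothetical): find two facts linked by a shared 'bridge' entity.

    A pair qualifies when the object of a fact from the first source is the
    subject of a fact from the second source.
    """
    for a in first:
        for b in second:
            if a.obj == b.subject:
                return a, b
    return None


def compose_question(a: Fact, b: Fact) -> Tuple[str, str]:
    """Stage 2 (hypothetical): merge both facts into one two-hop question.

    The bridge entity is replaced by a description built from the first fact,
    so answering requires reading both sources.
    """
    description = f"the {a.relation} of {a.subject}"
    question = f"What is the {b.relation} of {description}?"
    return question, b.obj


# Toy data standing in for two sources (e.g. a table row and a passage).
source_a = [Fact("Inception", "director", "Christopher Nolan")]
source_b = [Fact("Christopher Nolan", "birthplace", "London")]

pair = select_bridge(source_a, source_b)
if pair is not None:
    question, answer = compose_question(*pair)
    print(question)
    print(answer)
```

The template-based composition here only shows how a bridge entity links two hops; a real system would need learned generation and filtering to produce natural questions.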
Anthology ID:
2021.naacl-main.469
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5866–5880
URL:
https://aclanthology.org/2021.naacl-main.469
DOI:
10.18653/v1/2021.naacl-main.469
Cite (ACL):
Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, and William Yang Wang. 2021. Unsupervised Multi-hop Question Answering by Question Generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5866–5880, Online. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Multi-hop Question Answering by Question Generation (Pan et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.469.pdf
Video:
https://aclanthology.org/2021.naacl-main.469.mp4
Code:
teacherpeterpan/Unsupervised-Multi-hop-QA
Data:
HotpotQA, HybridQA