Improving the Robustness of Question Answering Systems to Question Paraphrasing

Wee Chung Gan, Hwee Tou Ng


Abstract
Despite the advancement of question answering (QA) systems and rapid improvements on held-out test sets, their generalizability is a topic of concern. We explore the robustness of QA models to question paraphrasing by creating two test sets consisting of paraphrased SQuAD questions. Paraphrased questions in the first test set are very similar to the original questions and are designed to test QA models’ over-sensitivity, while questions in the second test set are paraphrased using context words near an incorrect answer candidate in an attempt to confuse QA models. We show that both paraphrased test sets lead to a significant decrease in the performance of multiple state-of-the-art QA models. Using a neural paraphrasing model trained to generate multiple paraphrased questions given a source question and a set of paraphrase suggestions, we propose a data augmentation approach that requires no human intervention to re-train the models for improved robustness to question paraphrasing.
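For illustration, below is a minimal Python sketch of paraphrase-based data augmentation on SQuAD in the spirit of the approach described above, assuming the standard SQuAD v1.1 JSON format. The generate_paraphrases stub is a hypothetical placeholder for a trained neural paraphrasing model; it is not the authors’ implementation (see the nusnlp/paraphrasing-squad repository for their actual code).

```python
"""Sketch: augment a SQuAD training file with paraphrased questions.

Assumes SQuAD v1.1 JSON layout:
  {"data": [{"paragraphs": [{"context": ..., "qas": [...]}, ...]}, ...]}
"""
import copy
import json
from typing import List


def generate_paraphrases(question: str, k: int = 3) -> List[str]:
    # Hypothetical placeholder: a trained neural paraphrasing model
    # (e.g., a seq2seq model conditioned on the source question and
    # paraphrase suggestions, as in the paper) would go here.
    raise NotImplementedError


def augment_squad(in_path: str, out_path: str, k: int = 3) -> None:
    with open(in_path, encoding="utf-8") as f:
        squad = json.load(f)

    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            augmented = []
            for qa in paragraph["qas"]:
                for i, para_q in enumerate(generate_paraphrases(qa["question"], k)):
                    # The answer spans are unchanged: only the question
                    # wording differs, so the gold answers still apply.
                    new_qa = copy.deepcopy(qa)
                    new_qa["question"] = para_q
                    new_qa["id"] = f'{qa["id"]}_para{i}'
                    augmented.append(new_qa)
            paragraph["qas"].extend(augmented)

    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(squad, f)
```

A QA model re-trained on the union of the original and augmented question-answer pairs would then be evaluated on the paraphrased test sets to measure the gain in robustness.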
Anthology ID:
P19-1610
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6065–6075
URL:
https://aclanthology.org/P19-1610
DOI:
10.18653/v1/P19-1610
Cite (ACL):
Wee Chung Gan and Hwee Tou Ng. 2019. Improving the Robustness of Question Answering Systems to Question Paraphrasing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6065–6075, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Improving the Robustness of Question Answering Systems to Question Paraphrasing (Gan & Ng, ACL 2019)
PDF:
https://aclanthology.org/P19-1610.pdf
Code:
nusnlp/paraphrasing-squad
Data:
SQuAD