Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering

Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa


Abstract
To explain the predicted answers and evaluate the reasoning abilities of models, several studies have utilized underlying reasoning (UR) tasks in multi-hop question answering (QA) datasets. However, it remains an open question as to how effective UR tasks are for the QA task when training models on both tasks in an end-to-end manner. In this study, we address this question by analyzing the effectiveness of UR tasks (including both sentence-level and entity-level tasks) in three aspects: (1) QA performance, (2) reasoning shortcuts, and (3) robustness. While the previous models have not been explicitly trained on an entity-level reasoning prediction task, we build a multi-task model that performs three tasks together: sentence-level supporting facts prediction, entity-level reasoning prediction, and answer prediction. Experimental results on 2WikiMultiHopQA and HotpotQA-small datasets reveal that (1) UR tasks can improve QA performance. Using four debiased datasets that are newly created, we demonstrate that (2) UR tasks are helpful in preventing reasoning shortcuts in the multi-hop QA task. However, we find that (3) UR tasks do not contribute to improving the robustness of the model on adversarial questions, such as sub-questions and inverted questions. We encourage future studies to investigate the effectiveness of entity-level reasoning in the form of natural language questions (e.g., sub-question forms).
Anthology ID: 2023.findings-eacl.87
Volume: Findings of the Association for Computational Linguistics: EACL 2023
Month: May
Year: 2023
Address: Dubrovnik, Croatia
Editors: Andreas Vlachos, Isabelle Augenstein
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1163–1180
URL: https://aclanthology.org/2023.findings-eacl.87
DOI: 10.18653/v1/2023.findings-eacl.87
Cite (ACL): Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, and Akiko Aizawa. 2023. Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering. In Findings of the Association for Computational Linguistics: EACL 2023, pages 1163–1180, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal): Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering (Ho et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-eacl.87.pdf
Video: https://aclanthology.org/2023.findings-eacl.87.mp4