How Well Do Multi-hop Reading Comprehension Models Understand Date Information?

Xanh Ho, Saku Sugawara, Akiko Aizawa


Abstract
Several multi-hop reading comprehension datasets have been proposed to resolve the issue of reasoning shortcuts by which questions can be answered without performing multi-hop reasoning. However, the ability of multi-hop models to perform step-by-step reasoning when finding an answer to a comparison question remains unclear. It is also unclear how questions about the internal reasoning process are useful for training and evaluating question-answering (QA) systems. To evaluate the model precisely in a hierarchical manner, we first propose a dataset, HieraDate, with three probing tasks in addition to the main question: extraction, reasoning, and robustness. Our dataset is created by enhancing two previous multi-hop datasets, HotpotQA and 2WikiMultiHopQA, focusing on multi-hop questions on date information that involve both comparison and numerical reasoning. We then evaluate the ability of existing models to understand date information. Our experimental results reveal that the multi-hop models do not have the ability to subtract two dates even when they perform well in date comparison and number subtraction tasks. Other results reveal that our probing questions can help to improve the performance of the models (e.g., by +10.3 F1) on the main QA task and our dataset can be used for data augmentation to improve the robustness of the models.
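The abstract distinguishes date comparison from date subtraction. A minimal sketch of that distinction follows, assuming Python's standard datetime module. The helper names and the example dates are hypothetical and do not come from the paper or from HieraDate.

```python
from datetime import date


# Hypothetical helper names; not taken from the paper or its dataset.
def earlier_date(a: date, b: date) -> date:
    """Date comparison: return whichever of the two dates comes first."""
    return a if a <= b else b


def gap_in_days(a: date, b: date) -> int:
    """Date subtraction: absolute number of days between two dates."""
    return abs((a - b).days)


# Made-up example dates, for illustration only.
first = date(2000, 1, 1)
second = date(2001, 1, 1)

print(earlier_date(first, second))  # 2000-01-01
print(gap_in_days(first, second))   # 366, because 2000 is a leap year
```

Comparison only needs an ordering of the two dates, whereas subtraction needs calendar arithmetic (month lengths, leap years). This is one plausible reason the two skills could diverge in a model, though the abstract itself does not state a cause.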
Anthology ID:
2022.aacl-short.58
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month:
November
Year:
2022
Address:
Online only
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
470–479
URL:
https://aclanthology.org/2022.aacl-short.58
Cite (ACL):
Xanh Ho, Saku Sugawara, and Akiko Aizawa. 2022. How Well Do Multi-hop Reading Comprehension Models Understand Date Information?. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 470–479, Online only. Association for Computational Linguistics.
Cite (Informal):
How Well Do Multi-hop Reading Comprehension Models Understand Date Information? (Ho et al., AACL-IJCNLP 2022)
PDF:
https://aclanthology.org/2022.aacl-short.58.pdf