BiQuAD: Towards QA based on deeper text understanding

Frank Grimm; Philipp Cimiano

doi:10.18653/v1/2021.starsem-1.10

Abstract

Recent question answering and machine reading benchmarks frequently reduce the task to pinpointing spans within a text passage that answer the given question. Typically, these systems are not required to understand the text deeply enough to support more complex reasoning over the information it contains. We introduce a new dataset called BiQuAD that requires deeper comprehension in order to answer questions in both an extractive and a deductive fashion. The dataset consists of 4,190 closed-domain texts and a total of 99,149 question-answer pairs. The texts are synthetically generated soccer match reports that verbalize the main events of each match. All texts are accompanied by a structured Datalog program that represents a (logical) model of their information. We show that state-of-the-art QA models do not perform well on the challenging long-form contexts and reasoning requirements posed by the dataset. In particular, transformer-based state-of-the-art models achieve F1-scores of only 39.0. We demonstrate how these synthetic datasets align structured knowledge with natural text and aid model introspection when approaching complex text understanding.
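To make the distinction between extractive and deductive questions concrete, here is a minimal sketch under stated assumptions. The predicates `team/3` and `goal/4`, the match id `m1`, and all team names, players, and minutes are hypothetical and are not taken from BiQuAD's actual Datalog schema or data. Ground facts are stored as Python tuples and two rule-like queries are hand-coded, rather than evaluated by a real Datalog engine. The `token_f1` helper follows the common SQuAD-style token-overlap F1 but omits the usual punctuation and article normalization, so it may differ from the paper's exact evaluation.

```python
from collections import Counter

# Hypothetical Datalog-style ground facts for one match report, stored as
# (predicate, arg1, arg2, ...). Names and arity are illustrative only,
# not BiQuAD's real schema.
FACTS = {
    ("team", "m1", "home", "Rovers"),
    ("team", "m1", "away", "United"),
    ("goal", "m1", "Rovers", "Smith", 23),
    ("goal", "m1", "United", "Jones", 51),
    ("goal", "m1", "Rovers", "Smith", 78),
}


def query(predicate):
    """Return the argument tuples of every fact with the given predicate."""
    return [fact[1:] for fact in FACTS if fact[0] == predicate]


def scorer_at(match, minute):
    """Extractive-style question: the answer is stated directly in one fact."""
    for m, _team, player, t in query("goal"):
        if m == match and t == minute:
            return player
    return None


def goals_for(match, team):
    """Deductive question: count goal facts, a number never stated verbatim."""
    return sum(1 for m, t, _player, _minute in query("goal")
               if m == match and t == team)


def team_on(match, side):
    """Look up which team played on the given side ("home" or "away")."""
    for m, s, t in query("team"):
        if m == match and s == side:
            return t
    return None


def winner(match):
    """Hand-coded rule comparing goal counts; returns None for a draw.

    This needs aggregation, which is an extension beyond plain Datalog.
    """
    home, away = team_on(match, "home"), team_on(match, "away")
    home_goals, away_goals = goals_for(match, home), goals_for(match, away)
    if home_goals == away_goals:
        return None
    return home if home_goals > away_goals else away


def token_f1(prediction, gold):
    """Simplified SQuAD-style token F1 between a predicted and a gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(scorer_at("m1", 23))                # Smith
print(goals_for("m1", "Rovers"))          # 2
print(winner("m1"))                       # Rovers
print(token_f1("Smith scored", "Smith"))  # 0.666...
```

The extractive question can be answered by locating a single fact (or the sentence verbalizing it), whereas the goal count and the winner require combining several facts; a span-extraction model has no single span to point at for the latter.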

Anthology ID:: 2021.starsem-1.10
Volume:: Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics
Month:: August
Year:: 2021
Address:: Online
Editors:: Lun-Wei Ku, Vivi Nastase, Ivan Vulić
Venue:: *SEM
SIG:: SIGLEX
Publisher:: Association for Computational Linguistics
Pages:: 105–115
URL:: https://aclanthology.org/2021.starsem-1.10
DOI:: 10.18653/v1/2021.starsem-1.10
Cite (ACL):: Frank Grimm and Philipp Cimiano. 2021. BiQuAD: Towards QA based on deeper text understanding. In Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics, pages 105–115, Online. Association for Computational Linguistics.
Cite (Informal):: BiQuAD: Towards QA based on deeper text understanding (Grimm & Cimiano, *SEM 2021)
PDF:: https://aclanthology.org/2021.starsem-1.10.pdf
Data:: DROP, MedHop, NewsQA, OpenBookQA, QASC, RACE, TriviaQA, WikiHop