Isabel Trindade


2022

FeTaQA: Free-form Table Question Answering
Linyong Nan | Chiachun Hsieh | Ziming Mao | Xi Victoria Lin | Neha Verma | Rui Zhang | Wojciech Kryściński | Hailey Schoelkopf | Riley Kong | Xiangru Tang | Mutethia Mutuma | Ben Rosand | Isabel Trindade | Renusree Bandaru | Jacob Cunningham | Caiming Xiong | Dragomir Radev
Transactions of the Association for Computational Linguistics, Volume 10

Existing table question answering datasets contain abundant factual questions that primarily evaluate a QA system’s comprehension of query and tabular data. However, restricted by their short-form answers, these datasets fail to include question–answer interactions that represent more advanced and naturally occurring information needs: questions that ask for reasoning and integration of information pieces retrieved from a structured knowledge source. To complement the existing datasets and to reveal the challenging nature of the table-based question answering task, we introduce FeTaQA, a new dataset with 10K Wikipedia-based table, question, free-form answer, supporting table cells pairs. FeTaQA is collected from noteworthy descriptions of Wikipedia tables that contain information people tend to seek; generation of these descriptions requires advanced processing that humans perform on a daily basis: Understand the question and table, retrieve, integrate, infer, and conduct text planning and surface realization to generate an answer. We provide two benchmark methods for the proposed task: a pipeline method based on semantic parsing-based QA systems and an end-to-end method based on large pretrained text generation models, and show that FeTaQA poses a challenge for both methods.