Tab-CQA: A Tabular Conversational Question Answering Dataset on Financial Reports

Chuang Liu, Junzhuo Li, Deyi Xiong


Abstract
Existing conversational question answering (CQA) datasets have usually been constructed from unstructured English text. In this paper, we propose Tab-CQA, a tabular CQA dataset created from Chinese financial reports published by listed companies across a wide range of sectors over the past 30 years. From these reports, we select 2,463 tables and manually create 2,463 conversations with 35,494 QA pairs. Additionally, we select 4,578 tables, from which 4,578 conversations with 73,595 QA pairs are automatically created via a template-based method. Across the manually and automatically generated conversations, Tab-CQA contains both answerable and unanswerable questions. We further diversify the answerable questions to cover a wide range of skills, such as table retrieval, fact checking, and numerical reasoning, so as to reflect real-world scenarios. We also propose two tabular CQA models, a text-based model and an operation-based model, and evaluate them on Tab-CQA. Experimental results show that Tab-CQA is a very challenging dataset, with a large performance gap between humans and neural models. We will publicly release Tab-CQA as a benchmark testbed to promote further research on Chinese tabular CQA.
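The dataset sizes stated in the abstract can be combined with a little arithmetic. The following is a minimal Python sketch that sums the manually and automatically created portions and computes the average number of QA pairs per conversation, assuming (as the abstract states) exactly one conversation per selected table. The split names `manual` and `template` and the function `summarize` are hypothetical labels chosen for this illustration, not part of the released dataset.

```python
# Figures taken from the abstract; one conversation per selected table.
# The split names "manual" and "template" are hypothetical labels.
SPLITS = {
    "manual": {"conversations": 2463, "qa_pairs": 35494},
    "template": {"conversations": 4578, "qa_pairs": 73595},
}


def summarize(splits):
    """Return (total conversations, total QA pairs,
    overall mean QA pairs per conversation, per-split means)."""
    total_conv = sum(s["conversations"] for s in splits.values())
    total_qa = sum(s["qa_pairs"] for s in splits.values())
    per_split = {
        name: s["qa_pairs"] / s["conversations"] for name, s in splits.items()
    }
    return total_conv, total_qa, total_qa / total_conv, per_split


if __name__ == "__main__":
    conv, qa, mean_len, per_split = summarize(SPLITS)
    print(f"{conv} conversations, {qa} QA pairs, "
          f"{mean_len:.2f} QA pairs/conversation")
    for name, avg in per_split.items():
        print(f"  {name}: {avg:.2f} QA pairs/conversation")
```

Run as a script, it reports 7,041 conversations and 109,089 QA pairs in total, or about 15.49 QA pairs per conversation (about 14.41 for the manually created portion and 16.08 for the template-based portion). These are derived averages, not figures reported by the authors.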
Anthology ID:
2023.acl-industry.20
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Sunayana Sitaram, Beata Beigman Klebanov, Jason D Williams
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
196–207
URL:
https://aclanthology.org/2023.acl-industry.20
DOI:
10.18653/v1/2023.acl-industry.20
Cite (ACL):
Chuang Liu, Junzhuo Li, and Deyi Xiong. 2023. Tab-CQA: A Tabular Conversational Question Answering Dataset on Financial Reports. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 196–207, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Tab-CQA: A Tabular Conversational Question Answering Dataset on Financial Reports (Liu et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-industry.20.pdf