FinTextQA: A Dataset for Long-form Financial Question Answering

Jian Chen; Peilin Zhou; Yining Hua; Loh Xin; Kehui Chen; Ziyuan Li; Bing Zhu; Junwei Liang

doi:10.18653/v1/2024.acl-long.328

Abstract

Accurate evaluation of financial question answering (QA) systems necessitates a comprehensive dataset encompassing diverse question types and contexts. However, current financial QA datasets lack scope diversity and question complexity. This work introduces FinTextQA, a novel dataset for long-form question answering (LFQA) in finance. FinTextQA comprises 1,262 high-quality, source-attributed QA pairs extracted and selected from finance textbooks and government agency websites. Moreover, we developed a Retrieval-Augmented Generation (RAG)-based LFQA system, comprising an embedder, retriever, reranker, and generator. A multi-faceted evaluation approach, including human ranking, automatic metrics, and GPT-4 scoring, was employed to benchmark the performance of different LFQA system configurations under heightened noisy conditions. The results indicate that: (1) among all compared generators, Baichuan2-7B competes closely with GPT-3.5-turbo in accuracy score; (2) the most effective system configuration on our dataset involved setting the embedder, retriever, reranker, and generator as Ada2, Automated Merged Retrieval, Bge-Reranker-Base, and Baichuan2-7B, respectively; (3) models are less susceptible to noise once the context length reaches a specific threshold. The dataset is publicly available at: https://huggingface.co/datasets/GPS-Lab/FinTextQA.
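
The four-stage layout named in the abstract (embedder, retriever, reranker, generator) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the function names are hypothetical, the embedder is a bag-of-words counter rather than Ada2, the reranker is a term-coverage score rather than Bge-Reranker-Base, and the generator is a stub rather than an LLM. It shows only how the stages hand data to one another, not the authors' system.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedder: bag-of-words term counts (stand-in for a neural embedder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=3):
    """Retriever: return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rerank(question, candidates, k=1):
    """Toy reranker: reorder candidates by the fraction of question terms covered."""
    q_terms = set(embed(question))

    def coverage(chunk):
        return len(q_terms & set(embed(chunk))) / len(q_terms) if q_terms else 0.0

    return sorted(candidates, key=coverage, reverse=True)[:k]


def generate(question, contexts):
    """Generator stub: a real system would prompt an LLM with these contexts."""
    return f"Q: {question}\nContext: {' '.join(contexts)}"


def answer(question, chunks):
    """Run the stages in order: retrieve, rerank, then generate."""
    candidates = retrieve(question, chunks, k=3)
    top = rerank(question, candidates, k=1)
    return generate(question, top)


# Hypothetical example passages, not taken from FinTextQA.
chunks = [
    "A bond is a fixed-income instrument representing a loan to a borrower.",
    "Equity represents ownership in a company.",
    "The central bank sets the policy interest rate.",
]
print(answer("What is a bond?", chunks))
```

Keeping each stage behind its own function mirrors the idea of swapping components independently, which is how the abstract describes comparing system configurations.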

Anthology ID: 2024.acl-long.328
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 6025–6047
URL: https://aclanthology.org/2024.acl-long.328/
DOI: 10.18653/v1/2024.acl-long.328
Cite (ACL): Jian Chen, Peilin Zhou, Yining Hua, Loh Xin, Kehui Chen, Ziyuan Li, Bing Zhu, and Junwei Liang. 2024. FinTextQA: A Dataset for Long-form Financial Question Answering. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6025–6047, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): FinTextQA: A Dataset for Long-form Financial Question Answering (Chen et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-long.328.pdf
