Synthesizing question answering data from financial documents: An End-to-End Multi-Agent Approach

Chetan Harsha; Karmvir Singh Phogat; Sridhar Dasaratha; Shashishekar Ramakrishna

Abstract

Answering complex questions that require numerical reasoning over financial documents is challenging due to the diverse and scattered nature of relevant information. While large language models (LLMs) excel at financial reasoning, their enterprise deployment is often limited by cost and latency. Small language models (SLMs) present a cost-effective alternative but need to be fine-tuned with high-quality, domain-specific question-answer (QA) data. Acquiring such data requires manual expert annotation, presenting a bottleneck to the wider application of SLMs. This work introduces a modular, scalable end-to-end agentic pipeline that extracts and selects relevant content from unstructured financial documents and then generates QA pairs from the selected content for SLM fine-tuning. Compared to the same models trained on previous manually generated data for the task, one of the models trained on our pipeline-produced synthetic data achieved competitive in-distribution performance, and all tested models demonstrated superior generalization. The framework thus demonstrates considerable potential to accelerate the deployment of smaller, cost-effective models by reducing manual data creation efforts.

Anthology ID: 2026.eacl-industry.51
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Yevgen Matusevych, Gülşen Eryiğit, Nikolaos Aletras
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 669–687
URL: https://aclanthology.org/2026.eacl-industry.51/
Cite (ACL): Chetan Harsha, Karmvir Singh Phogat, Sridhar Dasaratha, and Shashishekar Ramakrishna. 2026. Synthesizing question answering data from financial documents: An End-to-End Multi-Agent Approach. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), pages 669–687, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Synthesizing question answering data from financial documents: An End-to-End Multi-Agent Approach (Harsha et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-industry.51.pdf
