SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation

Qilong Wu; Xiaoneng Xiang; Huang Hejia; Xuan Wang; Yeo Wei Jie; Ranjan Satapathy; Ricardo Shirota Filho; Bharadwaj Veeravalli

doi:10.18653/v1/2025.findings-naacl.66

Abstract

The rapid growth of the financial sector and the increasing focus on Environmental, Social, and Governance (ESG) considerations have created a pressing need for advanced natural language processing (NLP) tools. Despite recent advancements, there is still a notable absence of open-source Large Language Models (LLMs) that are proficient across both general finance and ESG domains, such as generating ESG reports. To address this gap, we introduce SusGen-30k, a high-quality, category-balanced dataset comprising seven financial NLP tasks. In addition, we propose TCFD-Bench, a benchmark designed to improve the evaluation of sustainability report generation. Our data-centric approach led to the development of a suite of models, SusGen-GPT, trained on the curated dataset. These models were evaluated across six adapted tasks and two off-the-shelf tasks, showing state-of-the-art performance and surpassing all other models except GPT-4. Remarkably, SusGen-GPT achieved an average score only 0.02 below GPT-4, despite using models with only 7-8B parameters, compared to the much larger GPT-4. This demonstrates the efficiency of our approach in delivering high performance with significantly fewer resources, addressing existing challenges and fostering further advancements in the financial and ESG research community.

Anthology ID: 2025.findings-naacl.66
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1184–1203
URL: https://aclanthology.org/2025.findings-naacl.66/
DOI: 10.18653/v1/2025.findings-naacl.66
Cite (ACL): Qilong Wu, Xiaoneng Xiang, Huang Hejia, Xuan Wang, Yeo Wei Jie, Ranjan Satapathy, Ricardo Shirota Filho, and Bharadwaj Veeravalli. 2025. SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1184–1203, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation (Wu et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-naacl.66.pdf
