FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain

Tiansheng Hu; Tongyan Hu; Liuyang Bai; Yilun Zhao; Arman Cohan; Chen Zhao

doi:10.18653/v1/2025.emnlp-main.512

Abstract

Recent LLMs have demonstrated promising ability in solving finance-related problems. However, applying LLMs in real-world finance applications remains challenging because the domain is high-risk and high-stakes. This paper introduces FinTrust, a comprehensive benchmark specifically designed for evaluating the trustworthiness of LLMs in finance applications. Our benchmark focuses on a wide range of alignment issues grounded in practical contexts and features fine-grained tasks for each dimension of trustworthiness evaluation. We assess eleven LLMs on FinTrust and find that proprietary models like o4-mini outperform on most tasks, such as safety, while open-source models like DeepSeek-V3 have an advantage in specific areas like industry-level fairness. On challenging tasks like fiduciary alignment and disclosure, all LLMs fall short, showing a significant gap in legal awareness. We believe that FinTrust can be a valuable benchmark for evaluating LLMs' trustworthiness in the finance domain.

Anthology ID: 2025.emnlp-main.512
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 10099–10128
URL: https://aclanthology.org/2025.emnlp-main.512/
DOI: 10.18653/v1/2025.emnlp-main.512
Cite (ACL): Tiansheng Hu, Tongyan Hu, Liuyang Bai, Yilun Zhao, Arman Cohan, and Chen Zhao. 2025. FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 10099–10128, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain (Hu et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.512.pdf
Checklist: 2025.emnlp-main.512.checklist.pdf
