Weiyuan Chen
2024
FinDVer: Explainable Claim Verification over Long and Hybrid-content Financial Documents
Yilun Zhao | Yitao Long | Tintin Jiang | Chengye Wang | Weiyuan Chen | Hongjun Liu | Xiangru Tang | Yiming Zhang | Chen Zhao | Arman Cohan
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We introduce FinDVer, a comprehensive benchmark specifically designed to evaluate the explainable claim verification capabilities of LLMs in the context of understanding and analyzing long, hybrid-content financial documents. FinDVer contains 4,000 expert-annotated examples across four subsets, each focusing on a type of scenario that frequently arises in real-world financial domains. We assess a broad spectrum of 25 LLMs under long-context and RAG settings. Our results show that even the current best-performing system (i.e., GPT-4o) significantly lags behind human experts. Our detailed findings and insights highlight the strengths and limitations of existing LLMs in this new task. We believe FinDVer can serve as a valuable benchmark for evaluating LLM capabilities in claim verification over complex, expert-domain documents.
Co-authors
- Yilun Zhao 1
- Yitao Long 1
- Tintin Jiang 1
- Chengye Wang 1
- Hongjun Liu 1