WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models

Yongan Yu; Qingchen Hu; Xianda Du; Jiayin Wang; Fengran Mo; Renée Sieber

doi:10.18653/v1/2025.findings-acl.207

Abstract

Climate change adaptation requires an understanding of how disruptive weather impacts society, a task to which large language models (LLMs) might be applicable. However, their effectiveness is under-explored due to the difficulty of collecting a high-quality corpus and the lack of available benchmarks. Climate-related events recorded in regional newspapers document how communities adapted to and recovered from disasters, but processing the original corpus is non-trivial. In this study, we first develop a disruptive weather impact dataset with a well-crafted four-stage construction pipeline. Then, we propose WXImpactBench, the first benchmark for evaluating the capacity of LLMs to understand disruptive weather impacts. The benchmark involves two evaluation tasks: multi-label classification and ranking-based question answering. Extensive experiments on a set of LLMs provide a first-hand analysis of the challenges in developing disruptive weather impact understanding and climate change adaptation systems. The constructed dataset and the code for the evaluation framework are available to help society protect against vulnerabilities from disasters.

Anthology ID: 2025.findings-acl.207
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4016–4035
URL: https://aclanthology.org/2025.findings-acl.207/
DOI: 10.18653/v1/2025.findings-acl.207
Cite (ACL): Yongan Yu, Qingchen Hu, Xianda Du, Jiayin Wang, Fengran Mo, and Renée Sieber. 2025. WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 4016–4035, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models (Yu et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.207.pdf
