Informing climate risk analysis using textual information - A research agenda

Andreas Dimmelmeier, Hendrik Doll, Malte Schierholz, Emily Kormanyos, Maurice Fehr, Bolei Ma, Jacob Beck, Alexander Fraser, Frauke Kreuter


Abstract
We present a research agenda for efficiently extracting, quality-assuring, and consolidating textual corporate sustainability information to address urgent climate change decision-making needs. Starting from the goal of creating integrated FAIR (Findable, Accessible, Interoperable, Reusable) climate-related data, we identify research needs pertaining to the technical aspects of information extraction as well as to the design of the integrated sustainability datasets that we seek to compile. Regarding extraction, we leverage technological advancements, particularly in large language models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines, to unlock the underutilized potential of unstructured textual information contained in corporate sustainability reports. In applying these techniques, we review key challenges, which include the retrieval and extraction of CO2 emission values from PDF documents, especially from unstructured tables and graphs therein, and the validation of automatically extracted data through comparisons with human-annotated values. We also review how existing use cases and practices in climate risk analytics relate to choices of what textual information should be extracted and how it could be linked to existing structured data.
Anthology ID:
2024.climatenlp-1.2
Volume:
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Dominik Stammbach, Jingwei Ni, Tobias Schimanski, Kalyan Dutia, Alok Singh, Julia Bingler, Christophe Christiaen, Neetu Kushwaha, Veruska Muccione, Saeid A. Vaghefi, Markus Leippold
Venues:
ClimateNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
12–26
URL:
https://aclanthology.org/2024.climatenlp-1.2
Cite (ACL):
Andreas Dimmelmeier, Hendrik Doll, Malte Schierholz, Emily Kormanyos, Maurice Fehr, Bolei Ma, Jacob Beck, Alexander Fraser, and Frauke Kreuter. 2024. Informing climate risk analysis using textual information - A research agenda. In Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024), pages 12–26, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Informing climate risk analysis using textual information - A research agenda (Dimmelmeier et al., ClimateNLP-WS 2024)
PDF:
https://aclanthology.org/2024.climatenlp-1.2.pdf