ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures

Tobias Schimanski, Jingwei Ni, Roberto Martín, Nicola Ranger, Markus Leippold


Abstract
To handle the vast amounts of qualitative data produced in corporate climate communication, stakeholders increasingly rely on Retrieval Augmented Generation (RAG) systems. However, a significant gap remains in evaluating domain-specific information retrieval – the basis for answer generation. To address this challenge, this work simulates the typical tasks of a sustainability analyst by examining 30 sustainability reports with 16 detailed climate-related questions. As a result, we obtain a dataset with over 8.5K unique question-source-answer pairs labeled by different levels of relevance. Furthermore, we develop a use case with the dataset to investigate the integration of expert knowledge into information retrieval with embeddings. Although we show that incorporating expert knowledge works, we also outline the critical limitations of embeddings in knowledge-intensive downstream domains like climate change communication.
Anthology ID:
2024.emnlp-main.969
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
17509–17524
URL:
https://aclanthology.org/2024.emnlp-main.969
Cite (ACL):
Tobias Schimanski, Jingwei Ni, Roberto Martín, Nicola Ranger, and Markus Leippold. 2024. ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17509–17524, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures (Schimanski et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.969.pdf