Challenges in End-to-End Policy Extraction from Climate Action Plans

Nupoor Gandhi, Tom Corringham, Emma Strubell


Abstract
Gray policy literature such as climate action plans (CAPs) provides an information-rich resource with potential to inform analysis and decision-making. However, these corpora are currently underutilized due to the substantial manual effort and expertise required to sift through long and detailed documents. Automatically structuring relevant information using information extraction (IE) would be useful for assisting policy scientists in synthesizing vast gray policy corpora to identify relevant entities, concepts and themes. LLMs have demonstrated strong performance on IE tasks in the few-shot setting, but it is unclear whether these gains transfer to gray policy literature, which differs significantly from traditional benchmark datasets in several respects, such as format of information content, length of documents, and inconsistency of document structure. We perform a case study on end-to-end IE with California CAPs, inspecting the performance of state-of-the-art tools for: (1) extracting content from CAPs into structured markup segments; (2) few-shot IE with LLMs; and (3) the utility of extracted entities for downstream analyses. We identify challenges at several points of the end-to-end IE pipeline for CAPs, and we provide recommendations for open problems centered around representing rich non-textual elements, document structure, flexible annotation schemes, and global information. Tackling these challenges would make it possible to realize the potential of LLMs for IE with gray policy literature.
Anthology ID:
2024.climatenlp-1.12
Volume:
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Dominik Stammbach, Jingwei Ni, Tobias Schimanski, Kalyan Dutia, Alok Singh, Julia Bingler, Christophe Christiaen, Neetu Kushwaha, Veruska Muccione, Saeid A. Vaghefi, Markus Leippold
Venues:
ClimateNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
156–167
URL:
https://aclanthology.org/2024.climatenlp-1.12
DOI:
10.18653/v1/2024.climatenlp-1.12
Cite (ACL):
Nupoor Gandhi, Tom Corringham, and Emma Strubell. 2024. Challenges in End-to-End Policy Extraction from Climate Action Plans. In Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024), pages 156–167, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Challenges in End-to-End Policy Extraction from Climate Action Plans (Gandhi et al., ClimateNLP-WS 2024)
PDF:
https://aclanthology.org/2024.climatenlp-1.12.pdf