Tom Corringham


2024

pdf bib
Challenges in End-to-End Policy Extraction from Climate Action Plans
Nupoor Gandhi | Tom Corringham | Emma Strubell
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)

Gray policy literature such as climate action plans (CAPs) provide an information-rich resource with potential to inform analysis and decision-making. However, these corpora are currently underutilized due to the substantial manual effort and expertise required to sift through long and detailed documents. Automatically structuring relevant information using information extraction (IE) would be useful for assisting policy scientists in synthesizing vast gray policy corpora to identify relevant entities, concepts and themes. LLMs have demonstrated strong performance on IE tasks in the few-shot setting, but it is unclear whether these gains transfer to gray policy literature which differs significantly to traditional benchmark datasets in several aspects, such as format of information content, length of documents, and inconsistency of document structure. We perform a case study on end-to-end IE with California CAPs, inspecting the performance of state-of-the-art tools for: (1) extracting content from CAPs into structured markup segments; (2) few-shot IE with LLMs; and (3) the utility of extracted entities for downstream analyses. We identify challenges at several points of the end-to-end IE pipeline for CAPs, and we provide recommendations for open problems centered around representing rich non-textual elements, document structure, flexible annotation schemes, and global information. Tackling these challenges would make it possible to realize the potential of LLMs for IE with gray policy literature.

pdf bib
Aligning Unstructured Paris Agreement Climate Plans with Sustainable Development Goals
Daniel Spokoyny | Janelle Cai | Tom Corringham | Taylor Berg-Kirkpatrick
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)

Aligning unstructured climate policy documents according to a particular classification taxonomy with little to no labeled examples is challenging and requires manual effort of climate policy researchers. In this work we examine whether large language models (LLMs) can act as an effective substitute or assist in the annotation process. Utilizing a large set of text spans from Paris Agreement Nationally Determined Contributions (NDCs) linked to United Nations Sustainable Development Goals (SDGs) and targets contained in the Climate Watch dataset from the World Resources Institute in combination with our own annotated data, we validate our approaches and establish a benchmark for model performance evaluation on this task. With our evaluation benchmarking we quantify the effectiveness of using zero-shot or few-shot prompted LLMs to align these documents.