Mingxiao Song
2024
CoCoHD: Congress Committee Hearing Dataset
Arnav Hiray
|
Yunsong Liu
|
Mingxiao Song
|
Agam Shah
|
Sudheer Chava
Findings of the Association for Computational Linguistics: EMNLP 2024
U.S. congressional hearings significantly influence the national economy and social fabric, impacting individual lives. Despite their importance, there is a lack of comprehensive datasets for analyzing these discourses. To address this, we propose the **Co**ngress **Co**mmittee **H**earing **D**ataset (CoCoHD), covering hearings from 1997 to 2024 across 86 committees, with 32,697 records. This dataset enables researchers to study policy language on critical issues like healthcare, LGBTQ+ rights, and climate justice. We demonstrate its potential with a case study on 1,000 energy-related sentences, analyzing the Energy and Commerce Committee’s stance on fossil fuel consumption. By fine-tuning pre-trained language models, we create energy-relevant measures for each hearing. Our market analysis shows that natural language analysis using CoCoHD can predict and highlight trends in the energy sector.
Search