FIRE: A Dataset for Financial Relation Extraction

Hassan Hamad; Abhinav Kumar Thakur; Nijil Kolleri; Sujith Pulikodan; Keith Chugg

doi:10.18653/v1/2024.findings-naacl.230

Abstract

This paper introduces FIRE (**FI**nancial **R**elation **E**xtraction), a sentence-level dataset of named entities and relations within the financial sector. Comprising 3,025 instances, the dataset encapsulates 13 named entity types along with 18 relation types. Sourced from public financial reports and financial news articles, FIRE captures a wide array of financial information about a business including, but not limited to, corporate structure, business model, revenue streams, and market activities such as acquisitions. The full dataset was labeled by a single annotator to minimize labeling noise. The labeling time for each sentence was recorded during the labeling process. We show how this feature, along with curriculum learning techniques, can be used to improve a model’s performance. The FIRE dataset is designed to serve as a valuable resource for training and evaluating machine learning algorithms in the domain of financial information extraction. The dataset and the code to reproduce our experimental results are available at https://github.com/hmhamad/FIRE. The repository for the labeling tool can be found at https://github.com/abhinav-kumar-thakur/relation-extraction-annotator.
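The abstract does not spell out how labeling time feeds into curriculum learning. One simple reading, offered only as a sketch, is to treat labeling time as a proxy for difficulty and present quickly labeled sentences first. The `Example` class, the `labeling_seconds` field name, and the toy sentences below are hypothetical and are not taken from the FIRE repository; the authors' actual procedure may differ.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Example:
    text: str
    # Hypothetical field name for the recorded per-sentence labeling time.
    labeling_seconds: float


def curriculum_batches(examples: List[Example], batch_size: int) -> Iterator[List[Example]]:
    """Yield batches ordered from shortest to longest labeling time.

    Assumption: a shorter labeling time indicates an easier sentence.
    """
    ordered = sorted(examples, key=lambda ex: ex.labeling_seconds)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


# Toy placeholder data, not drawn from the dataset.
toy = [
    Example("A acquired B.", 12.0),
    Example("C, a subsidiary of D, sold E to F.", 41.5),
    Example("G reported revenue.", 7.3),
]
batches = list(curriculum_batches(toy, batch_size=2))
```

With this ordering, a training loop would consume `batches` in sequence for the first epoch, so the sentences that took the longest to label are seen last.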

Anthology ID: 2024.findings-naacl.230
Volume: Findings of the Association for Computational Linguistics: NAACL 2024
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3628–3642
URL: https://aclanthology.org/2024.findings-naacl.230/
DOI: 10.18653/v1/2024.findings-naacl.230
Cite (ACL): Hassan Hamad, Abhinav Kumar Thakur, Nijil Kolleri, Sujith Pulikodan, and Keith Chugg. 2024. FIRE: A Dataset for Financial Relation Extraction. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3628–3642, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): FIRE: A Dataset for Financial Relation Extraction (Hamad et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-naacl.230.pdf
Video: https://aclanthology.org/2024.findings-naacl.230.mp4
