Subhendu Khatuya


2023

Financial Numeric Extreme Labelling: A dataset and benchmarking
Soumya Sharma | Subhendu Khatuya | Manjunath Hegde | Afreen Shaikh | Koustuv Dasgupta | Pawan Goyal | Niloy Ganguly
Findings of the Association for Computational Linguistics: ACL 2023

The U.S. Securities and Exchange Commission (SEC) requires all public companies to file periodic financial statements in which numerals are annotated with a label from a taxonomy. In this paper, we formulate the task of automatically assigning a label to a given numeral span in a sentence, where the label is drawn from an extremely large label set. For this task, we release a dataset, Financial Numeric Extreme Labelling (FNXL), annotated with 2,794 labels. We benchmark performance on the FNXL dataset by formulating the task as (a) a sequence labelling problem and (b) a pipeline in which span extraction is followed by extreme classification. Although the two approaches perform comparably, the pipeline solution provides a slight edge for the least frequent labels.
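
To make the two formulations in the abstract concrete, the following is a minimal sketch of how the same annotated sentence could be framed either as per-token tags or as a two-step pipeline. It is not the authors' implementation: the example sentence, the label name `Revenues`, the regex-based span extractor, and the rule-based `toy_classifier` are all assumptions introduced purely for illustration, standing in for learned models and for the 2,794-label taxonomy.

```python
import re
from typing import Callable, Dict, List

# Hypothetical sentence and annotation; not taken from the FNXL dataset.
TOKENS: List[str] = ["Revenue", "was", "$", "12.5", "million", "in", "2022", "."]
GOLD: Dict[int, str] = {3: "Revenues"}  # token index -> illustrative label

# Simple numeral pattern (assumption): digits with optional commas and decimals.
NUMERAL = re.compile(r"^\d[\d,]*(\.\d+)?$")


def sequence_labelling_targets(tokens: List[str], gold: Dict[int, str]) -> List[str]:
    """(a) One tag per token: the label for annotated numerals, 'O' otherwise."""
    return [gold.get(i, "O") for i in range(len(tokens))]


def extract_numeral_spans(tokens: List[str]) -> List[int]:
    """(b) step 1: candidate numeral positions (regex stand-in for a span extractor)."""
    return [i for i, tok in enumerate(tokens) if NUMERAL.match(tok)]


def toy_classifier(tokens: List[str], index: int) -> str:
    """Placeholder rule standing in for an extreme classifier over the label set."""
    context = " ".join(tokens[max(0, index - 3):index]).lower()
    return "Revenues" if "revenue" in context else "O"


def pipeline(tokens: List[str], classify: Callable[[List[str], int], str]) -> Dict[int, str]:
    """(b) step 2: label each extracted span with a separate classifier."""
    return {i: classify(tokens, i) for i in extract_numeral_spans(tokens)}


if __name__ == "__main__":
    print(sequence_labelling_targets(TOKENS, GOLD))
    print(pipeline(TOKENS, toy_classifier))
```

In formulation (a) a single model must choose among all labels at every token position, whereas in formulation (b) the classifier only sees the extracted numeral spans, which separates finding numerals from choosing among the many labels.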