BBRC: Brazilian Banking Regulation Corpora

Rafael Faria de Azevedo, Thiago Henrique Eduardo Muniz, Claudio Pimentel, Guilherme Jose de Assis Foureaux, Barbara Caldeira Macedo, Daniel de Lima Vasconcelos


Abstract
We present BBRC, a collection of 25 corpora on banking regulatory risk from different departments of Banco do Brasil (BB). The individual corpora cover investments, insurance, human resources, security, technology, treasury, loans, accounting, fraud, credit cards, payment methods, agribusiness, risks, and other areas. They were annotated in binary form by experts, indicating whether or not each regulatory document contains regulatory risk that may require changes to the products, processes, services, and channels of a bank department. The corpora are in Portuguese and contain documents from 26 Brazilian regulatory authorities in the financial sector. In total, there are 61,650 annotated documents, most of them between half a page and three pages long. The corpora belong to a Natural Language Processing (NLP) application that has been in production since 2020. In this work, we also report binary classification benchmarks on some of the corpora. Experiments were carried out with different sampling techniques; in one of them, we sought to solve an intraclass imbalance problem present in each corpus. For the benchmarks, we used the following classifiers: Multinomial Naive Bayes, Random Forest, SVM, XGBoost, and BERTimbau (a version of BERT for Portuguese). BBRC can be downloaded through a link in the article.
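The binary task described in the abstract (does a regulatory document carry regulatory risk for a department, yes or no) can be illustrated with the simplest of the listed baselines, Multinomial Naive Bayes. The sketch below is a minimal from-scratch version with add-alpha (Laplace) smoothing, written only to show how such a classifier scores a document; it is not the authors' implementation, and their preprocessing, features, and sampling are not reproduced. The tokenizer, the `MultinomialNB` class name, and the toy Portuguese snippets and labels (1 = contains regulatory risk, 0 = does not) are hypothetical and not drawn from BBRC.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of word characters; \w is Unicode-aware,
    # so accented Portuguese words such as "crédito" stay intact.
    return re.findall(r"\w+", text.lower())


class MultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary = every token seen in any class during training.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        # Class priors P(c) estimated from document frequencies.
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, c, token):
        # Smoothed P(token | c) = (count + alpha) / (total_c + alpha * |V|)
        count = self.word_counts[c][token]
        return math.log((count + self.alpha) / (self.totals[c] + self.alpha * len(self.vocab)))

    def predict(self, doc):
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # ignore unseen words
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(c, t) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy, invented training data: 1 = regulatory risk, 0 = no regulatory risk.
train_docs = [
    "altera prazo de envio de informações de crédito",
    "estabelece novos requisitos para cartões de crédito",
    "divulga calendário de feriados",
    "nomeia membro do conselho",
]
train_labels = [1, 1, 0, 0]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("novos requisitos de crédito"))  # → 1
print(model.predict("calendário de feriados"))       # → 0
```

On real corpora, a library implementation with a proper vectorizer and a held-out evaluation split would be the practical choice; the point here is only the scoring rule, log prior plus summed smoothed log likelihoods, with the higher-scoring class returned.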
Anthology ID:
2024.finnlp-1.15
Volume:
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Chung-Chi Chen, Xiaomo Liu, Udo Hahn, Armineh Nourbakhsh, Zhiqiang Ma, Charese Smiley, Veronique Hoste, Sanjiv Ranjan Das, Manling Li, Mohammad Ghassemi, Hen-Hsen Huang, Hiroya Takamura, Hsin-Hsi Chen
Venues:
FinNLP | WS
Publisher:
ELRA and ICCL
Pages:
150–166
URL:
https://aclanthology.org/2024.finnlp-1.15
Cite (ACL):
Rafael Faria de Azevedo, Thiago Henrique Eduardo Muniz, Claudio Pimentel, Guilherme Jose de Assis Foureaux, Barbara Caldeira Macedo, and Daniel de Lima Vasconcelos. 2024. BBRC: Brazilian Banking Regulation Corpora. In Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing @ LREC-COLING 2024, pages 150–166, Torino, Italia. ELRA and ICCL.
Cite (Informal):
BBRC: Brazilian Banking Regulation Corpora (Faria de Azevedo et al., FinNLP-WS 2024)
PDF:
https://aclanthology.org/2024.finnlp-1.15.pdf