UlyssesLegalNER-Br: from Legislative to Legal, a comprehensive corpus of Brazilian legal documents for Named Entity Recognition

Hidelberg O. Albuquerque, Ellen Souza, Danilo C. G. Lucena, Héldon J. O. Albuquerque, Nádia F. F. da Silva, Márcio de S. Dias, Rafael O. Nunes, Adriano L. I. Oliveira, André C. P. L. F. de Carvalho


Abstract
The legal domain presents several challenges for Natural Language Processing (NLP), particularly due to its linguistic complexity and the lack of public datasets. Named Entity Recognition (NER), a subarea of NLP, has been successfully used to extract useful knowledge from legal texts, but its widespread use is limited by the scarcity of legal text corpora. This paper introduces UlyssesLegalNER-Br, a comprehensive corpus of Brazilian legal documents for NER, covering bills, case law, and laws, including the first NER corpus based exclusively on Brazilian laws. This research expands the UlyssesNER-Br corpus, previously focused only on the Brazilian legislative domain. The proposed corpus has 560 public documents annotated using a hybrid approach and organized into 9 categories and 23 fine-grained types. It was experimentally evaluated with the CRF, BiLSTM, and BERTimbau architectures regarding predictive performance, computational cost, and label-level results. The best micro F1, 96.18%, was achieved by BERTimbau on the unified corpus, providing a strong baseline for Brazilian legal NER. At the label level, six categories and seven types achieved an F1-score above 95%, while the lowest scores fell in the 71-82% range.
Anthology ID:
2026.propor-1.33
Volume:
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month:
April
Year:
2026
Address:
Salvador, Brazil
Editors:
Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue:
PROPOR
Publisher:
Association for Computational Linguistics
Pages:
331–341
URL:
https://aclanthology.org/2026.propor-1.33/
Cite (ACL):
Hidelberg O. Albuquerque, Ellen Souza, Danilo C. G. Lucena, Héldon J. O. Albuquerque, Nádia F. F. da Silva, Márcio de S. Dias, Rafael O. Nunes, Adriano L. I. Oliveira, and André C. P. L. F. de Carvalho. 2026. UlyssesLegalNER-Br: from Legislative to Legal, a comprehensive corpus of Brazilian legal documents for Named Entity Recognition. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 331–341, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal):
UlyssesLegalNER-Br: from Legislative to Legal, a comprehensive corpus of Brazilian legal documents for Named Entity Recognition (Albuquerque et al., PROPOR 2026)
PDF:
https://aclanthology.org/2026.propor-1.33.pdf