Hidelberg O. Albuquerque

Also published as: Hidelberg O. Albuquerque


2026

The PROPOR conference has been the main venue for Portuguese-language Natural Language Processing (NLP) research for over two decades. This paper presents a longitudinal bibliometric analysis of PROPOR from 2003 to 2024, examining thematic evolution, community structure, and scientific impact. We identify a shift from speech-oriented research toward text-based tasks, alongside the sustained importance of resources and linguistic theory. The community exhibits a stable structure, with complementary leadership models centered on institutional hubs and brokerage roles. Scientific impact is highly concentrated and follows a long-tail distribution; we further distinguish cumulative, productivity-driven impact from the rapidly accelerating citation uptake of recent editions. These findings characterize PROPOR as a resilient regional linguistic ecosystem evolving in dialogue with broader NLP paradigms.
The legal domain presents several challenges for Natural Language Processing (NLP), particularly due to its linguistic complexity and lack of public datasets. Named Entity Recognition (NER), a subarea of NLP, has been successfully used to extract useful knowledge from legal texts, but its widespread use is limited by the lack of legal text corpora. This paper introduces UlyssesLegalNER-Br, a comprehensive corpus of Brazilian legal documents for NER, covering bills, case law, and laws, including the first NER corpus based exclusively on Brazilian laws. This research expands the UlyssesNER-Br corpus, previously focused only on the Brazilian legislative domain. The proposed corpus has 560 public documents annotated using a hybrid approach and organized in 9 categories and 23 fine-grained types. It was experimentally evaluated with the CRF, BiLSTM, and BERTimbau architectures regarding predictive performance, computational cost, and label-level results. The best micro F1, 96.18%, was achieved by BERTimbau on the unified corpus, providing a strong baseline for Brazilian legal NER. At the label level, six categories and seven types achieved an F1-score above 95%, while the lowest scores fell in the 71-82% range.

2024