Higor Moreira
2026
The PROPOR Ecosystem: Structure, Roles, and Evolution of Portuguese-Language NLP
Rafael O. Nunes | Gustavo L. Tamiosso | Pedro L. C. de Andrade | Matheus S. de Aguiar | Rafael P. de Gouveia | Higor Moreira | Bruno Tavares | Laura P. de Gouveia | Felipe S. F. Paula | Andre Spritzer | Hidelberg O. Albuquerque | Nádia F. F. da Silva | Ellen P. R. S. Pereira | Dennis G. Balreira | Joel L. Carbonera
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
The PROPOR conference has been the main venue for Portuguese-language Natural Language Processing (NLP) research for over two decades. This paper presents a longitudinal bibliometric analysis of PROPOR from 2003 to 2024, examining thematic evolution, community structure, and scientific impact. We identify a shift from speech-oriented research toward text-based tasks, alongside the sustained importance of resources and linguistic theory. The community exhibits a stable structure, with complementary leadership models centered on institutional hubs and brokerage roles. Scientific impact is highly concentrated and follows a long-tail distribution; our analysis distinguishes between cumulative, productivity-driven impact and the rapidly accelerating citation uptake of recent editions. These findings characterize PROPOR as a resilient regional linguistic ecosystem evolving in dialogue with broader NLP paradigms.
Data Augmentation for Named Entity Recognition in Domain-Specific Scenarios in Portuguese
Higor Moreira | Patricia Ferreira da Silva | Luciana Bencke | Viviane Moreira
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Named Entity Recognition (NER) is an important Natural Language Processing task. Achieving good results on it usually requires a large amount of labeled data to train models, which is especially difficult for domain-specific datasets and low-resource languages. Data augmentation can be used to mitigate the high cost of human-annotated data. In this work, we evaluate data augmentation techniques for NER, focusing on domain-specific datasets in Portuguese. We applied augmentation techniques based on rules, back-translation, and large language models to four datasets of varying sizes and used the results to train Transformer-based NER models. Most techniques improved over the baseline, with the best results achieved using PP-LLM, SR, and MR.
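The abstract does not define SR or MR, so the following is only an illustrative sketch of one common rule-based NER augmentation, mention replacement: each entity mention is swapped for another mention of the same type collected from the training data. Whether this corresponds to the paper's MR is an assumption. The BIO tag scheme, the entity labels `PESSOA` and `ORG`, the example sentences, and the helper names `extract_mentions` and `mention_replacement` are all hypothetical.

```python
import random
from collections import defaultdict


def extract_mentions(sentences):
    """Collect entity mentions (token spans) per entity type from BIO-tagged data."""
    pool = defaultdict(list)
    for tokens, tags in sentences:
        i = 0
        while i < len(tokens):
            if tags[i].startswith("B-"):
                etype = tags[i][2:]
                j = i + 1
                # Extend the span while the following tokens continue the same entity.
                while j < len(tokens) and tags[j] == "I-" + etype:
                    j += 1
                pool[etype].append(tokens[i:j])
                i = j
            else:
                i += 1
    return pool


def mention_replacement(tokens, tags, pool, p=0.5, rng=random):
    """With probability p, replace each mention by a random same-type mention from pool."""
    new_tokens, new_tags = [], []
    i = 0
    while i < len(tokens):
        if tags[i].startswith("B-"):
            etype = tags[i][2:]
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + etype:
                j += 1
            mention = tokens[i:j]
            if pool.get(etype) and rng.random() < p:
                mention = rng.choice(pool[etype])
            new_tokens.extend(mention)
            # Rebuild BIO tags so they match the (possibly different) mention length.
            new_tags.extend(["B-" + etype] + ["I-" + etype] * (len(mention) - 1))
            i = j
        else:
            # Non-entity tokens are copied unchanged.
            new_tokens.append(tokens[i])
            new_tags.append(tags[i])
            i += 1
    return new_tokens, new_tags


# Hypothetical toy training data (Portuguese sentences, BIO tags).
train = [
    (["Ontem", "João", "Silva", "recorreu", "ao", "STF"],
     ["O", "B-PESSOA", "I-PESSOA", "O", "O", "B-ORG"]),
    (["Maria", "depôs", "no", "TJRS"],
     ["B-PESSOA", "O", "O", "B-ORG"]),
]
pool = extract_mentions(train)
toks, tags = train[0]
print(mention_replacement(toks, tags, pool, p=1.0, rng=random.Random(0)))
```

Regenerating the tags from the inserted mention, rather than copying the original ones, keeps the labels consistent when the replacement has a different number of tokens than the mention it replaces.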
2023
Team INF-UFRGS at SemEval-2023 Task 7: Supervised Contrastive Learning for Pair-level Sentence Classification and Evidence Retrieval
Abel Corrêa Dias | Filipe Dias | Higor Moreira | Viviane Moreira | João Luiz Comba
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper describes the EvidenceSCL system submitted by our team (INF-UFRGS) to SemEval-2023 Task 7: Multi-Evidence Natural Language Inference for Clinical Trial Data (NLI4CT). NLI4CT is divided into two tasks: the first determines the inference relation between a pair of statements in clinical trials, and the second retrieves the set of supporting facts from the premises needed to justify the label predicted in the first task. Our approach uses pair-level supervised contrastive learning to classify pairs of sentences. We trained EvidenceSCL on two datasets created from NLI4CT, along with additional data from other NLI datasets. We show that our approach can address both goals of NLI4CT; although it reached an intermediate position in the ranking, the technique still leaves room for improvement.
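To make the idea of supervised contrastive learning concrete, here is a minimal pure-Python sketch of the generic supervised contrastive loss of Khosla et al. (2020). It is not the EvidenceSCL implementation: it assumes that each input vector is an already-encoded (statement, premise) pair, that labels are the inference classes, and that the temperature is 0.1. The function names `l2_normalize` and `supcon_loss` are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss averaged over anchors that have positives.

    Each embedding is assumed to represent one encoded sentence pair; pairs with
    the same label are pulled together, all others are pushed apart.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without same-label partners contributes nothing
        # Temperature-scaled cosine similarity to every other sample in the batch.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two clusters of "pair embeddings".
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supcon_loss(emb, [0, 0, 1, 1]))  # labels agree with the clusters
print(supcon_loss(emb, [0, 1, 0, 1]))  # labels cut across the clusters
```

When the labels agree with the geometry of the embeddings, the loss is small; when same-label pairs lie far apart, it grows, which is the signal that pushes an encoder to cluster pairs by inference class.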