Artur M. A. Novais

2026

Optimizing Efficiency in Multi-Stage Semantic Re-ranking Architectures
Artur M. A. Novais | Anna P. V. L. B. Moreira | Maria C. X. de Almeida | João P. C. Presa | Fernando M. Federson | Sávio S. T. de Oliveira
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

Semantic re-ranking architectures based on cross-encoders are essential for high-precision Information Retrieval (IR) in the legal domain, but they face a dilemma: their high computational latency renders large-scale applications challenging, particularly in resource-constrained environments. Traditional single-stage approaches force a choice between computational efficiency and ranking quality. This work presents an empirical evaluation of established cascade re-ranking architectures to optimize this balance through the adaptive application of off-the-shelf models of increasing complexity over progressively smaller sets of candidates. We validated the architecture on a corpus of 300,000 legal documents in Portuguese from the Court of Accounts of the State of Goiás (TCE-GO). Experiments demonstrate a 60.3% reduction in latency (from 11.75s to 4.66s per query) compared to the most precise single-stage baseline, with a marginal degradation of only 2 p.p. in R@avg and 0.0224 in MRR@avg. The results validate the semantic funnel as a computationally viable solution for semantic document-to-document search within the specific context of the TCE-GO repository, establishing a baseline for future transferability studies in broader Portuguese legal contexts.

2025


AKCIT at SemEval-2025 Task 11: Investigating Data Quality in Portuguese Emotion Recognition
Iago A. Brito | Fernanda B. Färber | Julia S. Dollis | Daniel M. Pedrozo | Artur M. A. Novais | Diogo F. C. Silva | Arlindo R. Galvão Filho
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper investigates the impact of data quality and processing strategies on emotion recognition in Brazilian Portuguese (PTBR) texts. We focus on data distribution, linguistic context, and augmentation techniques such as translation and synthetic data generation. To evaluate these aspects, we conduct experiments on the PTBR portion of the BRIGHTER dataset, a manually curated multilingual dataset containing nearly 100,000 samples, of which 4,552 are in PTBR. Our study encompasses both multi-label emotion detection (presence/absence classification) and emotion intensity prediction (0 to 3 scale), following the SemEval 2025 Task 11 setup. Results demonstrate that emotion intensity labels enhance model performance after discretization, and that smaller multilingual models can outperform larger ones in low-resource settings. Our official submission ranked 6th, but further refinements improved our ranking to 3rd, trailing the top submission by only 0.047, reinforcing the significance of a data-centric approach in emotion recognition.

Co-authors

Arlindo R. Galvão Filho 1

Anna P. V. L. B. Moreira 1

Sávio S. T. de Oliveira 1

Daniel M. Pedrozo 1

João P. C. Presa 1

Diogo F. C. Silva 1
