João P. C. Presa


2026

Semantic re-ranking architectures based on cross-encoders are essential for high-precision Information Retrieval (IR) in the legal domain, but their high computational latency makes large-scale deployment challenging, particularly in resource-constrained environments. Traditional single-stage approaches force a choice between computational efficiency and ranking quality. This work presents an empirical evaluation of established cascade re-ranking architectures, which optimize this balance by adaptively applying off-the-shelf models of increasing complexity to progressively smaller sets of candidates. We validated the cascade on a corpus of 300,000 legal documents in Portuguese from the Court of Accounts of the State of Goiás (TCE-GO). Experiments show a 60.3% reduction in latency (from 11.75 s to 4.66 s per query) compared to the most precise single-stage baseline, at the cost of only 2 percentage points in R@avg and 0.0224 in MRR@avg. The results support the semantic funnel as a computationally viable solution for semantic document-to-document search within the specific context of the TCE-GO repository, and they establish a baseline for future transferability studies in broader Portuguese legal contexts.
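The funnel idea can be illustrated with a minimal sketch. It is not the pipeline evaluated in this work: the `Stage` and `cascade_rerank` names, the two toy scorers, and the cut-off sizes are all hypothetical, and the lexical scorers merely stand in for the off-the-shelf models of increasing cost. Each stage scores only the candidates that survived the previous one, so the most expensive scorer sees the smallest pool. The `latency_reduction` helper reproduces the relative reduction from the two per-query latencies reported above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A scorer maps (query, document) to a relevance score; higher is better.
Scorer = Callable[[str, str], float]


@dataclass
class Stage:
    """One level of the funnel (hypothetical structure for illustration)."""
    name: str
    scorer: Scorer
    keep: int  # number of candidates passed on to the next stage


def cascade_rerank(query: str, candidates: Sequence[str],
                   stages: Sequence[Stage]) -> List[str]:
    """Apply each stage in order, shrinking the candidate pool every time."""
    pool = list(candidates)
    for stage in stages:
        # sorted() is stable, so ties keep their previous relative order.
        ranked = sorted(pool, key=lambda doc: stage.scorer(query, doc),
                        reverse=True)
        pool = ranked[:stage.keep]
    return pool


def token_overlap(query: str, doc: str) -> float:
    """Cheap stand-in scorer: number of shared lowercase tokens."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def jaccard(query: str, doc: str) -> float:
    """Costlier stand-in scorer: Jaccard similarity of the token sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def latency_reduction(baseline_s: float, cascade_s: float) -> float:
    """Relative latency reduction of the cascade versus a baseline."""
    return (baseline_s - cascade_s) / baseline_s


if __name__ == "__main__":
    docs = [
        "public procurement contract audit report",
        "audit of municipal procurement contract",
        "pension fund annual statement",
        "contract audit",
    ]
    stages = [Stage("cheap", token_overlap, keep=3),
              Stage("costly", jaccard, keep=2)]
    print(cascade_rerank("procurement contract audit", docs, stages))
    print(f"{latency_reduction(11.75, 4.66):.1%}")
```

In a realistic setting the stand-in scorers would be replaced by, for example, a sparse retriever, a bi-encoder, and a cross-encoder, and the `keep` values would be tuned against the latency budget.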