Fernando M. Federson
2026
Biatron: A Parameter-Efficient Small Language Model for Brazilian Portuguese with Integrated Mathematical Reasoning
Daniel Fazzioni | Maria C. X. de Almeida | Anna P. V. L. B. Moreira | Anderson S. Soares | Sávio S. T. de Oliveira | Fernando M. Federson
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
The development of Small Language Models (SLMs) for Portuguese faces significant challenges in balancing parameter efficiency with specialized capabilities, particularly in mathematical reasoning domains where existing models demonstrate limited native competence. This work introduces the first model in the Biatron series, a 345-million-parameter language model specifically optimized for Brazilian Portuguese through strategic data curation rather than brute-force parameter scaling. Using a carefully designed 60-30-10 data mixture combining high-quality Portuguese text from GigaVerbo, chain-of-thought reasoning examples, and mathematical datasets, Biatron was trained on 300 billion tokens using the Megatron-LM framework, achieving 32% Model FLOP Utilization. The model attains an overall score of 0.245 (aggregate performance) on Portuguese-specific benchmarks, coming within 1.6% of Tucano-630M's performance while using 45% fewer parameters. Most significantly, Biatron achieves 7.5% Pass@1 accuracy on mathematical reasoning tasks, more than doubling the performance of Tucano-2.4B (3.5%) despite being nearly seven times smaller. These results validate that strategic data mixing can rival parameter scaling for language model development, establishing a reproducible methodology for efficient AI development in resource-constrained language contexts. To support reproducibility and further research, the final model weights, training logs, and intermediate checkpoints are publicly available.
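The 60-30-10 mixture described above can be sketched as weighted sampling over three data sources. This is a minimal illustration only: the source names and the sampling routine are placeholders, not the authors' actual training pipeline.

```python
import random

# Hypothetical sketch of the 60-30-10 data mixture from the abstract:
# 60% high-quality Portuguese text, 30% chain-of-thought examples,
# 10% mathematical data. Names below are illustrative placeholders.
MIXTURE = [
    ("portuguese_text", 0.60),   # e.g. GigaVerbo corpus
    ("chain_of_thought", 0.30),  # reasoning traces
    ("mathematics", 0.10),       # math datasets
]

def sample_source(rng: random.Random) -> str:
    """Pick a data source with probability equal to its mixture weight."""
    names, weights = zip(*MIXTURE)
    return rng.choices(names, weights=weights, k=1)[0]

# Drawing many samples approximates the target 60-30-10 proportions.
rng = random.Random(0)
counts = {name: 0 for name, _ in MIXTURE}
for _ in range(10_000):
    counts[sample_source(rng)] += 1
```

In a real pre-training loop this sampler would decide, per batch or per document, which shard to draw from, so the token budget follows the target proportions without physically interleaving the corpora up front.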
Optimizing Efficiency in Multi-Stage Semantic Re-ranking Architectures
Artur M. A. Novais | Anna P. V. L. B. Moreira | Maria C. X. de Almeida | João P. C. Presa | Fernando M. Federson | Sávio S. T. de Oliveira
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Semantic re-ranking architectures based on cross-encoders are essential for high-precision Information Retrieval (IR) in the legal domain, but their high computational latency makes large-scale application challenging, particularly in resource-constrained environments. Traditional single-stage approaches force a choice between computational efficiency and ranking quality. This work presents an empirical evaluation of established cascade re-ranking architectures to optimize this balance through the adaptive application of off-the-shelf models of increasing complexity over progressively smaller sets of candidates. We validated the architecture on a corpus of 300,000 legal documents in Portuguese from the Court of Accounts of the State of Goiás (TCE-GO). Experiments demonstrate a 60.3% reduction in latency (from 11.75s to 4.66s per query) compared to the most precise single-stage baseline, with a marginal degradation of only 2 p.p. in R@avg and 0.0224 in MRR@avg. The results validate the semantic funnel as a computationally viable solution for semantic document-to-document search within the specific context of the TCE-GO repository, establishing a baseline for future transferability studies in broader Portuguese legal contexts.
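The cascade ("semantic funnel") idea above can be sketched generically: each stage scores the surviving candidates with a progressively more expensive model and keeps only the top-k for the next stage. The scorer functions and cutoffs below are toy placeholders, not the paper's actual models or tuned parameters.

```python
from typing import Callable, List, Tuple

# A scorer maps (query, document) to a relevance score; higher is better.
Scorer = Callable[[str, str], float]

def cascade_rerank(query: str,
                   candidates: List[str],
                   stages: List[Tuple[Scorer, int]]) -> List[str]:
    """Apply (scorer, keep_k) stages in order, shrinking the candidate pool.

    Cheap stages prune early so the expensive final stage (e.g. a
    cross-encoder) only sees a small set, trading a little precision
    for a large latency reduction.
    """
    pool = list(candidates)
    for scorer, keep_k in stages:
        pool.sort(key=lambda doc: scorer(query, doc), reverse=True)
        pool = pool[:keep_k]
    return pool

# Toy usage: a fast lexical-overlap stage, then a "costlier" second stage.
def cheap(q: str, d: str) -> float:
    return len(set(q.split()) & set(d.split()))

def expensive(q: str, d: str) -> float:  # stands in for a cross-encoder
    return cheap(q, d) + 0.1 * len(d)

docs = ["acordao sobre licitacao", "contrato de obra publica", "licitacao de obra"]
top = cascade_rerank("licitacao obra", docs, [(cheap, 2), (expensive, 1)])
```

The latency win comes from the stage schedule: if the first stage keeps 100 of 10,000 candidates, the cross-encoder runs 100 times instead of 10,000, at the cost of whatever relevant documents the cheap stage discards.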