Willgnner Ferreira Santos

2026

LexIris-pt and LexBert-pt: Specialized Sentence Embeddings for Legal Similarity in Brazilian Portuguese
Willgnner Ferreira Santos | João Gabriel Grandotto Viana | Antônio Pires de Castro Júnior | Fernando Ribeiro Trindade | Nádia Félix Felipe da Silva
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

This work presents and evaluates two specialized sentence embedding models for the Portuguese legal domain, LexIris-pt and LexBert-pt, obtained through supervised fine-tuning of BERT-based models using pairs of initial petitions. We propose a comparative evaluation protocol along three fronts: (i) zero-shot inference with pretrained embeddings, (ii) supervised fine-tuning on these pairs, and (iii) vector retrieval with incremental clustering over a corpus of 20,000 initial petitions. The results show that fine-tuning consistently increases correlations with reference scores and improves performance in vector retrieval; additionally, the vector retrieval stage indicates that the metric configured in the index (cosine similarity or inner product) can change the granularity of the partitioning under a fixed threshold, reinforcing the need for joint calibration among the encoder, metric and threshold. After auditing by specialists from the partner institution, LexIris-pt and LexBert-pt were operationally adopted to support the screening and organization of repetitive claims and predatory litigation.
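The abstract's point about the index metric can be illustrated with a toy example. The sketch below is not the authors' pipeline: the seed-based incremental clustering rule, the 2-D vectors, and the 0.8 threshold are all assumptions chosen for illustration. It only shows why the same threshold can partition the same vectors differently under cosine similarity and inner product when embeddings are not length-normalized.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: inner product of the length-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def incremental_clusters(vectors, similarity, threshold):
    """Assign each vector to the most similar existing cluster seed,
    or open a new cluster when no seed reaches the threshold.
    (Hypothetical rule; the paper's exact procedure is not given here.)"""
    seeds, labels = [], []
    for v in vectors:
        best, best_score = None, threshold
        for idx, seed in enumerate(seeds):
            score = similarity(v, seed)
            if score >= best_score:
                best, best_score = idx, score
        if best is None:
            seeds.append(v)
            best = len(seeds) - 1
        labels.append(best)
    return labels

# Toy, unnormalized "embeddings": two directions, varying lengths.
vectors = [(1.0, 0.0), (2.0, 0.2), (0.3, 0.05), (0.0, 1.0), (0.1, 3.0)]
threshold = 0.8  # assumed value, held fixed for both metrics

cos_labels = incremental_clusters(vectors, cosine, threshold)
ip_labels = incremental_clusters(vectors, dot, threshold)
print("cosine clusters:       ", cos_labels)
print("inner-product clusters:", ip_labels)
```

In this toy setting cosine groups the vectors by direction only, while inner product also reacts to vector length, so the short vector opens its own cluster. If the embeddings are L2-normalized before indexing, the two metrics coincide and the partitions match.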

Multi-Agent Architecture with RAG and Dynamic Context Windows for Text-to-SQL Optimization
Willgnner Ferreira Santos | Paulo Victor dos Santos | Marcella Scoczynski Ribeiro Martins | Larissa Freire Lekakis | Frederico Lemes Rosa | Bruno Matheus Costa | Miguel Alves Pereira Filho | Isabella Alves Montalvão
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

Natural language interfaces supported by LLMs have been used to translate user questions into SQL queries, but sending the complete database schema in each prompt entails high token consumption and computational cost, especially in corporate databases with hundreds of tables. This work presents a multi-agent Text-to-SQL architecture with dynamic context windows, which combines RAG and metadata dictionaries to select, at query time, only the relevant tables and columns. In a case study with Firebird enterprise databases, the approach reduces the number of processed tokens by an average of 84.4%, resulting in more efficient queries without loss of quality and thereby contributing to the democratization of access to corporate databases.
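The idea of a dynamic context window can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture from the paper: the `METADATA` dictionary, the table names, and the lexical-overlap scoring are all hypothetical, and the overlap score merely stands in for the retrieval step that a RAG component would perform.

```python
import re

# Hypothetical metadata dictionary: table -> (description, columns).
METADATA = {
    "CUSTOMERS": ("customer registry with name and city", ["ID", "NAME", "CITY"]),
    "ORDERS": ("sales orders placed by each customer",
               ["ID", "CUSTOMER_ID", "TOTAL", "ORDER_DATE"]),
    "EMPLOYEES": ("staff payroll and hiring data", ["ID", "NAME", "SALARY"]),
    "SUPPLIERS": ("supplier contacts and contracts", ["ID", "NAME", "PHONE"]),
}

def tokenize(text):
    """Lowercase alphabetic tokens; underscores act as separators."""
    return set(re.findall(r"[a-z]+", text.lower()))

def select_tables(question, metadata, top_k=2):
    """Rank tables by word overlap between the question and each table's
    name, description, and column names; keep the top_k with any overlap."""
    q = tokenize(question)
    scored = []
    for table, (description, columns) in metadata.items():
        vocab = tokenize(table) | tokenize(description) | tokenize(" ".join(columns))
        overlap = len(q & vocab)
        if overlap > 0:
            scored.append((overlap, table))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [table for _, table in scored[:top_k]]

def render_schema(tables, metadata):
    """Compact schema text of the kind that would be placed in the prompt."""
    return "\n".join(f"{t}({', '.join(metadata[t][1])})" for t in tables)

question = "What is the total of orders for each customer city?"
selected = select_tables(question, METADATA)
full_schema = render_schema(list(METADATA), METADATA)
reduced_schema = render_schema(selected, METADATA)
print(reduced_schema)
print(f"schema characters: {len(full_schema)} -> {len(reduced_schema)}")
```

Only the selected tables reach the prompt, so the schema portion shrinks as the database grows while the question stays the same size. A real system would count model tokens rather than characters and would likely use embedding-based retrieval instead of word overlap.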

Co-authors

Marcella Scoczynski Ribeiro Martins 1

Isabella Alves Montalvão 1

Frederico Lemes Rosa 1

Paulo Victor dos Santos 1

Fernando Ribeiro Trindade 1

João Gabriel Grandotto Viana 1

Venues

PROPOR 2