César Brasil Sperb

2026

A RAG Chatbot with Incremental Context Retrieval based on Local LLMs for Hospital Documents
Murilo Vargas da Cunha | Marília Rosa Silveira | César Brasil Sperb | Larissa Astrogildo Freitas | Ulisses Brisolara Corrêa
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2

The adoption of LLMs in hospital environments demands solutions that ensure information security, computational efficiency, and rigorous control over sensitive institutional data. This work presents the development and evaluation of a RAG-based chatbot that uses exclusively local LLMs, applied to internal Portuguese-language documents of a university hospital, consisting of Standard Operating Procedures and technical manuals. The methodology first evaluates the quality of information retrieval with dense embedding models, measured by Mean Reciprocal Rank (MRR). The generation stage is then analyzed in two scenarios: (i) RAG with fixed context, in which multiple chunks are provided to the model simultaneously, and (ii) incremental page retrieval, in which chunks are sent sequentially in retrieval-ranking order. Generation was assessed with four local LLMs (MedGemma3:27B, Gemma3:27B, Gpt-oss:20B, and Mistral Small 3.1), using BERTScore as the quality metric. The results indicate that indiscriminately enlarging the context in the fixed-context scenario degrades generation quality, even though it increases the probability of retrieving the relevant chunk. In contrast, incremental page retrieval improved BERTScore values, with MedGemma3:27B achieving the best overall results. These findings indicate that adaptive context control is a critical factor in increasing the reliability and efficiency of RAG systems based on local LLMs in the healthcare domain.
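For readers unfamiliar with the retrieval metric named in the abstract, the following is a minimal sketch of Mean Reciprocal Rank under the usual definition (one relevant chunk per query, reciprocal rank of 0 when it is not retrieved). The function name and the toy data are illustrative assumptions and are not taken from the paper.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the relevant item over all queries.

    rankings[i] is the ranked list of chunk ids returned for query i;
    relevant[i] is the single chunk id considered correct for query i.
    A query whose relevant chunk is absent contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, target in zip(rankings, relevant):
        for position, item in enumerate(ranked, start=1):
            if item == target:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical toy data: three queries, relevant chunk found at
# rank 1, at rank 2, and not found at all.
toy_rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
toy_relevant = ["a", "y", "missing"]
toy_mrr = mean_reciprocal_rank(toy_rankings, toy_relevant)
```

In this toy case the three reciprocal ranks are 1, 1/2, and 0, so the mean is 0.5.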


Avaliação End-to-End de um Sistema RAG para Documentos Hospitalares em Português
Murilo Vargas da Cunha | Marília Rosa Silveira | César Brasil Sperb | Brenda Salenave Santana | Larissa Astrogildo Freitas | Ulisses Brisolara Corrêa
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

This paper evaluates an end-to-end Retrieval-Augmented Generation (RAG) system for querying regulatory hospital documents in Portuguese. The study analyzes the impact of optimizing each component (retrieval, reranking, and generation) in a resource-constrained setting. The methodology combined the creation of a hybrid dataset (synthetic and expert-validated) with quantitative evaluations using metrics such as MRR, NDCG@10, and BERTScore. The results show that the embedding model intfloat/multilingual-e5-small was the most robust, with a retrieval failure rate of only 1.4%. In the reranking stage, the RRF method stood out for its balance between computational cost and performance. The study concludes that the optimized architecture, integrating these components with the Gemini 2.5 Flash generator, offers an efficient and accurate solution for decision support in hospital environments.
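As an illustration of the RRF method mentioned in the abstract, the sketch below implements Reciprocal Rank Fusion in its commonly used form, where each document scores the sum of 1/(k + rank) over the input rankings. The constant k = 60 is a widely used default and is an assumption here, as are the function name and the alphabetical tie-break; the abstract does not state the paper's settings.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives sum(1 / (k + rank)) over the lists in which
    it appears (rank starts at 1). Ties are broken alphabetically so the
    output is deterministic; that tie-break is a choice of this sketch.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical example: two retrievers that disagree on the order.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
```

Because RRF uses only rank positions, it needs no score normalization across retrievers and no additional model inference, which is consistent with the low computational cost the abstract attributes to it.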
