José Roberto Homeli Silva


2026

The safe deployment of Large Language Models remains challenging in multilingual settings, particularly when models are exposed to adversarial or malicious prompts in underrepresented languages. In this work, we present Curupira, a guard model for Brazilian Portuguese designed to mitigate harmful prompt exploitation. To this end, we establish a three-step methodology comprising adaptation, data generation, and fine-tuning. We also evaluate our model against two state-of-the-art open guardrail architectures. The results show that targeted fine-tuning yields consistent improvements in safety classification for Portuguese prompts, with favorable efficiency–performance trade-offs for compact models and limited degradation in cross-lingual evaluation.