Eduardo Luz
2026
A Multitask Transformer for Offensive Language Detection and Target Identification in HateBR
Guilherme Silva | Pedro Silva | Matheus Peixoto | Gladston Moreira | Eduardo Luz
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Hate speech detection is often treated as a binary task, ignoring the hierarchical nature of toxicity, such as severity levels and specific target groups. This work presents a Multitask Learning (MTL) approach for the HateBR dataset, using a shared BERTimbau encoder to simultaneously predict binary offensiveness, ordinal severity, and hate speech targets. Our experiments demonstrate that the MTL architecture outperforms single-task baselines on the primary offensive language detection task, increasing the Matthews Correlation Coefficient from 0.80 to 0.82. Beyond predictive performance, we show that joint training implicitly enforces hierarchical consistency: the unified model yields a 0% target-inconsistency rate (i.e., no cases where a comment is predicted non-offensive while still assigned a hate target). However, we observe negative transfer on the fine-grained multilabel target task (Micro-F1 drops from 0.59 to 0.42), highlighting a trade-off between logical consistency and target attribution under extreme class imbalance.
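For illustration, a minimal sketch of the shared-encoder, three-head layout described in the abstract, written against the standard Hugging Face transformers API. The checkpoint name (the base BERTimbau model), the head sizes, and the loss setup are assumptions for the example, not details taken from the paper.

```python
import torch.nn as nn
from transformers import AutoModel

class MultitaskHateBR(nn.Module):
    """Shared BERTimbau encoder with three task-specific heads.
    Head sizes and the checkpoint are illustrative assumptions."""
    def __init__(self, encoder_name="neuralmind/bert-base-portuguese-cased",
                 n_severity=3, n_targets=9):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.offensive_head = nn.Linear(hidden, 2)          # binary offensiveness
        self.severity_head = nn.Linear(hidden, n_severity)  # ordinal severity levels
        self.target_head = nn.Linear(hidden, n_targets)     # multilabel hate targets

    def forward(self, input_ids, attention_mask):
        # One [CLS] representation is shared by all three heads.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]
        return {
            "offensive": self.offensive_head(cls),
            "severity": self.severity_head(cls),
            "targets": self.target_head(cls),  # sigmoid + BCE for multilabel training
        }
```

In a sketch like this, joint training would sum cross-entropy losses for the binary and severity heads with a binary cross-entropy loss for the multilabel target head; the paper's actual loss weighting is not reproduced here.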
Lost in Quantization: Activation Outliers Explain Language-Specific FP8 Sensitivity in Llama-3
Guilherme Silva | Pedro Silva | Matheus Peixoto | Gladston Moreira | Eduardo Luz
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Quantization is key for efficient LLM inference, but its language-specific effects are understudied. We compare INT8 and FP8 (E4M3) quantization for Meta-Llama-3-8B on English and Brazilian Portuguese (PT-BR). INT8 with outlier handling preserves perplexity in both languages, while naive FP8 casting degrades English far more than PT-BR (+18% vs. +3.9%). Activation analysis shows rarer, larger English spikes (>35) that are more prone to saturation under unscaled E4M3, whereas PT-BR activations are more concentrated. Our FP8 results reflect a naive casting stress test (no calibration/scaling), not an optimized FP8 recipe.
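As an illustration of the naive casting stress test (no calibration or per-tensor scaling), a small sketch of an unscaled E4M3 round trip. It assumes a PyTorch build with float8 dtypes (2.1 or later) and uses made-up activation values, not measurements from the paper.

```python
import torch  # requires a PyTorch build with float8 dtypes (>= 2.1)

def fp8_roundtrip_error(x: torch.Tensor) -> torch.Tensor:
    """Cast to unscaled E4M3 and back, returning the absolute rounding error.
    No calibration or scaling is applied, mirroring a naive casting stress test
    (illustrative sketch, not the paper's code)."""
    x_q = x.to(torch.float8_e4m3fn).to(x.dtype)
    return (x - x_q).abs()

typical = torch.tensor([0.5, 1.3, 2.7])      # activations in the bulk of the distribution
outliers = torch.tensor([37.0, 41.5, 55.0])  # rare large spikes (> 35)

print("typical error :", fp8_roundtrip_error(typical))   # small: E4M3 step is 0.125 near 1
print("outlier error :", fp8_roundtrip_error(outliers))  # coarse: E4M3 step is 4 in [32, 64)
```

The coarser E4M3 spacing at large magnitudes (a step of 4 in the [32, 64) range) is what erodes precision for the rare spikes above 35, while values near 1 round-trip almost exactly.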
2024
Evaluating Federated Learning with Homomorphic Encryption for Medical Named Entity Recognition Using Compact BERT Models
Marcos Felipe Rezende | Rodrigo Silva | Eduardo Luz | Pedro Silva
Proceedings of the 15th Brazilian Symposium in Information and Human Language Technology
Toxic Text Classification in Portuguese: Is LLaMA 3.1 8B All You Need?
Amanda Oliveira | Pedro Silva | Vander Freitas | Valéria Santos | Gladston Moreira | Eduardo Luz
Proceedings of the 15th Brazilian Symposium in Information and Human Language Technology