Matheus Peixoto


2026

The choice between large-scale, multilingual, foundation models and specialized monolingual models for languages like Brazilian Portuguese (PT-BR) presents a complex trade-off between generalization and specialization. This paper investigates this trade-off through an empirical study across a diverse suite of tasks. We evaluate multiple families of language models under both linear probing and fine-tuning regimes. We find that monolingual encoders exhibit greater "adaptation plasticity" during fine-tuning, improving on both classification and semantic similarity, where global (multilingual) models degrade. However, this plasticity comes at a cost: our tokenization analysis suggests that monolingual models struggle with foreign terms, whereas modern multilingual tokenizers show surprising morphological competence, challenging a long-standing assumption in the field. We conclude that the optimal model choice is a task-dependent trade-off between vocabulary coverage and adaptation flexibility.
Quantization is key for efficient LLM inference, but its language-specific effects are understudied. We compare INT8 and FP8 (E4M3) quantization for Meta-Llama-3-8B on English and Brazilian Portuguese (PT-BR). INT8 with outlier handling preserves perplexity in both languages, while naive FP8 casting degrades English far more than PT-BR (+18% vs. +3.9%). Activation analysis shows rarer, larger English spikes (>35) that are more prone to saturation under unscaled E4M3, whereas PT-BR activations are more concentrated. Our FP8 results reflect a naive casting stress test (no calibration/scaling), not an optimized FP8 recipe.
Hate speech detection is often treated as a binary task, ignoring the hierarchical nature of toxicity, such as severity levels and specific target groups. This work presents a Multitask Learning (MTL) approach for the HateBR dataset, utilizing a shared BERTimbau encoder to simultaneously predict binary offensiveness, ordinal severity, and hate speech targets. Our experiments demonstrate that the MTL architecture outperforms Single-Task baselines on the primary offensive detection task, increasing the Matthews Correlation Coefficient from 0.80 to 0.82. Beyond predictive performance, we show that joint training implicitly enforces hierarchical sanity: the unified model yields a 0% target-inconsistency rate (i.e., no cases where a comment is predicted Non-offensive while still assigned a hate target). However, we observe negative transfer in the fine-grained multilabel target task (Micro-F1 drops from 0.59 to 0.42), highlighting a trade-off between logical consistency and target attribution under extreme imbalance.