Miguel de Mello Carpi

2026

While scaling laws favor ever-larger models and datasets, efficient pre-training for low-resource scenarios poses distinct challenges that remain under-explored. This work introduces FlexQwen, a model based on the Qwen 3 architecture adapted to a hybrid causal-masked objective, and the Carolina Originality dataset, a subset of the Corpus Carolina tailored for efficient pre-training in Portuguese. We investigate two research questions: the influence of hybrid causal-masked modelling and the impact of text originality on model performance. Our experiments compare a high-originality Gold split against a length-matched control group. Results indicate that hybrid objectives may be viable for efficient training. Furthermore, we provide open access to our code, datasets, and training logs to foster further research on efficient Portuguese LLMs.