Afonso Simplício
2026
AMALIA: A Fully Open Large Language Model for European Portuguese
Afonso Simplício | Gonçalo Vinagre | Miguel Moura Ramos | Diogo Tavares | Rafael Ferreira | Giuseppe Attanasio | Duarte M. Alves | Inês Calvo | Inês Vieira | Rui Guerra | James Furtado | Beatriz Canaverde | Iago Paulo | Vasco Ramos | Diogo Glória-Silva | Miguel Faria | Marcos Treviso | Daniel Gomes | Pedro Gomes | David Semedo | André Martins | João Magalhães
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Despite rapid progress in open large language models (LLMs), European Portuguese (pt-PT) remains underrepresented in both training data and native evaluation, with machine-translated benchmarks likely missing the variant’s linguistic and cultural nuances. We introduce AMALIA, a fully open LLM that prioritizes pt-PT by using more high-quality pt-PT data during both the mid- and post-training stages. To evaluate pt-PT more faithfully, we release a suite of pt-PT benchmarks that includes translated standard tasks and four new datasets targeting pt-PT generation, linguistic competence, and pt-PT/pt-BR bias. Experiments show that AMALIA matches strong baselines on translated benchmarks while substantially improving performance on pt-PT-specific evaluations, supporting the case for targeted training and native benchmarking for European Portuguese.
2024
V-GlórIA - Customizing Large Vision and Language Models to European Portuguese
Afonso Simplício | David Semedo | Joao Magalhaes
Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
Generative Vision and Language models have obtained remarkable results recently, thanks to the use of robust pre-trained Visual encoders and Large Language Models (LLMs), together with efficient model adaptation training strategies, requiring minimal architectural modifications, while preserving LLMs’ original capabilities. With these advances focusing mainly on the English language, there is a gap in customization methodologies for other languages. In this paper, we propose a customization methodology that adapts existing state-of-the-art vision and language architectures to European Portuguese (PT-PT). As a result of applying this methodology, we introduce V-GlórIA, the first Large Vision and Language generative model specifically customized for European Portuguese. V-GlórIA supports multimodal tasks such as image captioning, retrieval, and dialogue. To deliver V-GlórIA, we leverage state-of-the-art V&L architectures, and contribute with PT-PT machine-translated pre-training (CC3M PT-PT) and benchmark (MSCOCO PT-PT and VisDial PT-PT) datasets. Our experiments show that V-GlórIA delivers promising performance in text-image retrieval and downstream tasks in a zero-shot setting, such as image captioning and visual dialogue tasks, highlighting the effectiveness of our customization approach.