Vládia Pinheiro

Also published as: Vladia Pinheiro


2026

Robust sentiment analysis in Portuguese is central to applications across Lusophone contexts, yet systematic evaluations still focus predominantly on English and proprietary systems. This paper presents a comparative study of 29 open-source Large Language Models (LLMs) and two proprietary models on Portuguese sentiment classification under four prompting strategies: Zero-Shot, Few-Shot, Chain-of-Thought (CoT), and CoT with Few-Shot (CoT+FS). Experiments on a unified three-class benchmark built from three public review corpora (about 3,000 instances) comprise roughly 372,000 inferences, totaling approximately 150M input tokens and 65M output tokens. Results show that CoT+FS generally yields the best performance for larger models, while several compact open-source models obtain competitive F1-scores with substantially lower computational cost, making them suitable for real-world deployments. We identify concrete teacher–student configurations tailored for knowledge distillation in Portuguese sentiment analysis.
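The four prompting strategies compared in the study can be sketched as prompt-assembly logic. This is a minimal illustration only: the Portuguese wording, the few-shot examples, and the function names are assumptions, not the paper's actual templates.

```python
# Hypothetical sketch of the four prompting strategies (Zero-Shot, Few-Shot,
# CoT, CoT+FS) for three-class Portuguese sentiment classification.
# All prompt text and examples below are illustrative, not the paper's own.

FEW_SHOT_EXAMPLES = [
    ("O produto chegou rápido e funciona perfeitamente.", "positivo"),
    ("A entrega atrasou e o suporte não respondeu.", "negativo"),
    ("O item corresponde à descrição do anúncio.", "neutro"),
]

COT_INSTRUCTION = (
    "Raciocine passo a passo sobre o sentimento expresso antes de responder."
)

def build_prompt(review: str, strategy: str) -> str:
    """Assemble a prompt under one of: zero_shot, few_shot, cot, cot_fs."""
    parts = ["Classifique o sentimento da avaliação como "
             "positivo, negativo ou neutro.\n"]
    if strategy in ("cot", "cot_fs"):          # add reasoning instruction
        parts.append(COT_INSTRUCTION + "\n")
    if strategy in ("few_shot", "cot_fs"):     # prepend labeled examples
        for text, label in FEW_SHOT_EXAMPLES:
            parts.append(f"Avaliação: {text}\nSentimento: {label}\n")
    parts.append(f"Avaliação: {review}\nSentimento:")
    return "\n".join(parts)
```

CoT+FS combines both branches, which is why it is the most token-expensive strategy; the reported ~150M input tokens across ~372,000 inferences reflect this kind of prompt growth.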
We present Causal_QA.PT, a human–LLM co-curated benchmark for causal question answering in Portuguese, addressing the lack of high-quality evaluation resources for causal reasoning in non-English languages. The dataset is developed through a hybrid human–LLM process with targeted generation, validation, and evaluation procedures, and is organized according to the PEARL causal typology. Using this resource, we evaluate the ability of Large Language Models to answer causal questions in Portuguese and examine the role of explicitly providing causal class information in prompt design. Our findings show that current LLMs are capable of producing high-quality causal responses in Portuguese, with GPT-5 Mini in particular demonstrating strong performance in judgment-based evaluation. Explicit causal class information yields model- and question-dependent benefits, particularly for interventional and counterfactual questions. Finally, we observe that human reference answers are not always superior, underscoring the importance of careful benchmark curation and robust evaluation for underrepresented languages.
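The prompt-design manipulation studied here, with and without explicit causal class information, can be sketched as follows. The function name, class labels, and Portuguese wording are hypothetical illustrations, not the benchmark's actual templates.

```python
# Hypothetical sketch of the two prompt conditions: a plain causal-QA prompt
# versus one that explicitly states the question's causal class
# (e.g. associacional, intervencional, contrafactual).
from typing import Optional

def causal_prompt(question: str, causal_class: Optional[str] = None) -> str:
    """Build a Portuguese causal-QA prompt; if causal_class is given,
    state it explicitly as a hint to the model."""
    parts = ["Responda à pergunta causal a seguir em português."]
    if causal_class is not None:
        parts.append(f"Classe causal da pergunta: {causal_class}.")
    parts.append(f"Pergunta: {question}")
    return "\n".join(parts)
```

Comparing model answers under the two conditions isolates the effect of the class hint, which the abstract reports as model- and question-dependent, helping most on interventional and counterfactual questions.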

2025

Large Language Models (LLMs) are increasingly central to the development of generative AI across diverse fields. While some anticipate these models may mark a step toward artificial general intelligence, their ability to handle complex causal reasoning remains unproven. Causal reasoning, particularly at Pearl’s interventional and counterfactual levels, is essential for true general intelligence. In this work, we introduce CaLQuest.PT, a dataset of over 8,000 natural causal questions in Portuguese, collected from real human interactions. Built upon a novel three-axis taxonomy, CaLQuest.PT categorizes questions by causal intent, action requirements, and the level of causal reasoning needed (associational, interventional, or counterfactual). Our findings from evaluating CaLQuest.PT’s seed questions with GPT-4o reveal that this LLM faces challenges in handling interventional and relation-seeking causal queries. These results suggest limitations in using GPT-4o to extend causal question annotations and highlight the need for improved LLM strategies in causal reasoning. CaLQuest.PT provides a foundation for advancing LLM capabilities in causal understanding, particularly for the Portuguese-speaking world.

2024

2023

2017

2015

2011