Larissa A. de Freitas


2026

Structured Sentiment Analysis (SSA) aims to extract fine-grained opinion structures as tuples (holder, target, expression, polarity). While recent advances have improved SSA for English, Brazilian Portuguese still lacks dedicated resources. This paper presents an exploratory study that introduces a manually annotated dataset of hotel reviews for SSA in Brazilian Portuguese. We propose a baseline that fine-tunes the BERTimbau model with a BIO tagging scheme to extract sentiment spans. Unlike traditional approaches that model relations explicitly, we assess the viability of span-level extraction as a first step toward SSA in this language. Experimental results on a strict train/validation/test split show that our approach achieves a span-level F1-score of 48.41 for holder extraction and a macro F1-score of 61.52. We also discuss the linguistic challenges of holder extraction in Portuguese, specifically implicit subjects (pro-drop), and provide a detailed error analysis. These results establish a preliminary baseline for future relation-aware models in Portuguese.
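The BIO-based span extraction and span-level scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tag names (`B-Holder`, `I-Target`, ...) are hypothetical, spans are scored by exact match of label and boundaries, the macro score is assumed to be the unweighted mean over the three span types, and polarity classification is not covered.

```python
from collections import defaultdict

# Hypothetical span types; the label names are illustrative assumptions.
LABELS = ["Holder", "Target", "Expression"]


def bio_to_spans(tags):
    """Return the set of (label, start, end) spans in a BIO sequence (end exclusive).

    An I- tag that does not continue a span of the same label opens a new span,
    which is one lenient convention among several.
    """
    spans = set()
    label, start = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            if label is not None:
                spans.add((label, start, i))
            label, start = None, None
            continue
        prefix, _, kind = tag.partition("-")
        if prefix == "B" or kind != label:
            # Close the open span (if any) and start a new one at this token.
            if label is not None:
                spans.add((label, start, i))
            label, start = kind, i
        # Otherwise an I- tag of the same label extends the current span.
    if label is not None:
        spans.add((label, start, len(tags)))
    return spans


def span_f1(gold_seqs, pred_seqs, labels=LABELS):
    """Exact-match span F1 per label, plus the unweighted (macro) mean over labels."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold_tags, pred_tags in zip(gold_seqs, pred_seqs):
        gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        for lab, _, _ in pred & gold:
            tp[lab] += 1  # predicted span matches a gold span exactly
        for lab, _, _ in pred - gold:
            fp[lab] += 1  # predicted span with no exact gold match
        for lab, _, _ in gold - pred:
            fn[lab] += 1  # gold span the model missed
    per_label = {}
    for lab in labels:
        p = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        r = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        per_label[lab] = 2 * p * r / (p + r) if p + r else 0.0
    return per_label, sum(per_label.values()) / len(labels)


if __name__ == "__main__":
    # "Eu adorei o quarto": the prediction misses the holder "Eu".
    gold = [["B-Holder", "B-Expression", "B-Target", "I-Target"]]
    pred = [["O", "B-Expression", "B-Target", "I-Target"]]
    per_label, macro = span_f1(gold, pred)
    print(per_label, round(macro, 4))
```

In this toy example the single missed holder span drives the holder F1 to zero while target and expression score perfectly, which illustrates how a weak holder score pulls down a macro average. Libraries such as `seqeval` provide a similar entity-level evaluation for BIO-tagged sequences.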
Recent advances in Natural Language Processing have revolutionized Question Answering (QA). However, for languages such as Portuguese, progress is often hindered by the lack of native training resources. To address this gap, this paper introduces LARI, a new dataset designed to benchmark and enhance QA in Portuguese. Our methodology combines the Sabiá-7B model, fine-tuned via QLoRA on a domain-specific corpus, with human validation. We used the book Natural Language Processing – Concepts, Techniques, and Applications in Portuguese (2nd Edition) as a case study for content extraction. The generated instances underwent expert human evaluation and achieved an average quality score of 4.47 out of 5. The final dataset, comprising 464 context-question-answer triples, is publicly available, offering a valuable resource for future research in low-resource settings.