Matheus Camasmie Pavan


2026

Stance detection is the task of determining whether an input text expresses a stance in favour of or against a given target topic. In a standard supervised setting, this typically requires a new set of labelled training examples for each test topic. As an alternative to full supervision (or costly LLM-based methods), this study leverages political alignment information by assuming that stances on related moral or political issues tend to co-occur (e.g., support for a right-wing politician correlating with support for the death penalty or opposition to abortion). This alignment, treated here as a form of distant labelling, enables stance inference without constructing new corpora, and is evaluated against standard cross-domain and prompt-based methods using a large corpus of stances in the Portuguese language.
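The alignment idea in the abstract above can be illustrated with a minimal sketch. It is not the study's implementation: the topic names, the `ALIGNMENT` table, and the functions `distant_label` and `build_training_set` are hypothetical, and the assumption that alignment reduces to a +1/-1 sign between two topics is a simplification made only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical alignment table (illustrative values, not taken from the study):
# +1 means stances on the two topics tend to agree, -1 means they tend to oppose.
ALIGNMENT: Dict[Tuple[str, str], int] = {
    ("right_wing_politician", "death_penalty"): 1,
    ("right_wing_politician", "abortion"): -1,
}


def distant_label(source_topic: str, source_stance: int, target_topic: str) -> Optional[int]:
    """Infer a stance (+1 favour, -1 against) on target_topic from a known
    stance on source_topic. Returns None when no alignment is known."""
    sign = ALIGNMENT.get((source_topic, target_topic))
    if sign is None:
        return None
    return sign * source_stance


def build_training_set(
    examples: List[Tuple[str, str, int]], target_topic: str
) -> List[Tuple[str, int]]:
    """Relabel (text, source_topic, stance) examples for target_topic,
    dropping examples whose source topic has no known alignment."""
    relabelled = []
    for text, source_topic, stance in examples:
        label = distant_label(source_topic, stance, target_topic)
        if label is not None:
            relabelled.append((text, label))
    return relabelled
```

Under these assumptions, texts labelled for one topic become noisy training examples for an aligned topic, so no new corpus has to be annotated for the target.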

2023

Transformer-based language models such as Bidirectional Encoder Representations from Transformers (BERT) are now mainstream in the NLP field, but extensions to languages other than English, to new domains and/or to more specific text genres are still in demand. In this paper we introduce BERTabaporu, a BERT language model that has been pre-trained on Twitter data in the Brazilian Portuguese language. The model is shown to outperform the best-known general-purpose model for this language in three Twitter-related NLP tasks, making it a potentially useful resource for Portuguese NLP in general.
Stance prediction - the computational task of inferring attitudes towards a given target topic of interest - relies heavily on text data provided by social media or similar sources, but it may also benefit from non-text information such as demographics (e.g., users’ gender, age, etc.), network structure (e.g., friends, followers, etc.), interactions (e.g., mentions, replies, etc.) and other non-text properties (e.g., time information, etc.). However, so-called hybrid (or in some cases multimodal) approaches to stance prediction have only been developed for a small set of target languages, and they often rely on count-based text models (e.g., bag-of-words) and time-honoured classification methods (e.g., support vector machines). As a means to further research in the field, in this work we introduce a number of text and non-text models for stance prediction in the Portuguese language, which make use of more recent methods based on BERT and an ensemble architecture, and ask whether a BERT stance classifier may be enhanced with different kinds of network-related information.