Norton Trevisan Roman

Also published as: Norton T. Roman, Norton Trevisan Roman


2025

This paper details the development of a corpus with posts in Brazilian Portuguese published by Brazilian political elites on X (formerly Twitter) regarding COVID-19 vaccines. The corpus consists of 9,045 posts annotated for relevance, stance and sentiment towards COVID-19 vaccines and vaccination during the first three years of the COVID-19 pandemic (2020-2022).Nine annotators, working in three groups, classified relevance, stance, and sentiment in messages posted between 2020 and 2022 by local political elites. The annotators underwent extensive training, and weekly meetings were conducted to ensure intra-group annotation consistency. The analysis revealed fair to moderate inter-annotator agreement (Average Krippendorf’s alpha of 0.94 for relevance, 0,67 for sentiment and 0,70 for stance). This work makes four significant contributions to the literature. First, it addresses the scarcity of corpora in Brazilian Portuguese, particularly on COVID-19 or vaccines in general. Second, it provides a reliable annotation scheme for sentiment and stance classification, distinguishing both tasks, thereby improving classification precision. Third, it offers a corpus annotated with stance and sentiment according to this scheme, demonstrating how these tasks differ and how conflating them may lead to inconsistencies in corpus construction, as a results of confounding these phenomena — a recurring issue in NLP research beyond studies focusing on vaccines. And fourth, this annotated corpus may serve as the gold standard for fine-tuning and evaluating supervised machine learning models for relevance, sentiment and stance analysis of X posts on similar domains.
Behavioural finance emphasizes the relevance of investor sentiment and emotions in the pricing of financial assets. However, little research has examined how discrete emotions can be detected in text related to this domain, with extant work focusing mostly in sentiment instead. This study approaches this problem by describing a framework for emotion classification in tweets related to the stock market, written in Brazilian Portuguese. Emotion classifiers were then developed, based on Plutchik’s psychoevolutionary theory, by fine-tuning BERTimbau, a pre-trained BERT-based language model for Brazilian Portuguese, and applying it to an existing corpus of tweets, from the stock market domain, previously annotated with emotions. Each of Plutchik’s four emotional axes was modelled as a ternary classification problem. For each axis, 30 independent training iterations were executed using a repeated holdout strategy with different train/test splits in each iteration. In every iteration, hyperparameter tuning was performed via 10-fold stratified cross-validation on the training set to identify the best configuration. A final model was then retrained using the selected hyperparameters and evaluated on a hold-out test set, generating a distribution of macro-F1 scores in out-of-sample data. The results demonstrated statistically significant improvements over a stratified random baseline (Welch’s t-test, << 0.001 across all axes), with macro-F1 scores ranging from 0.50 to 0.61. These findings point to the feasibility of using transformer-based models to capture emotional nuance in financial texts written in Portuguese and provide a reproducible framework for future research.

2024

2023

2022

2015

2013