Evandro Eduardo Seron Ruiz


2026

Reliable inflation forecasts play a critical role in economic stability and policy decisions. Traditional econometric models perform well but often overlook qualitative signals that could improve predictive accuracy. Recent advances in AI-based Natural Language Processing enable the extraction of latent sentiment, offering a promising avenue for inflation forecasting. This study proposes a framework that uses Large Language Models (LLMs) to extract sentiment variables from the minutes of the Brazilian Monetary Policy Committee (COPOM), optimizes their bias to match human-collected sentiment, and integrates them into ARIMA and LSTM models for one-step-ahead monthly IPCA prediction. Results show that LLM-generated sentiment trends are temporally coherent with historical inflation patterns and highly statistically significant (p < 0.001). Models whose sentiment evaluations aligned more closely with human assessments (e.g., grok-4-fast and llama-4-maverick) achieved superior forecasting performance. ARIMA models consistently benefited from the inclusion of sentiment, while LSTM results were more variable.
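As a concrete illustration of how such an exogenous sentiment series might enter the ARIMA component, the sketch below runs a rolling one-step-ahead forecast with statsmodels. This is a minimal sketch, not the paper's implementation: the file name ipca_sentiment_monthly.csv, the column names, the ARIMA(1, 0, 1) order, and the 12-month hold-out are all illustrative assumptions.

```python
# Minimal sketch (not the paper's exact setup): one-step-ahead IPCA forecasting
# with an ARIMA model augmented by an exogenous LLM-derived sentiment series.
# File name, column names, and the ARIMA order are assumptions for illustration.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical monthly data: IPCA (% m/m) and a sentiment score extracted by an
# LLM from the COPOM minutes available in (or before) each month.
df = pd.read_csv("ipca_sentiment_monthly.csv", parse_dates=["date"], index_col="date")

train, test = df.iloc[:-12], df.iloc[-12:]      # hold out the last 12 months
history_y = train["ipca"].copy()
history_x = train[["sentiment"]].copy()

preds = []
for t in range(len(test)):
    # Refit at each step for a rolling one-step-ahead evaluation.
    model = ARIMA(history_y, exog=history_x, order=(1, 0, 1)).fit()
    next_x = test[["sentiment"]].iloc[[t]]
    preds.append(model.forecast(steps=1, exog=next_x).iloc[0])
    # Append the realised observation before forecasting the next month.
    history_y = pd.concat([history_y, test["ipca"].iloc[[t]]])
    history_x = pd.concat([history_x, next_x])

mae = (pd.Series(preds, index=test.index) - test["ipca"]).abs().mean()
print(f"One-step-ahead MAE over the hold-out year: {mae:.3f} p.p.")
```

The same exogenous series could, under analogous assumptions, be appended to the input features of the LSTM model mentioned in the abstract.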
Recorded interviews can capture their subjects’ memories, perceptions, and emotions. When conducted with notable figures, they also have the potential to serve as a resource for interdisciplinary research, impacting various branches of science. In this work, we mark the beginning of a significant project analyzing interviews from the Roda Viva program, the longest-running interview show on Brazilian television. In this initial study, we examined six memorable interviews with six Brazilian Formula One drivers to compare the performance of two named entity recognition methods, a statistical-neural method and large language models, both evaluated against manual annotations. Beyond the quantitative comparison, the study highlighted relevant qualitative distinctions: the statistical method showed a rigid dependence on capitalisation and lexical familiarity, leading to mechanical false positives and to missed non-capitalised entities, while the LLM exhibited greater linguistic sensitivity, retrieving contextual entities and remaining robust to transcription errors, though it still produced false positives. The LLM-based model appears more promising due to its flexibility and the potential for refinement via instructions that filter ambiguities, favouring the automation of social network extraction from the corpus.
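To make the evaluation protocol concrete, the sketch below computes entity-level precision, recall, and F1 against manual annotations. It assumes entities are represented as (start_char, end_char, label) spans scored by exact match; both this representation and the toy annotations are illustrative assumptions, not the study's actual data or matching criterion.

```python
# Minimal sketch of an entity-level evaluation against gold annotations,
# assuming each system's output and the manual annotation are sets of
# (start_char, end_char, label) tuples per transcript (an assumption here).
def prf(gold: set, pred: set) -> tuple[float, float, float]:
    """Entity-level precision, recall and F1 for one transcript (exact match)."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical annotations for a short excerpt of one interview.
gold = {(0, 12, "PER"), (27, 36, "ORG"), (52, 61, "LOC")}
statistical = {(0, 12, "PER"), (40, 45, "ORG")}           # misses a non-capitalised entity
llm = {(0, 12, "PER"), (27, 36, "ORG"), (70, 78, "PER")}  # extra contextual guess

for name, pred in [("statistical-neural", statistical), ("LLM", llm)]:
    p, r, f = prf(gold, pred)
    print(f"{name:>18}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```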

2021

Split-and-rephrase is a challenging task that transforms a given complex input sentence into multiple shorter sentences retaining equivalent meaning. The rewriting rests on the idea that shorter sentences benefit human readers and, when used as a preprocessing step, improve NLP downstream tasks. This work presents a complete pipeline capable of performing split-and-rephrase in a cross-lingual manner. We trained sequence-to-sequence neural models on English corpora and applied them to predict the transformations in English and Brazilian Portuguese sentences jointly with BERT’s masked language modeling. Contrary to traditional approaches that train models with extensive vocabularies, we present a non-trivial way to construct symbolic vocabularies generalized solely by grammatical classes (POS tags) and their respective recurrences, reducing the amount of training data required. The pipeline achieved competitive results, encouraging the extension of the method to languages other than English.
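The sketch below illustrates one way such a POS-generalised symbolic vocabulary could be built: each word is replaced by its POS tag indexed by recurrence, with a mapping kept to restore surface forms. The TAG_k symbol scheme and the pre-tagged example sentence are assumptions for illustration, not necessarily the paper's exact encoding.

```python
# Minimal sketch of a symbolic, POS-generalised encoding, assuming each input
# token already carries a POS tag (e.g. from an external tagger). The TAG_k
# scheme for the k-th occurrence of a tag is an illustrative assumption.
from collections import defaultdict

def encode(tagged_tokens):
    """Replace words by POS symbols indexed by recurrence, keeping a mapping back."""
    counts = defaultdict(int)
    symbols, mapping = [], {}
    for word, tag in tagged_tokens:
        counts[tag] += 1
        symbol = f"{tag}_{counts[tag]}"
        symbols.append(symbol)
        mapping[symbol] = word           # used to restore the surface form later
    return symbols, mapping

def decode(symbols, mapping):
    """Restore surface words from the symbolic sequence."""
    return [mapping[s] for s in symbols]

# Hypothetical pre-tagged Portuguese sentence (Universal POS tags).
tagged = [("Maria", "PROPN"), ("comprou", "VERB"), ("o", "DET"),
          ("carro", "NOUN"), ("e", "CCONJ"), ("viajou", "VERB")]

symbols, mapping = encode(tagged)
print(symbols)                 # ['PROPN_1', 'VERB_1', 'DET_1', 'NOUN_1', 'CCONJ_1', 'VERB_2']
print(decode(symbols, mapping))
```

Because the symbolic sequence depends only on grammatical classes and their recurrences, a model trained on such sequences in English can, in principle, be applied to Portuguese input tagged with the same tag set.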