Gabriel Assis

2026

Sintomas Linguísticos: Geração Aumentada por Recuperação e Raciocínio em LLMs sob a Variação Português-Inglês em Contextos Médicos
Guilherme Vianna de Moura | Gabriel Assis | Aline Paes
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

Modelos de Língua de Grande Porte (LLMs) têm demonstrado desempenho expressivo em tarefas de raciocínio médico. No entanto, sua robustez diante de variações linguísticas ainda é pouco explorada, especialmente em idiomas além do inglês, como o português. Neste trabalho, investigamos como o idioma de entrada afeta o desempenho e o comportamento de raciocínio de LLMs médicos, bem como se a Geração Aumentada por Recuperação (RAG) é capaz de mitigar eventuais limitações decorrentes dessas variações. Para isso, realizamos experimentos em português e em inglês, utilizando duas variantes do modelo MedGemma, com 4B e 27B parâmetros, e avaliando-as em três conjuntos de dados médicos. A avaliação combina métricas quantitativas de acurácia com análises qualitativas e estruturais das cadeias de raciocínio e das respostas geradas pelos modelos. Os resultados indicam que a variação linguística impacta de forma mais acentuada os modelos de menor porte. Em particular, a variante de 4B parâmetros apresenta desempenho consistentemente inferior quando as entradas são fornecidas em português. Em contraste, a variante de 27B parâmetros demonstra maior robustez entre idiomas, mantendo níveis semelhantes de acurácia e de estrutura de raciocínio tanto em português quanto em inglês. Embora o sistema de RAG implementado apresente recuperação de documentos de boa qualidade, sua integração não resulta em ganhos consistentes para o modelo menor, o que sugere limitações na exploração efetiva do contexto adicional. De forma geral, este trabalho contribui para o entendimento dos limites atuais dos LLMs médicos em contextos multilíngues, destacando os desafios associados ao desempenho em idiomas com recursos limitados.

pdf bib abs

Auditing the Evaluators: How Far Can Automatic Evaluation Go in Assessing Portuguese Financial Texts?
Marina Ramalhete Masid | Gabriel Assis | Daniela Vianna | Aline Paes | Altigran Soares da Silva
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

Automatic metrics are widely used to evaluate text quality across various natural language processing tasks. Despite their convenience and scalability, the extent to which these metrics reliably reflect textual quality remains an open challenge. The LLM-as-a-judge paradigm has recently emerged, aligning more closely with human judgments by using LLMs themselves as evaluators. However, there is still a gap in such evaluations across specific domains and languages, as most prior work focuses on generic task benchmarks in English. In this paper, we examine the robustness of both traditional automatic metrics and the LLM-as-a-judge approach for assessing the quality of financial commentaries in Portuguese, an underexplored task and language that has been neglected in previous work. We introduce fine-grained perturbations into the texts generated by specialists to analyze which types of noise most significantly affect evaluation outcomes, using noise-free counterparts as references. The results highlight the weaknesses of classical metrics in this specific task and the limitations of even recent evaluation paradigms, underscoring the need to develop context- and domain-sensitive.

2025

pdf bib abs

Irapuarani at SemEval-2025 Task 10: Evaluating Strategies Combining Small and Large Language Models for Multilingual Narrative Detection
Gabriel Assis | Lívia de Azevedo | João Vitor de Moraes | Laura Alvarenga | Aline Paes
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents the Irapuarani team’s participation in SemEval-2025 Task 10, Subtask 2, which focuses on hierarchical multi-label classification of narratives from online news articles. We explored three distinct strategies: (1) a direct classification approach using a multilingual Small Language Model (SLM), disregarding the hierarchical structure; (2) a translation-based strategy where texts from multiple languages were translated into a single language using a Large Language Model (LLM), followed by classification with a monolingual SLM; and (3) a hybrid strategy leveraging an SLM to filter domains and an LLM to assign labels while accounting for the hierarchy. We conducted experiments on datasets in all available languages, namely Bulgarian, English, Hindi, Portuguese and Russian. Our results show that Strategy 2 is the most generalizable across languages, achieving test set rankings of 21st in English, 9th in Portuguese and Russian, 7th in Bulgarian, and 10th in Hindi.

2024

pdf bib

Exploring Portuguese Hate Speech Detection in Low-Resource Settings: Lightly Tuning Encoder Models or In-Context Learning of Large Models?
Gabriel Assis | Annie Amorim | Jonnathan Carvalho | Daniel de Oliveira | Daniela Vianna | Aline Paes
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1

pdf bib abs

Analysis of Material Facts on Financial Assets: A Generative AI Approach
Gabriel Assis | Daniela Vianna | Gisele L. Pappa | Alexandre Plastino | Wagner Meira Jr | Altigran Soares da Silva | Aline Paes
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing

Material facts (MF) are crucial and obligatory disclosures that can significantly influence asset values. Following their release, financial analysts embark on the meticulous and highly specialized task of crafting analyses to shed light on their impact on company assets, a challenge elevated by the daily amount of MFs released. Generative AI, with its demonstrated power of crafting coherent text, emerges as a promising solution to this task. However, while these analyses must incorporate the MF, they must also transcend it, enhancing it with vital background information, valuable and grounded recommendations, prospects, potential risks, and their underlying reasoning. In this paper, we approach this task as an instance of controllable text generation, aiming to ensure adherence to the MF and other pivotal attributes as control elements. We first explore language models’ capacity to manage this task by embedding those elements into prompts and engaging popular chatbots. A bilingual proof of concept underscores both the potential and the challenges of applying generative AI techniques to this task.

pdf bib

Modestos e Sustentáveis: O Ajuste Eficiente Beneficia Modelos de Língua de Menor Escala em Português?
Gabriel Assis | Arthur Vasconcelos | Lívia Silva | Mariza Ferro | Aline Paes
Proceedings of the 15th Brazilian Symposium in Information and Human Language Technology

Gabriel Assis

2026

2025

2024

Co-authors

Venues