Helena de Medeiros Caseli - ACL Anthology

Helena de Medeiros Caseli

Also published as: Helena Caseli, Helena M. Caseli

2026

The visible and the latent linguistic clues of mental health in Brazilian Portuguese textual posts
Rodrigo Wilkens | Helena Caseli | Vania Neris | Aline Villavicencio
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2

Depressive symptomatology may be reflected in the language used by possible depressive profiles (PDP). This paper investigates to what extent symptoms of depression are manifested in Brazilian Portuguese narrative texts, and whether these can be used to identify relevant linguistic clues related to PDP. Moreover, the relation between these symptoms and PDP is explored, characterising the lexical, syntactic, and psycholinguistic aspects of texts produced by PDP. We found that texts associated with PDPs differed in some of these characteristics from non-PDP texts. The interactions between symptoms and PDP can also shed light on patterns of communication differentiation and the relationship between them. The results of this paper can help to characterise and understand the indicators that can be used to train more bespoke and accurate large language models.

CoDEl-BR: An Electoral Debate Corpus from Brazilian Municipal Elections
Alessandra Gomes | Aline Paes | Helena Caseli
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

Electoral debates are influential moments in public discourse, providing candidates with a high-visibility platform to present their proposals, contrast their positions, and engage in exchanges that shape voter decisions. In Brazil, these debates reach a broad and diverse audience, reflecting regional, social, and ideological variations that affect linguistic choices and thematic content. This paper presents CoDEl-BR (Corpus de Debates Eleitorais, in Portuguese), a corpus of transcripts from 22 second-round mayoral debates held in 13 Brazilian state capitals during the 2024 municipal elections. It comprises 2,943 transcript segments totaling approximately 32 hours. Exploratory analyses reveal differences in thematic priorities between candidates and voters’ questions, as well as variations by race and party affiliation. The corpus aims to enable research in discourse and argumentation analysis, stance and sentiment detection, polarization modeling, and other related NLP tasks. We demonstrate that this initial release provides a curated, high-quality subset of debates with significant potential for expansion.

2024

Biases in GPT-3.5 Turbo model: a case study regarding gender and language
Fernanda Malheiros Assi | Helena Caseli
Proceedings of the 15th Brazilian Symposium in Information and Human Language Technology

Aspect-based sentiment analysis in comments on political debates in Portuguese: evaluating the potential of ChatGPT
Eloize Seno | Lucas Silva | Fábio Anno | Fabiano Rocha | Helena Caseli
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1

Using Large Language Models for Identifying Satirical News in Brazilian Portuguese
Gabriela Wick-Pedro | Cássio Faria da Silva | Marcio Lima Inácio | Oto Araújo Vale | Helena de Medeiros Caseli
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1

Identifying Fine-grained Depression Signs in Social Media Posts
Augusto R. Mendes | Helena Caseli
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Natural Language Processing has already proven to be an effective tool for helping to identify mental health disorders in text. However, most studies limit themselves to a binary classification setup or base their label set on pre-established resources. By doing so, they don’t explicitly model many common ways users can express their depression online, limiting our understanding of what kind of depression signs such models can accurately classify. This study evaluates how machine learning techniques deal with the classification of a fine-grained set of 21 depression signs in social media posts from Brazilian undergraduate students. We found that model performance is not necessarily driven by a depression sign’s frequency in social media posts, since the evaluated machine learning techniques struggle to classify the majority of signs of depression typically present in posts. Thus, model performance seems to be more related to the inherent difficulty of identifying a given sign than to its occurrence frequency.

Identificação de aspectos explícitos e implícitos em críticas gastronômicas em português: avaliando o potencial dos LLMs
Luiz H. N. Silva | Eloize R. M. Seno | Rozane R. Rebechi | Helena M. Caseli | Fabiano M. Rocha-Jr. | Guilherme A. Faller
Proceedings of the 15th Brazilian Symposium in Information and Human Language Technology

2023

Classificação de Polaridade Orientada aos Alvos de Opinião em Comentários sobre Debate Político em Português
Eloize R. Marques Seno | Fábio S. Igarashi Anno | Lucas Lazarini | Helena de Medeiros Caseli
Proceedings of the 14th Brazilian Symposium in Information and Human Language Technology

Pipeline para identificação de erros lexicais e geração de sugestões de correção
Luana Garcia | Miguel Chinellato | Helena de Medeiros Caseli | Leandro Oliveira
Proceedings of the 14th Brazilian Symposium in Information and Human Language Technology

Abordagens Baseadas em Léxicos para a Classificação de Sentimentos Orientada aos Alvos de Opinião em Comentários do Domínio Político
Lucas Lazarini | Fabio Anno | Eloize Seno | Helena de Medeiros Caseli
Proceedings of the 14th Brazilian Symposium in Information and Human Language Technology

Proceedings of the 14th Brazilian Symposium in Information and Human Language Technology
Jackson Wilke da Cruz Souza | Helena de Medeiros Caseli | Finatto Maria José Bocorny
Proceedings of the 14th Brazilian Symposium in Information and Human Language Technology

Choosing What to Mask: More Informed Masking for Multimodal Machine Translation
Julia Sato | Helena Caseli | Lucia Specia
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Pre-trained language models have achieved remarkable results on several NLP tasks. Most of them adopt masked language modeling to learn representations by randomly masking tokens and predicting them based on their context. However, this random selection of tokens to be masked is inefficient for learning some language patterns, as it may not consider linguistic information that can be helpful for many NLP tasks, such as multimodal machine translation (MMT). Hence, we propose three novel masking strategies for cross-lingual visual pre-training - more informed visual masking, more informed textual masking, and more informed visual and textual masking - each one focusing on learning different linguistic patterns. We apply them to Vision Translation Language Modelling for video subtitles (Sato et al., 2022) and conduct extensive experiments on the Portuguese-English MMT task. The results show that our masking approaches yield significant improvements over the original random masking strategy for downstream MMT performance. Our models outperform the MMT baseline and we achieve state-of-the-art accuracy (52.70 in terms of BLEU score) on the How2 dataset, indicating that more informed masking helps in acquiring an understanding of specific language structures and has great potential for language understanding.

2022

Multilingual and Multimodal Learning for Brazilian Portuguese
Júlia Sato | Helena Caseli | Lucia Specia
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Humans constantly deal with multimodal information, that is, data from different modalities, such as texts and images. In order for machines to process information similarly to humans, they must be able to process multimodal data and understand the joint relationship between these modalities. This paper describes the work performed on the VTLM (Visual Translation Language Modelling) framework of Caglayan et al. (2021) to test its generalization ability for other language pairs and corpora. We use the multimodal and multilingual corpus How2 (Sanabria et al., 2018) in three parallel streams with aligned English-Portuguese-Visual information to investigate the effectiveness of the model for this new language pair and in more complex scenarios, where the sentence associated with each image is not a simple description of it. Our experiments on the Portuguese-English multimodal translation task using the How2 dataset demonstrate the efficacy of cross-lingual visual pretraining. We achieved a BLEU score of 51.8 and a METEOR score of 78.0 on the test set, outperforming the MMT baseline by about 14 BLEU and 14 METEOR. The good BLEU and METEOR values obtained for this new language pair, relative to the original English-German VTLM, establish the suitability of the model for other languages.

2021

Análise de polaridade e de tópicos em tweets no domínio da política no Brasil
Leonardo Capellaro | Helena Caseli
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology

Measuring Brazilian Portuguese Product Titles Similarity using Embeddings
Alan Romualdo | Livy Real | Helena Caseli
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology

Classificação multimodal para detecção de produtos proibidos em uma plataforma marketplace
Alan Romualdo | Livy Real | Helena Caseli
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology

Relation extraction in structured and unstructured data: a comparative investigation on smartphone titles in the e-commerce domain
João Barbirato | Livy Real | Helena Caseli
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology

Identificando sintomas de depressão em postagens do Twitter em português do Brasil
Augusto Mendes | Rafael Passador | Helena Caseli
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology

2020

NMT and PBSMT Error Analyses in English to Brazilian Portuguese Automatic Translations
Helena Caseli | Marcio Lima Inácio
Proceedings of the Twelfth Language Resources and Evaluation Conference

Machine Translation (MT) is one of the most important natural language processing applications. Independently of the applied MT approach, an MT system automatically generates an equivalent version (in some target language) of an input sentence (in some source language). Recently, a new MT approach has been proposed: neural machine translation (NMT). NMT systems have already outperformed traditional phrase-based statistical machine translation (PBSMT) systems for some pairs of languages. However, any MT approach outputs errors. In this work we present a comparative study of MT errors generated by an NMT system and a PBSMT system trained on the same English – Brazilian Portuguese parallel corpus. This is the first study of this kind involving NMT for Brazilian Portuguese. Furthermore, the analyses and conclusions presented here point out the specific problems of NMT outputs in relation to PBSMT ones and also give many insights into how to implement automatic post-editing for an NMT system. Finally, the corpora annotated with MT errors generated by both PBSMT and NMT systems are also available.

2018

The Effects of Unimodal Representation Choices on Multimodal Learning
Fernando Tadao Ito | Helena de Medeiros Caseli | Jander Moreira
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment
Natalie Vargas | Carlos Ramisch | Helena Caseli
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

We propose a method for joint unsupervised discovery of multiword expressions (MWEs) and their translations from parallel corpora. First, we apply independent monolingual MWE extraction in source and target languages simultaneously. Then, we calculate translation probability, association score and distributional similarity of co-occurring pairs. Finally, we rank all translations of a given MWE using a linear combination of these features. Preliminary experiments on light verb constructions show promising results.
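The ranking step described in the abstract above can be illustrated with a minimal sketch. It assumes each candidate translation already carries three scores (translation probability, association score, distributional similarity). The weights, feature names, function names and example values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical weights; the abstract does not state the actual values.
WEIGHTS: Dict[str, float] = {
    "translation_prob": 0.5,
    "association": 0.3,
    "similarity": 0.2,
}


def score(features: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Linear combination of the feature values of one candidate."""
    return sum(weights[name] * features[name] for name in weights)


def rank_translations(
    candidates: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Return (translation, score) pairs sorted from best to worst."""
    scored = [(text, score(feats)) for text, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented feature values for two candidate translations of "take a walk".
candidates = {
    "dar um passeio": {"translation_prob": 0.6, "association": 0.8, "similarity": 0.7},
    "tomar banho": {"translation_prob": 0.1, "association": 0.9, "similarity": 0.2},
}
ranking = rank_translations(candidates)
```

With these invented values, `ranking` places "dar um passeio" first. In practice the weights would need to be tuned or learned rather than fixed by hand.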

2015

Never-Ending Multiword Expressions Learning
Alexandre Rondon | Helena Caseli | Carlos Ramisch
Proceedings of the 11th Workshop on Multiword Expressions

2014

Automatic semantic relation extraction from Portuguese texts
Leonardo Sameshima Taba | Helena Caseli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Nowadays we are facing a growing demand for semantic knowledge in computational applications, particularly in Natural Language Processing (NLP). However, there aren’t sufficient human resources to produce that knowledge at the rate at which it is demanded. For the Portuguese language, which has few resources in the semantic area, the situation is even more alarming. Aiming to solve that problem, this work investigates how some semantic relations can be automatically extracted from Portuguese texts. The two main approaches investigated here are based on (i) textual patterns and (ii) machine learning algorithms. Thus, this work investigates how and to what extent these two approaches can be applied to the automatic extraction of seven binary semantic relations (is-a, part-of, location-of, effect-of, property-of, made-of and used-for) in Portuguese texts. The results indicate that machine learning, in particular Support Vector Machines, is a promising technique for the task, although textual patterns presented better results for the used-for relation.

2011

PorTAl: Recursos e Ferramentas de Tradução Automática para o Português do Brasil (PorTAl: Resources and Tools for Machine Translation of Brazilian Portuguese) [in Portuguese]
Thiago Lima Vieira | Helena de Medeiros Caseli
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology

Combining Models for the Alignment of Parallel Syntactic Trees
Josue G. Araújo | Helena M. Caseli
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology

Improving Lexical Alignment Using Hybrid Discriminative and Post-Processing Techniques
Paulo Schreiner | Aline Villavicencio | Leonardo Zilio | Helena M. Caseli
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology

2010

Using Common Sense to generate culturally contextualized Machine Translation
Helena de Medeiros Caseli | Bruno Akio Sugiyama | Junia Coutinho Anacleto
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

Computational Linguistics in Brazil: An Overview
Thiago Pardo | Caroline Gasperin | Helena de Medeiros Caseli | Maria das Graças Nunes
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

2009

Statistically-Driven Alignment-Based Multiword Expression Identification for Technical Domains
Helena Caseli | Aline Villavicencio | André Machado | Maria José Finatto
Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications (MWE 2009)

2005

LIHLA: Shared Task System Description
Helena M. Caseli | Maria G. V. Nunes | Mikel L. Forcada
Proceedings of the ACL Workshop on Building and Using Parallel Texts

Co-authors

Carlos Ramisch 2

Alan Romualdo 2

Junia Coutinho Anacleto 1

Fábio S. Igarashi Anno 1

Josue G. Araújo 1

Fernanda Malheiros Assi 1

João Barbirato 1

Finatto Maria José Bocorny 1

Leonardo Capellaro 1

Miguel Chinellato 1

Guilherme A. Faller 1

Maria José B. Finatto 1

Mikel L. Forcada 1

Caroline Gasperin 1

Alessandra Gomes 1

Fernando Tadao Ito 1

André Machado 1

Augusto Mendes 1

Augusto R. Mendes 1

Jander Moreira 1

Maria G. V. Nunes 1

Leandro Oliveira 1

Rafael Passador 1

Rozane R. Rebechi 1

Fabiano Rocha 1

Fabiano M. Rocha-Jr. 1

Alexandre Rondon 1

Paulo Schreiner 1

Eloize R. Marques Seno 1

Eloize R. M. Seno 1

Luiz H. N. Silva 1

Jackson Wilke da Cruz Souza 1

Bruno Akio Sugiyama 1

Leonardo Sameshima Taba 1

Natalie Vargas 1

Thiago Lima Vieira 1

Maria das Graças Volpe Nunes 1

Gabriela Wick-Pedro 1

Rodrigo Wilkens 1

Leonardo Zilio 1

Cássio Faria da Silva 1

Venues