Daniela Barreiro Claro

2026

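dialect2vec: Um método baseado em vetores para transcrição dialetal do português a partir de questionários do ALiB (dialect2vec: A Vector-Based Method for Dialectal Transcription of Portuguese from ALiB Questionnaires)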
Laila Mota | Daniela Barreiro Claro | Eloize R. Marques Seno | Rerisson Cavalcante de Araújo
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

Modeling dialectal variation is challenging when it depends on subword-based language models, which often fail to process the complexity of phonetic transcriptions because of vocabulary constraints and semantic biases. This work introduces dialect2vec, a method for capturing the dialectal diversity of Brazilian Portuguese. Our proposal adopts the token-free ByT5 model to encode International Phonetic Alphabet (IPA) sequences at the byte level, mitigating the information loss caused by unknown tokens. The experiments were conducted on data from the Atlas Linguístico do Brasil (ALiB, the Linguistic Atlas of Brazil), in which the phonetic dimension alone proved viable in unsupervised clustering tasks, with performance close to the lexical state of the art (BERTimbau). This demonstrates that byte-based architectures can recover complex dialectal structures exclusively through phonological cues, offering a more granular mapping of linguistic boundaries.

AttentionApp: An Interactive Tool for Analyzing Transformer Attention Patterns in Portuguese
Ricardo G. Oliveira | Daniela Barreiro Claro
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2

This paper presents AttentionApp, an interactive demonstration system designed to support the inspection and linguistic analysis of attention mechanisms in Transformer-based language models for Portuguese. The tool allows users to input sentences in Portuguese and visualize attention distributions across layers and heads, enabling fine-grained qualitative analysis of syntactic and semantic patterns captured by the model. AttentionApp is intended as a research-oriented tool, facilitating exploratory analysis, hypothesis generation, and interpretability studies for Portuguese Natural Language Processing.

Analysis of Machine Translators on Sentences Generated by Portuguese Image Captioning Models
Natan Moura | João Medrado Gondim | Daniela Barreiro Claro | Babacar Mane
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

Recent work in computer vision and natural language processing has enabled the recognition and identification of objects in images and the automatic generation of descriptions. Despite these advances, most research in this field targets English, so some adaptation is required for other languages such as Portuguese. One such adaptation is the translate-train approach, which involves translating the training dataset into the desired language. However, the available translators vary in effectiveness. The primary objective of this work is to evaluate the behavior of image captioning models when trained on datasets translated into Portuguese by different automatic translators, both quantitatively (cost, training time, metrics on the test set) and qualitatively (comparative evaluation form, error analysis). The results indicate that it is possible to obtain valid automatic descriptions in Portuguese from image captioning models trained on translated datasets, and that more robust translators produce more meaningful descriptions.

Utilizando Features Linguísticas Genéricas para Classificação de Triplas Relacionais em Português (Generic Linguistic Features for Relational Triples Classification in Portuguese) [In Portuguese]
George Barbosa | Daniela Barreiro Claro
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology