Anderson Da Silva Soares

Also published as: Anderson da Silva Soares


2025

BRSpeech-DF: A Deep Fake Synthetic Speech Dataset for Portuguese Zero-Shot TTS
Alexandre Costa Ferro Filho | Rafaello Virgilli | Lucas Alcantara Souza | F S de Oliveira | Marcelo Henrique Lopes Ferreira | Daniel Tunnermann | Gustavo Dos Reis Oliveira | Anderson Da Silva Soares | Arlindo Rodrigues Galvão Filho
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Audio deepfake detection (ADD) has become increasingly important due to the rapid evolution of generative speech models. However, progress in this field remains uneven across languages, particularly for low-resource languages like Portuguese, which lack high-quality datasets. In this paper, we introduce BRSpeech-DF, the first publicly available ADD dataset for Portuguese, covering both the Brazilian and European variants. The dataset contains over 458,000 utterances: a smaller portion of real speech from 62 speakers and a large collection of synthetic samples generated with multiple zero-shot text-to-speech (TTS) models, each conditioned on the original speaker's voice. By providing this resource, we aim to support the development of robust, multilingual detection systems, thereby advancing equity in speech forensics and security research. BRSpeech-DF addresses a significant gap in annotated data for underrepresented languages, facilitating more inclusive and generalizable advances in synthetic speech detection.

2020

Deep Learning Brasil - NLP at SemEval-2020 Task 9: Sentiment Analysis of Code-Mixed Tweets Using Ensemble of Language Models
Manoel Veríssimo dos Santos Neto | Ayrton Amaral | Nádia Silva | Anderson da Silva Soares
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we describe a methodology for predicting sentiment in code-mixed (Hindi-English) tweets. Our team, named verissimo.manoel on CodaLab, developed an approach based on an ensemble of four models (MultiFiT, BERT, ALBERT, and XLNet). The final classifier was an ensemble built from the softmax outputs of these four models. This architecture was evaluated in the context of SemEval-2020 Task 9, where our system achieved an F1 score of 72.7%.