Arnaldo Candido Junior
Also published as: Arnaldo Candido Junior
2026
Certas Palavras: A 1980s-90s Brazilian Radio Corpus to Test TTS Models in Noisy Multi-Speaker Dialogues
Gustavo Evangelista Araújo | Moacir Ponti | Arnaldo Candido Junior | Sidney Leal | Edresson Casanova | Renato Moraes Silva | Miguel Oliveira Jr. | Adriana Barbosa Santos | Gustavo Wadas Lopes | Sandra Aluisio
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Robust text-to-speech (TTS) systems must be trained on speech that mirrors the variability and imperfections of spontaneous dialogues. However, existing Brazilian Portuguese TTS datasets are typically limited to clean, scripted, or studio-recorded speech. Certas Palavras (CP) bridges this gap with 70 hours of spontaneous, multi-speaker dialogues from a Brazilian radio program broadcast in the 1980s–1990s. The extensive manual annotation process captures conversational dynamics, including orality markers, filled pauses, and hesitations. For the analog medium, we annotated non-verbal phenomena such as musical interference, noise, and segmental corrections, which together describe a challenging acoustic environment for synthesis. Baseline YourTTS and F5-TTS models were trained on a 9-hour single-speaker subset corresponding to one of the two main hosts of Certas Palavras. Objective evaluation shows that the synthesized speech remains intelligible, with moderate WER and CER. In contrast, subjective evaluation reveals a clear gap in perceived naturalness, with lower MOS scores and higher inter-rater variability compared to ground-truth audio. Together, these properties make the dataset a strong benchmark for TTS robustness.
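Síntese de Voz Emocional Multi-Idioma para Português Brasileiro: Uma Análise Comparativa de Abordagens de Ajuste Fino (Multilingual Emotional Speech Synthesis for Brazilian Portuguese: A Comparative Analysis of Fine-Tuning Approaches)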
Daniel Oliveira de Brito | Sidney Evaldo Leal | Arnaldo Candido Junior
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Multilingual emotional speech synthesis for Brazilian Portuguese remains little explored. This work investigates different approaches to incorporating emotional control into multilingual Portuguese-English synthesis, comparing five variants: the base YourTTS model, fine-tuning with emotional data, conditioning via textual tokens, and the VECL-TTS architecture with emotional embeddings under different configurations. We used emotional datasets in English (RAVDESS, Emotional Speech Dataset) and Brazilian Portuguese (VERBO), totaling 14.4 hours, to fine-tune the pre-trained YourTTS model. The evaluation combined objective metrics (emotional and speaker embedding similarity) with a subjective evaluation by ten participants. The results reveal that architecturally simple approaches can achieve perceptual performance comparable or superior to that of more complex methods: fine-tuned YourTTS obtained the best overall quality, token conditioning achieved the highest perceived emotional similarity, and VECL-TTS maximized objective emotional control at the cost of degraded quality and speaker similarity. We also observed a trade-off between emotional control and preservation of vocal identity, as well as discrepancies between objective metrics and human perception. This work demonstrates the feasibility of multilingual emotional transfer to Brazilian Portuguese via fine-tuning with limited resources.
Contrastive and Adversarial Disentanglement for Speaker Representations in Brazilian Portuguese
Ariadne Nascimento Matos | Arnaldo Candido Junior | Moacir Antonelli Ponti
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
In this work, we study disentanglement between speaker and environment by combining an adversarial framework with contrastive learning objectives. We investigate supervised contrastive learning (SupCon), which exploits environment labels to structure the environment subspace, and self-supervised SimCLR, which learns invariance from augmented views. Experiments on a controlled synthetic dataset (ST1) and a more realistic corpus (CML-TTS) show that SupCon yields the most discriminative and stable speaker embeddings on ST1, achieving the best verification performance (EER=4.70%, MinDCF=0.24). Overall, our findings emphasize (i) the importance of synthetic benchmarks for diagnosing disentanglement under controlled factor variation and (ii) the effectiveness of combining contrastive and adversarial objectives to encourage speaker representations that are both discriminative and less sensitive to environmental factors.
2025
MuPe Life Stories Dataset: Spontaneous Speech in Brazilian Portuguese with a Case Study Evaluation on ASR Bias against Speakers Groups and Topic Modeling
Sidney Evaldo Leal | Arnaldo Candido Junior | Ricardo Marcacini | Edresson Casanova | Odilon Gonçalves | Anderson Silva Soares | Rodrigo Freitas Lima | Lucas Rafael Stefanel Gris | Sandra Aluísio
Proceedings of the 31st International Conference on Computational Linguistics
Recently, several public datasets for automatic speech recognition (ASR) in Brazilian Portuguese (BP) have been released, improving the performance of ASR systems. However, these datasets lack diversity in terms of age groups, regional accents, and education levels. In this paper, we present a new publicly available dataset consisting of 289 life story interviews (365 hours), featuring a broad range of speakers varying in age, education, and regional accent. First, we demonstrate the presence of bias in current BP ASR models concerning education levels and age groups. Second, we show that our dataset helps mitigate these biases. Additionally, an ASR model trained on our dataset performed better when evaluated on a diverse test set. Finally, the ASR model trained on our dataset was extrinsically evaluated through a topic modeling task that used the automatically transcribed output.
2024
Accent Classification is Challenging but Pre-training Helps: a case study with novel Brazilian Portuguese datasets
Ariadne Matos | Gustavo Araújo | Arnaldo Candido Junior | Moacir Ponti
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
2015
Portal Min@s: Uma Ferramenta Geral de Apoio ao Processamento de Córpus de Propósito Geral (Portal Min@s: A General Purpose Support Tool for Corpora Processing)
Arnaldo Candido Junior | Thiago Lima Vieira | Marcel Serikawa | Matheus Antonio Ribeiro Silva | Régis Zangirolami | Sandra Maria Aluísio
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology
Co-authors
- Sandra Aluísio 3
- Edresson Casanova 2
- Sidney Evaldo Leal 2
- Moacir Ponti 2
- Gustavo Araújo 1
- Gustavo Evangelista Araújo 1
- Daniel Oliveira de Brito 1
- Rodrigo Freitas Lima 1
- Odilon Gonçalves 1
- Miguel Oliveira Jr. 1
- Sidney Leal 1
- Gustavo Wadas Lopes 1
- Ricardo Marcacini 1
- Ariadne Matos 1
- Ariadne Nascimento Matos 1
- Moacir Antonelli Ponti 1
- Adriana Barbosa Santos 1
- Marcel Serikawa 1
- Matheus Antonio Ribeiro Silva 1
- Renato Moraes Silva 1
- Anderson Silva Soares 1
- Lucas Rafael Stefanel Gris 1
- Thiago Lima Vieira 1
- Régis Zangirolami 1