Certas Palavras: A 1980s-90s Brazilian Radio Corpus to Test TTS Models in Noisy Multi-Speaker Dialogues

Gustavo Evangelista Araújo; Moacir Ponti; Arnaldo Candido Junior; Sidney Leal; Edresson Casanova; Renato Moraes Silva; Miguel Oliveira Jr.; Adriana Barbosa Santos; Gustavo Wadas Lopes; Sandra Aluísio

Abstract

Robust text-to-speech (TTS) systems must be trained on speech that mirrors the variability and imperfections of spontaneous dialogue. However, existing Brazilian Portuguese TTS datasets are typically limited to clean, scripted, or studio-recorded speech. Certas Palavras (CP) bridges this gap with 70 hours of spontaneous, multi-speaker dialogues from a Brazilian radio program broadcast in the 1980s–1990s. An extensive manual annotation process captures conversational dynamics, including orality markers, filled pauses, and hesitations. Because the recordings come from an analog medium, we also annotated non-verbal phenomena such as musical interference, noise, and segmental corrections, which characterize a challenging acoustic environment for synthesis. Baseline YourTTS and F5-TTS models were trained on a 9-hour single-speaker subset featuring one of the program's two main hosts. Objective evaluation shows that the synthesized speech remains intelligible, with moderate WER and CER. In contrast, subjective evaluation reveals a clear gap in perceived naturalness, with lower MOS and higher inter-rater variability than for ground-truth audio. Together, these properties make the dataset a strong benchmark for TTS robustness.
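In TTS evaluation, intelligibility metrics such as WER and CER are commonly computed by transcribing the synthesized audio with an ASR system and comparing that transcript against the input text. The abstract does not state which ASR model or text normalization the authors used, so the following is only a generic sketch under stated assumptions: a hypothetical `normalize` helper that lowercases and collapses whitespace, and a standard Levenshtein edit distance computed over words (WER) or characters (CER), divided by the reference length.

```python
def normalize(text: str) -> str:
    # Hypothetical normalization (assumption): lowercase and collapse whitespace only.
    return " ".join(text.lower().split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance (substitutions, deletions, insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (cost 0 if the items match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    # Word error rate: word-level edit distance over the number of reference words.
    ref_words = normalize(reference).split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, normalize(hypothesis).split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: character-level edit distance over the reference length.
    ref_chars = normalize(reference)
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, normalize(hypothesis)) / len(ref_chars)


ref = "Certas palavras no programa"
hyp = "certas palavra no programa"  # e.g. an ASR transcript of synthesized audio
print(f"WER = {wer(ref, hyp):.3f}, CER = {cer(ref, hyp):.3f}")
# prints: WER = 0.250, CER = 0.037
```

A single dropped character counts as a full word substitution in WER but only one character edit in CER, which is why the two metrics are usually reported together.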

Anthology ID: 2026.propor-1.81
Volume: Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month: April
Year: 2026
Address: Salvador, Brazil
Editors: Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue: PROPOR
Publisher: Association for Computational Linguistics
Pages: 819–829
URL: https://aclanthology.org/2026.propor-1.81/
Cite (ACL): Gustavo Evangelista Araújo, Moacir Ponti, Arnaldo Candido Junior, Sidney Leal, Edresson Casanova, Renato Moraes Silva, Miguel Oliveira Jr., Adriana Barbosa Santos, Gustavo Wadas Lopes, and Sandra Aluisio. 2026. Certas Palavras: A 1980s-90s Brazilian Radio Corpus to Test TTS Models in Noisy Multi-Speaker Dialogues. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 819–829, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal): Certas Palavras: A 1980s-90s Brazilian Radio Corpus to Test TTS Models in Noisy Multi-Speaker Dialogues (Araújo et al., PROPOR 2026)
PDF: https://aclanthology.org/2026.propor-1.81.pdf
