Moacir Ponti
2026
Certas Palavras: A 1980s-90s Brazilian Radio Corpus to Test TTS Models in Noisy Multi-Speaker Dialogues
Gustavo Evangelista Araújo | Moacir Ponti | Arnaldo Candido Junior | Sidney Leal | Edresson Casanova | Renato Moraes Silva | Miguel Oliveira Jr. | Adriana Barbosa Santos | Gustavo Wadas Lopes | Sandra Aluisio
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Robust text-to-speech (TTS) systems must be trained on speech that mirrors the variability and imperfections of spontaneous dialogues. However, TTS systems trained on existing Brazilian Portuguese datasets are typically limited to clean, scripted, or studio-recorded speech. Certas Palavras (CP) bridges this gap with 70 hours of spontaneous, multi-speaker dialogues from a Brazilian radio program broadcast in the 1980s–1990s. The extensive manual annotation process captures conversational dynamics, including orality markers, filled pauses, and hesitations. To account for the analog medium, we annotated non-verbal phenomena such as musical interference, noise, and segmental corrections, which together describe a challenging acoustic environment for synthesis. Baseline YourTTS and F5-TTS models were trained on a 9-hour single-speaker subset featuring one of the two main hosts of the program. Objective evaluation shows that the synthesized speech remains intelligible, with moderate WER and CER. In contrast, subjective evaluation reveals a clear gap in perceived naturalness, with lower MOS scores and higher inter-rater variability than for ground-truth audio. Together, these properties make the dataset a strong benchmark for TTS robustness.
2024
Accent Classification is Challenging but Pre-training Helps: a case study with novel Brazilian Portuguese datasets
Ariadne Matos | Gustavo Araújo | Arnaldo Candido Junior | Moacir Ponti
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
EyetrackingMOS: Proposta de um método rápido e barato para avaliar modelos de sintese de fala
Gustavo Araújo | Julio Cesar Galdino | Rodrigo Lima | Sandra Aluísio | Arnaldo Candido Jr. | Miguel Oliveira Jr. | Moacir Ponti
Proceedings of the 15th Brazilian Symposium in Information and Human Language Technology