Sidney Leal
2026
Certas Palavras: A 1980s-90s Brazilian Radio Corpus to Test TTS Models in Noisy Multi-Speaker Dialogues
Gustavo Evangelista Araújo | Moacir Ponti | Arnaldo Candido Junior | Sidney Leal | Edresson Casanova | Renato Moraes Silva | Miguel Oliveira Jr. | Adriana Barbosa Santos | Gustavo Wadas Lopes | Sandra Aluisio
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Robust text-to-speech (TTS) systems must be trained on speech that mirrors the variability and imperfections of spontaneous dialogues. However, TTS systems trained on existing Brazilian Portuguese datasets are typically limited to clean, scripted, or studio-recorded speech. Certas Palavras (CP) bridges this gap with 70 hours of spontaneous, multi-speaker dialogues from a Brazilian radio program broadcast in the 1980s–1990s. The extensive manual annotation process captures conversational dynamics, including orality markers, filled pauses, and hesitations. To account for the analog medium, we also annotated non-verbal phenomena such as musical interference, noise, and segmental corrections, which characterize a challenging acoustic environment for synthesis. Baseline YourTTS and F5-TTS models were trained on a 9-hour single-speaker subset featuring one of the program's two main hosts. Objective evaluation shows that the synthesized speech remains intelligible, with moderate WER and CER. In contrast, subjective evaluation reveals a clear gap in perceived naturalness, with lower MOS and higher inter-rater variability compared to ground-truth audio. Together, these properties make the dataset a strong benchmark for TTS robustness.
2020
Measuring the Impact of Readability Features in Fake News Detection
Roney Santos | Gabriela Pedro | Sidney Leal | Oto Vale | Thiago Pardo | Kalina Bontcheva | Carolina Scarton
Proceedings of the Twelfth Language Resources and Evaluation Conference
The proliferation of fake news is a current issue that influences a number of important areas of society, such as politics, the economy, and health. In the Natural Language Processing area, recent initiatives have tried to detect fake news in different ways, ranging from language-based approaches to content-based verification. In such approaches, the choice of features for classifying fake and true news is one of the most important parts of the process. This paper presents a study on the impact of readability features on fake news detection for the Brazilian Portuguese language. The results show that such features are relevant to the task (achieving, alone, up to 92% classification accuracy) and may improve previous classification results.