Michał J. Ołowski


2025

Voice synthesis in Polish and English - analyzing prediction differences in speaker verification systems
Joanna Gajewska | Alicja Martinek | Michał J. Ołowski | Ewelina Bartuzi-Trokielewicz
Proceedings of the 31st International Conference on Computational Linguistics

Deep learning has significantly enhanced voice synthesis, yielding realistic audio capable of mimicking individual voices. This progress, however, raises security concerns due to the potential misuse of audio deepfakes. Our research examines the effects of deepfakes on speaker recognition systems across English and Polish corpora, assessing both Text-to-Speech and Voice Conversion methods. We focus on the role of biometric similarity in the effectiveness of impersonation and find that synthetic voices can retain personal traits, posing a risk of unauthorized access. The study’s key contributions include analyzing voice synthesis across languages, evaluating biometric resemblance in voice conversion, and contrasting the Text-to-Speech and Voice Conversion paradigms. These insights emphasize the need for improved biometric security against audio deepfake threats.