Adversarial Text-to-Speech for low-resource languages

Ashraf Elneima, Mikołaj Bińkowski


Abstract
In this paper we propose a new method for training adversarial text-to-speech (TTS) models for low-resource languages using auxiliary data. Specifically, we modify the MelGAN (Kumar et al., 2019) architecture to achieve better performance in Arabic speech generation, exploring multiple additional datasets and architectural choices, including extra discriminators designed to exploit high-frequency similarities between languages. We evaluate with subjective human ratings, reported as the Mean Opinion Score (MOS), and with a novel quantitative metric, the Fréchet Wav2Vec Distance, which we found to be well correlated with MOS. Both subjectively and quantitatively, our method outperformed the standard MelGAN model.
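The abstract names the Fréchet Wav2Vec Distance but does not give its formula. The sketch below is therefore an assumption, not the authors' implementation: it supposes the metric follows the Fréchet Inception Distance recipe, fitting a Gaussian to each set of embeddings and computing ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2)). The function name `frechet_distance` and the idea of passing precomputed Wav2Vec embeddings as arrays are hypothetical; extracting the embeddings from audio is left out.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (num_samples, feature_dim). In the assumed setting
    the rows would be Wav2Vec embeddings of real and generated utterances.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # atleast_2d keeps the matrix product valid when feature_dim == 1.
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^(1/2)) is the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b. For positive semi-definite inputs these
    # are real and non-negative up to rounding, so tiny negative or
    # imaginary parts are discarded.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Toy usage with random stand-ins for embeddings (not real speech features).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
generated = real + 3.0  # same covariance, mean shifted by 3 in every dimension
score_same = frechet_distance(real, real)
score_shifted = frechet_distance(real, generated)
```

Under this formulation a lower value means the generated embeddings are statistically closer to the real ones. In the toy usage, the shifted set differs only in its mean, so the score reduces to the squared norm of the mean shift.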
Anthology ID:
2022.wanlp-1.8
Volume:
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
76–84
URL:
https://aclanthology.org/2022.wanlp-1.8
DOI:
10.18653/v1/2022.wanlp-1.8
Cite (ACL):
Ashraf Elneima and Mikołaj Bińkowski. 2022. Adversarial Text-to-Speech for low-resource languages. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 76–84, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Adversarial Text-to-Speech for low-resource languages (Elneima & Bińkowski, WANLP 2022)
PDF:
https://aclanthology.org/2022.wanlp-1.8.pdf