TunArTTS: Tunisian Arabic Text-To-Speech Corpus

Imen Laouirine; Rami Kammoun; Fethi Bougares

Abstract

The Tunisian dialect is a low-resource language with no prior TTS research. In this paper, we present a speech corpus for Tunisian Arabic Text-to-Speech (TunArTTS) to initiate the development of end-to-end TTS systems for the Tunisian dialect. The speech corpus is extracted from an online English and Tunisian Arabic dictionary. It is a single-speaker corpus of more than 3 hours of speech from a male speaker, sampled at 44.1 kHz. The corpus is processed and manually diacritized. Furthermore, we develop several TTS systems using two approaches: training from scratch and transfer learning. Both Tacotron2 and FastSpeech2 were used and evaluated with subjective and objective metrics. The best results are obtained with transfer learning from a model pre-trained on the English LJSpeech dataset. This model obtained a mean opinion score (MOS) of 3.88. TunArTTS will be made publicly available for research purposes along with a demo of the baseline TTS system.

Keywords: Tunisian Dialect, Text-To-Speech, Low-resource, Transfer Learning, TunArTTS

Anthology ID: 2024.lrec-main.1467
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 16879–16889
URL: https://aclanthology.org/2024.lrec-main.1467/
Cite (ACL): Imen Laouirine, Rami Kammoun, and Fethi Bougares. 2024. TunArTTS: Tunisian Arabic Text-To-Speech Corpus. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16879–16889, Torino, Italia. ELRA and ICCL.
Cite (Informal): TunArTTS: Tunisian Arabic Text-To-Speech Corpus (Laouirine et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.1467.pdf
