LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models

Ahmed Khamis; Hesham Ali Ahmed

Abstract

Despite advances in neural text-to-speech (TTS), many Arabic dialectal varieties remain marginally addressed. Most resources are concentrated on Modern Standard Arabic (MSA) and Gulf dialects, leaving Egyptian Arabic, the most widely understood Arabic dialect, severely under-resourced. We address this gap by introducing NileTTS: 38 hours of transcribed speech from two speakers across diverse domains, including medical, sales, and general conversations. We construct this dataset using a novel synthetic pipeline: large language models (LLMs) generate Egyptian Arabic content, which is then converted to natural speech using audio synthesis tools, followed by automatic transcription and speaker diarization with manual quality verification. We fine-tune XTTS v2, a state-of-the-art multilingual TTS model, on our dataset and evaluate it against the baseline model trained on other Arabic dialects. Our contributions include: (1) the first publicly available Egyptian Arabic TTS dataset, (2) a reproducible synthetic data generation pipeline for dialectal TTS, and (3) an open-source fine-tuned model. All resources are released to advance Egyptian Arabic speech synthesis research.

Anthology ID: 2026.abjadnlp-1.6
Volume: Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month: March
Year: 2026
Address: Rabat, Morocco
Venues: AbjadNLP | WS
Publisher: Association for Computational Linguistics
Pages: 47–54
URL: https://aclanthology.org/2026.abjadnlp-1.6/
Cite (ACL): Ahmed Khamis and Hesham Ali Ahmed. 2026. LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 47–54, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models (Khamis & Ahmed, AbjadNLP 2026)
PDF: https://aclanthology.org/2026.abjadnlp-1.6.pdf
