Text-to-Speech for Under-Resourced Languages: Phoneme Mapping and Source Language Selection in Transfer Learning

Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers


Abstract
We propose a new approach for phoneme mapping in cross-lingual transfer learning for text-to-speech (TTS) in under-resourced languages (URLs), using phonological features from the PHOIBLE database and a language-independent mapping rule. This approach was validated through our experiment, in which we pre-trained acoustic models in Dutch, Finnish, French, Japanese, and Spanish, and fine-tuned them with 30 minutes of Frisian training data. The experiment showed an improvement in both naturalness and pronunciation accuracy in the synthesized Frisian speech when our mapping approach was used. Since this improvement also depended on the source language, we then experimented on finding a good criterion for selecting source languages. As an alternative to the traditionally used language family criterion, we tested a novel idea of using Angular Similarity of Phoneme Frequencies (ASPF), which measures the similarity between the phoneme systems of two languages. ASPF was empirically confirmed to be more effective than language family as a criterion for source language selection, and also to affect the phoneme mapping’s effectiveness. Thus, a combination of our phoneme mapping approach and the ASPF measure can be beneficially adopted by other studies involving multilingual or cross-lingual TTS for URLs.
Anthology ID:
2022.sigul-1.3
Volume:
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Maite Melero, Sakriani Sakti, Claudia Soria
Venue:
SIGUL
SIG:
SIGUL
Publisher:
European Language Resources Association
Note:
Pages:
16–22
Language:
URL:
https://aclanthology.org/2022.sigul-1.3
DOI:
Bibkey:
Cite (ACL):
Phat Do, Matt Coler, Jelske Dijkstra, and Esther Klabbers. 2022. Text-to-Speech for Under-Resourced Languages: Phoneme Mapping and Source Language Selection in Transfer Learning. In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, pages 16–22, Marseille, France. European Language Resources Association.
Cite (Informal):
Text-to-Speech for Under-Resourced Languages: Phoneme Mapping and Source Language Selection in Transfer Learning (Do et al., SIGUL 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.sigul-1.3.pdf