GE2PE: Persian End-to-End Grapheme-to-Phoneme Conversion

Elnaz Rahmati, Hossein Sameti


Abstract
Text-to-Speech (TTS) systems have made significant strides, enabling the generation of speech from grapheme sequences. However, for low-resource languages, these models still struggle to produce natural and intelligible speech. Grapheme-to-Phoneme conversion (G2P) addresses this challenge by enhancing the input sequence with phonetic information. Existing G2P systems nevertheless face limitations on Persian texts due to the complexity of Persian transcription. In this study, we focus on enriching resources for the Persian language. To this end, we introduce two novel G2P training datasets: one manually labeled and the other machine-generated. Together, these datasets comprise over five million sentences alongside their corresponding phoneme sequences. Additionally, we propose two evaluation datasets tailored for Persian sub-tasks, including Kasre-Ezafe detection, homograph disambiguation, and handling out-of-vocabulary (OOV) words. To tackle the unique challenges of the Persian language, we develop a new sentence-level End-to-End (E2E) model trained with a two-step approach designed to maximize the impact of the manually labeled data. The results show that our model surpasses the state of the art by 1.86% in word error rate, 4.03% in Kasre-Ezafe detection recall, and 3.42% in homograph disambiguation accuracy.
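For readers unfamiliar with the headline metric, the following is a minimal sketch of how a word error rate over phoneme transcriptions can be computed, assuming each sentence is a whitespace-separated string of phonemized words. The function names and the example strings are hypothetical illustrations, not the authors' evaluation code or phoneme inventory.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        hyp_words = hyp.split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    return total_edits / total_words if total_words else 0.0


# Hypothetical romanized example: the prediction misses a final "-e" on the first word.
refs = ["ketab-e man"]
hyps = ["ketab man"]
print(word_error_rate(refs, hyps))
```

Because a phonemized word counts as wrong if any of its phonemes differs, a single missed Kasre-Ezafe or a wrong homograph reading each contributes one full word error under this definition.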
Anthology ID:
2024.findings-emnlp.196
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3426–3436
URL:
https://aclanthology.org/2024.findings-emnlp.196/
DOI:
10.18653/v1/2024.findings-emnlp.196
Cite (ACL):
Elnaz Rahmati and Hossein Sameti. 2024. GE2PE: Persian End-to-End Grapheme-to-Phoneme Conversion. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 3426–3436, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
GE2PE: Persian End-to-End Grapheme-to-Phoneme Conversion (Rahmati & Sameti, Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.196.pdf
Software:
 2024.findings-emnlp.196.software.zip
Data:
 2024.findings-emnlp.196.data.zip