Learning to Pronounce Chinese Without a Pronunciation Dictionary

Christopher Chu, Scot Fang, Kevin Knight


Abstract
We demonstrate a program that learns to pronounce Chinese text in Mandarin, without a pronunciation dictionary. From non-parallel streams of Chinese characters and Chinese pinyin syllables, it establishes a many-to-many mapping between characters and pronunciations. Using unsupervised methods, the program effectively deciphers writing into speech. Its token-level character-to-syllable accuracy is 89%, which significantly exceeds the 22% accuracy of prior work.
Anthology ID:
2020.emnlp-main.458
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5687–5693
URL:
https://aclanthology.org/2020.emnlp-main.458
DOI:
10.18653/v1/2020.emnlp-main.458
Cite (ACL):
Christopher Chu, Scot Fang, and Kevin Knight. 2020. Learning to Pronounce Chinese Without a Pronunciation Dictionary. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5687–5693, Online. Association for Computational Linguistics.
Cite (Informal):
Learning to Pronounce Chinese Without a Pronunciation Dictionary (Chu et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.458.pdf
Video:
https://slideslive.com/38939068