Multimodal neural pronunciation modeling for spoken languages with logographic origin

Minh Nguyen, Gia H. Ngo, Nancy Chen


Abstract
Graphemes of most languages encode pronunciation, though some are more explicit than others. Languages like Spanish have a straightforward mapping between their graphemes and phonemes, while this mapping is more convoluted for languages like English. Spoken languages such as Cantonese present even more challenges in pronunciation modeling: (1) they do not have a standard written form, and (2) their closest graphemic origins are logographic Han characters, of which only a subset implicitly encodes pronunciation. In this work, we propose a multimodal approach to predict the pronunciation of Cantonese logographic characters, using neural networks with a geometric representation of logographs and the pronunciation of cognates in historically related languages. The proposed framework improves performance by 18.1% and 25.0% relative to unimodal and multimodal baselines, respectively.
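
For readers who want a concrete picture of the fusion described above, here is a minimal sketch of a two-encoder multimodal model. This is not the paper's architecture: the GRU encoders, the stroke-sequence stand-in for the geometric logograph representation, the single-phoneme output, the class name MultimodalPronunciationModel, and all vocabulary sizes and dimensions are hypothetical illustrations.

    # Minimal sketch (NOT the authors' implementation): one encoder for a
    # structural representation of a logograph, one for the pronunciation of
    # a cognate in a historically related language, fused to predict a
    # target-language phoneme. All sizes below are hypothetical.
    import torch
    import torch.nn as nn

    class MultimodalPronunciationModel(nn.Module):
        def __init__(self, n_strokes=30, n_src_phonemes=60, n_tgt_phonemes=70,
                     emb_dim=64, hidden_dim=128):
            super().__init__()
            # Encoder for the logograph's stroke/structure sequence
            # (a stand-in for the paper's geometric representation).
            self.stroke_emb = nn.Embedding(n_strokes, emb_dim)
            self.stroke_enc = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            # Encoder for the cognate's phoneme sequence (e.g., Mandarin).
            self.phone_emb = nn.Embedding(n_src_phonemes, emb_dim)
            self.phone_enc = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            # Fusion: concatenate the two final states, then classify.
            self.out = nn.Linear(2 * hidden_dim, n_tgt_phonemes)

        def forward(self, strokes, cognate_phones):
            _, h_s = self.stroke_enc(self.stroke_emb(strokes))
            _, h_p = self.phone_enc(self.phone_emb(cognate_phones))
            fused = torch.cat([h_s[-1], h_p[-1]], dim=-1)
            return self.out(fused)  # logits over target phonemes

    # Toy usage with random indices (batch of 2).
    model = MultimodalPronunciationModel()
    strokes = torch.randint(0, 30, (2, 12))   # stroke-sequence ids
    cognate = torch.randint(0, 60, (2, 5))    # cognate phoneme ids
    logits = model(strokes, cognate)
    print(logits.shape)                       # torch.Size([2, 70])

The paper predicts full pronunciations rather than a single phoneme class; the point of the sketch is only the multimodal fusion, i.e., combining evidence from the logograph's structure with evidence from cognate pronunciations in one predictor.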
Anthology ID:
D18-1320
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2916–2922
URL:
https://aclanthology.org/D18-1320
DOI:
10.18653/v1/D18-1320
Cite (ACL):
Minh Nguyen, Gia H. Ngo, and Nancy Chen. 2018. Multimodal neural pronunciation modeling for spoken languages with logographic origin. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2916–2922, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Multimodal neural pronunciation modeling for spoken languages with logographic origin (Nguyen et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1320.pdf
Attachment:
 D18-1320.Attachment.zip
Code:
 mnhng/logographic