Multimodal, Multilingual Grapheme-to-Phoneme Conversion for Low-Resource Languages

James Route; Steven Hillis; Isak Czeresnia Etinger; Han Zhang; Alan W. Black

doi:10.18653/v1/D19-6121

Abstract

Grapheme-to-phoneme conversion (g2p) is the task of predicting the pronunciation of words from their orthographic representation. Historically, g2p systems were transition- or rule-based, making generalization beyond a monolingual (high resource) domain impractical. Recently, neural architectures have enabled multilingual systems to generalize widely; however, all systems to date have been trained only on spelling-pronunciation pairs. We hypothesize that the sequences of IPA characters used to represent pronunciation do not capture its full nuance, especially when cleaned to facilitate machine learning. We leverage audio data as an auxiliary modality in a multi-task training process to learn a more optimal intermediate representation of source graphemes; this is the first multimodal model proposed for multilingual g2p. Our approach is highly effective: on our in-domain test set, our multimodal model reduces phoneme error rate to 2.46%, a more than 65% decrease compared to our implementation of a unimodal spelling-pronunciation model, which itself achieves state-of-the-art results on the Wiktionary test set. The advantages of the multimodal model generalize to wholly unseen languages, reducing phoneme error rate on our out-of-domain test set to 6.39% from the unimodal 8.21%, a more than 20% relative decrease. Furthermore, our training and test sets are composed primarily of low-resource languages, demonstrating that our multimodal approach remains useful when training data are constrained.
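
The abstract reports results as phoneme error rate (PER) and as relative decreases. As a minimal sketch of how such figures are commonly computed, the snippet below assumes PER is the phoneme-level Levenshtein distance summed over a test set and divided by the total number of reference phonemes; the abstract does not state the exact definition the authors used, so this is an assumption. The function names (`edit_distance`, `phoneme_error_rate`, `relative_decrease`) and the toy phoneme sequences are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs):
    """PER in percent over (reference, hypothesis) pairs.

    Assumed definition: total edit distance / total reference phonemes.
    """
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return 100.0 * errors / total


def relative_decrease(baseline, improved):
    """Relative decrease in percent when going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


# Toy data, not taken from the paper: one substitution in six reference phonemes.
toy_pairs = [
    (["k", "æ", "t"], ["k", "a", "t"]),
    (["d", "ɔ", "g"], ["d", "ɔ", "g"]),
]
toy_per = phoneme_error_rate(toy_pairs)

# Out-of-domain figures quoted in the abstract: 8.21% (unimodal) vs 6.39% (multimodal).
out_of_domain_drop = relative_decrease(8.21, 6.39)
```

With the out-of-domain figures from the abstract, `relative_decrease(8.21, 6.39)` comes to roughly 22%, which is consistent with the stated "more than 20% relative decrease".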

Anthology ID: D19-6121
Volume: Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Colin Cherry, Greg Durrett, George Foster, Reza Haffari, Shahram Khadivi, Nanyun Peng, Xiang Ren, Swabha Swayamdipta
Venue: WS
Publisher: Association for Computational Linguistics
Pages: 192–201
URL: https://aclanthology.org/D19-6121/
DOI: 10.18653/v1/D19-6121
Cite (ACL): James Route, Steven Hillis, Isak Czeresnia Etinger, Han Zhang, and Alan W Black. 2019. Multimodal, Multilingual Grapheme-to-Phoneme Conversion for Low-Resource Languages. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 192–201, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): Multimodal, Multilingual Grapheme-to-Phoneme Conversion for Low-Resource Languages (Route et al., 2019)
PDF: https://aclanthology.org/D19-6121.pdf
