Multimodal, Multilingual Grapheme-to-Phoneme Conversion for Low-Resource Languages

James Route, Steven Hillis, Isak Czeresnia Etinger, Han Zhang, Alan W Black


Abstract
Grapheme-to-phoneme conversion (g2p) is the task of predicting the pronunciation of words from their orthographic representation. His- torically, g2p systems were transition- or rule- based, making generalization beyond a mono- lingual (high resource) domain impractical. Recently, neural architectures have enabled multilingual systems to generalize widely; however, all systems to date have been trained only on spelling-pronunciation pairs. We hy- pothesize that the sequences of IPA characters used to represent pronunciation do not capture its full nuance, especially when cleaned to fa- cilitate machine learning. We leverage audio data as an auxiliary modality in a multi-task training process to learn a more optimal inter- mediate representation of source graphemes; this is the first multimodal model proposed for multilingual g2p. Our approach is highly ef- fective: on our in-domain test set, our mul- timodal model reduces phoneme error rate to 2.46%, a more than 65% decrease compared to our implementation of a unimodal spelling- pronunciation model—which itself achieves state-of-the-art results on the Wiktionary test set. The advantages of the multimodal model generalize to wholly unseen languages, reduc- ing phoneme error rate on our out-of-domain test set to 6.39% from the unimodal 8.21%, a more than 20% relative decrease. Further- more, our training and test sets are composed primarily of low-resource languages, demon- strating that our multimodal approach remains useful when training data are constrained.
Anthology ID:
D19-6121
Volume:
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Colin Cherry, Greg Durrett, George Foster, Reza Haffari, Shahram Khadivi, Nanyun Peng, Xiang Ren, Swabha Swayamdipta
Venue:
WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
192–201
Language:
URL:
https://aclanthology.org/D19-6121/
DOI:
10.18653/v1/D19-6121
Bibkey:
Cite (ACL):
James Route, Steven Hillis, Isak Czeresnia Etinger, Han Zhang, and Alan W Black. 2019. Multimodal, Multilingual Grapheme-to-Phoneme Conversion for Low-Resource Languages. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 192–201, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Multimodal, Multilingual Grapheme-to-Phoneme Conversion for Low-Resource Languages (Route et al., 2019)
Copy Citation:
PDF:
https://aclanthology.org/D19-6121.pdf