Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi

Aryaman Arora, Luke Gessler, Nathan Schneider


Abstract
Hindi grapheme-to-phoneme (G2P) conversion is mostly trivial, with one exception: whether a schwa represented in the orthography is pronounced or unpronounced (deleted). Previous work has attempted to predict schwa deletion in a rule-based fashion using prosodic or phonetic analysis. We present the first statistical schwa deletion classifier for Hindi, which relies solely on the orthography as the input and outperforms previous approaches. We trained our model on a newly compiled pronunciation lexicon extracted from various online dictionaries. Our best Hindi model achieves state-of-the-art performance and also performs well on a closely related language, Punjabi, without modification.
Anthology ID:
2020.acl-main.696
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7791–7795
URL:
https://aclanthology.org/2020.acl-main.696
DOI:
10.18653/v1/2020.acl-main.696
Cite (ACL):
Aryaman Arora, Luke Gessler, and Nathan Schneider. 2020. Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7791–7795, Online. Association for Computational Linguistics.
Cite (Informal):
Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi (Arora et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.696.pdf
Video:
http://slideslive.com/38929177
Code:
aryamanarora/schwa-deletion