Low-resource grapheme-to-phoneme mapping with phonetically-conditioned transfer

Michael Hammond


Abstract
In this paper, we explore a very simple non-neural approach to mapping orthography to phonetic transcription in a low-resource context, using transfer data from a related language. We start from a baseline system and focus our efforts on data augmentation. We make three principal moves. First, we use an HMM-based system (Novak et al., 2012). Second, we augment our basic system by recombining legal substrings in a restricted fashion (Ryan and Hulden, 2020). Finally, we limit our transfer data to training pairs whose phonetic form contains only bigrams that also occur in the target language.
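The third move, filtering transfer data by phone bigrams, can be illustrated with a minimal sketch. It assumes that transcriptions are space-separated phone strings, that the set of "target-language bigrams" is collected from the target-language training transcriptions, and that no word-boundary symbols are added. The abstract does not settle these details. The names `phone_bigrams` and `filter_transfer` are hypothetical and are not taken from the paper's code.

```python
from typing import Iterable

# Hypothetical data format: (orthography, space-separated phone transcription)
Pair = tuple[str, str]


def phone_bigrams(transcription: str) -> set[tuple[str, str]]:
    """Return the set of adjacent phone pairs in a space-separated transcription.

    Assumption: no word-boundary symbols are added, so a one-phone
    transcription yields an empty set.
    """
    phones = transcription.split()
    return set(zip(phones, phones[1:]))


def filter_transfer(target: Iterable[Pair], transfer: Iterable[Pair]) -> list[Pair]:
    """Keep only transfer pairs whose phone bigrams all occur in the target data."""
    # Collect every bigram attested in the target-language transcriptions.
    allowed: set[tuple[str, str]] = set()
    for _, phon in target:
        allowed |= phone_bigrams(phon)
    # A transfer pair survives if its bigram set is a subset of the allowed set.
    return [(orth, phon) for orth, phon in transfer if phone_bigrams(phon) <= allowed]


target = [("kat", "k a t"), ("tak", "t a k")]
transfer = [("kata", "k a t a"), ("kot", "k o t"), ("ta", "t a")]
print(filter_transfer(target, transfer))
# [('kata', 'k a t a'), ('ta', 't a')]
```

In this toy example, "k o t" is discarded because the bigram (k, o) never occurs in the target data. Padding transcriptions with a boundary symbol would also constrain word-initial and word-final phones. That choice is left open here.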
Anthology ID: 2023.sigmorphon-1.29
Volume: Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Garrett Nicolai, Eleanor Chodroff, Frederic Mailhot, Çağrı Çöltekin
Venue: SIGMORPHON
SIG: SIGMORPHON
Publisher: Association for Computational Linguistics
Pages: 245–248
URL: https://aclanthology.org/2023.sigmorphon-1.29
DOI: 10.18653/v1/2023.sigmorphon-1.29
Cite (ACL): Michael Hammond. 2023. Low-resource grapheme-to-phoneme mapping with phonetically-conditioned transfer. In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 245–248, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Low-resource grapheme-to-phoneme mapping with phonetically-conditioned transfer (Hammond, SIGMORPHON 2023)
PDF: https://aclanthology.org/2023.sigmorphon-1.29.pdf