One-Vs-Rest Neural Network English Grapheme Segmentation: A Linguistic Perspective

Samuel Rose, Nina Dethlefs, C. Kambhampati


Abstract
Grapheme-to-Phoneme (G2P) correspondences underpin tasks such as text-to-speech (TTS) synthesis and automatic speech recognition. The G2P process takes words in their written form and generates their pronunciation. In this paper, we critique the status quo definition of a grapheme that underlies the majority of modern approaches: a forced alignment process that relates each single character to either a phoneme or a blank unit. We develop a linguistically motivated redefinition built from simple concepts such as vowel count, consonant count, and word length, and offer a proof-of-concept implementation framed as a multi-binary neural classification task. Our model achieves state-of-the-art results with a 31.86% Word Error Rate on a standard benchmark, while generating linguistically meaningful grapheme segmentations.
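For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how Word Error Rate is commonly computed in G2P evaluation: the percentage of words whose predicted phoneme sequence matches none of the reference pronunciations. The abstract does not spell out the authors' evaluation procedure, so this convention is an assumption. The function name `word_error_rate` and the toy ARPAbet-style data are hypothetical and are not taken from the paper.

```python
def word_error_rate(predictions, references):
    """Return the percentage of words whose predicted pronunciation
    matches none of the reference pronunciations.

    predictions: list of phoneme sequences, one per word.
    references:  list (one entry per word) of lists of acceptable
                 phoneme sequences.
    NOTE: this is the usual G2P convention, assumed here; it is not
    confirmed as the paper's exact evaluation procedure.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align one-to-one")
    if not predictions:
        raise ValueError("cannot compute WER on an empty test set")

    errors = 0
    for predicted, acceptable in zip(predictions, references):
        # A word counts as correct only on an exact, whole-sequence match.
        if tuple(predicted) not in {tuple(ref) for ref in acceptable}:
            errors += 1
    return 100.0 * errors / len(predictions)


# Toy data, invented purely for illustration (ARPAbet-style symbols).
predictions = [
    ["TH", "OW"],        # wrong first phoneme for "though"
    ["K", "AE", "T"],    # "cat"
    ["F", "OW", "N"],    # "phone"
    ["N", "AY", "T"],    # "night"
]
references = [
    [["DH", "OW"]],
    [["K", "AE", "T"]],
    [["F", "OW", "N"]],
    [["N", "AY", "T"]],
]

wer = word_error_rate(predictions, references)
print(f"WER: {wer:.2f}%")
```

Because a single wrong phoneme makes the whole word count as an error, this metric is stricter than a phoneme-level error rate computed on the same predictions.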
Anthology ID:
2024.conll-1.36
Volume:
Proceedings of the 28th Conference on Computational Natural Language Learning
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Libby Barak, Malihe Alikhani
Venue:
CoNLL
Publisher:
Association for Computational Linguistics
Pages:
464–469
URL:
https://aclanthology.org/2024.conll-1.36
Cite (ACL):
Samuel Rose, Nina Dethlefs, and C. Kambhampati. 2024. One-Vs-Rest Neural Network English Grapheme Segmentation: A Linguistic Perspective. In Proceedings of the 28th Conference on Computational Natural Language Learning, pages 464–469, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
One-Vs-Rest Neural Network English Grapheme Segmentation: A Linguistic Perspective (Rose et al., CoNLL 2024)
PDF:
https://aclanthology.org/2024.conll-1.36.pdf