DLM: A Decoupled Learning Model for Long-tailed Polyphone Disambiguation in Mandarin

Beibei Gao, Yangsen Zhang, Ga Xiang, Yushan Jiang


Abstract
Grapheme-to-phoneme (G2P) conversion is a critical component of text-to-speech (TTS) systems, and polyphone disambiguation is its most crucial task. However, polyphone disambiguation datasets often suffer from a long-tail problem, and context for polyphonic characters is commonly learned from only a single dimension. In this paper, we propose DLM, a Decoupled Learning Model for long-tailed polyphone disambiguation in Mandarin. First, DLM decouples representation learning from classifier learning, so a different data sampler can be applied at each stage to obtain a more suitable training data distribution; this mitigates the long-tail problem. Second, two improved attention mechanisms and a gradual conversion strategy are integrated into DLM, enabling context to be learned progressively from local to global. Finally, to evaluate the effectiveness of DLM, we construct a balanced polyphone disambiguation corpus via in-context learning. Experiments on the benchmark CPP dataset demonstrate that DLM achieves an accuracy of 99.07%. Moreover, DLM improves disambiguation performance on long-tailed polyphonic characters, reaching an accuracy of 100% on many of them.
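The decoupled recipe the abstract refers to (representation learning and classifier learning trained as separate stages, each with its own data sampler) can be illustrated with a minimal PyTorch sketch. Everything below, including the toy features, the small encoder and classifier, and the choice of instance-balanced versus class-balanced samplers, is an assumption for illustration only and is not the authors' DLM implementation.

```python
# Minimal sketch of decoupled two-stage training with different samplers.
# Toy data and model sizes are placeholders, not the paper's setup.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy long-tailed data: 256-dim context features, 10 pinyin classes.
features = torch.randn(1000, 256)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(features, labels)

encoder = nn.Sequential(nn.Linear(256, 128), nn.ReLU())  # representation learner
classifier = nn.Linear(128, 10)                          # pinyin classifier

def run_stage(loader, params, epochs=1):
    opt = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(classifier(encoder(x)), y)
            loss.backward()
            opt.step()

# Stage 1: representation learning on the natural (instance-balanced) distribution.
stage1_loader = DataLoader(dataset, batch_size=32, shuffle=True)
run_stage(stage1_loader, list(encoder.parameters()) + list(classifier.parameters()))

# Stage 2: classifier learning with a class-balanced sampler; the encoder is
# frozen, so only the classification head adapts to the re-balanced distribution.
class_counts = torch.bincount(labels, minlength=10).float()
weights = (1.0 / class_counts.clamp(min=1))[labels]  # rarer class -> higher weight
sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
stage2_loader = DataLoader(dataset, batch_size=32, sampler=sampler)
for p in encoder.parameters():
    p.requires_grad_(False)
run_stage(stage2_loader, classifier.parameters())
```

The point the sketch captures is the decoupling itself: the encoder sees the natural long-tailed distribution, while the classification head is later re-trained under a re-balanced distribution without disturbing the learned representations.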
Anthology ID:
2024.naacl-long.294
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5252–5262
URL:
https://aclanthology.org/2024.naacl-long.294
DOI:
10.18653/v1/2024.naacl-long.294
Cite (ACL):
Beibei Gao, Yangsen Zhang, Ga Xiang, and Yushan Jiang. 2024. DLM: A Decoupled Learning Model for Long-tailed Polyphone Disambiguation in Mandarin. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5252–5262, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
DLM: A Decoupled Learning Model for Long-tailed Polyphone Disambiguation in Mandarin (Gao et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.294.pdf