PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech

Michel Wong; Ali Alshehri; Sophia Kao; Haotian He

doi:10.18653/v1/2025.emnlp-industry.6

Abstract

Text Normalization (TN) is a key preprocessing step in Text-to-Speech (TTS) systems, converting written forms into their canonical spoken equivalents. Traditional TN systems can achieve high accuracy, but they require substantial engineering effort, are difficult to scale, and limit language coverage, particularly in low-resource settings. We propose PolyNorm, a prompt-based approach to TN using Large Language Models (LLMs) that aims to reduce reliance on manually crafted rules and to enable broader linguistic applicability with minimal human intervention. Additionally, we present a language-agnostic pipeline for automatic data curation and evaluation, designed to facilitate scalable experimentation across diverse languages. Experiments across eight languages show consistent reductions in word error rate (WER) compared to a production-grade baseline system. To support further research, we release PolyNorm-Benchmark, a multilingual dataset covering a diverse range of text normalization phenomena.
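The abstract reports results as word error rate (WER). For reference, the sketch below uses the standard definition: the word-level edit distance (substitutions + deletions + insertions) between a system's normalized output and a reference spoken form, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration only; the paper's own scoring setup may differ, and whitespace splitting is not suitable for languages written without spaces between words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / number of reference words.

    Uses a word-level Levenshtein distance computed by dynamic programming.
    Assumes whitespace tokenization, which is an illustrative simplification.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so
    # far (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Illustrative TN-style example: the system output "pm" where the
    # reference spoken form is "p m".
    print(word_error_rate("the meeting is at three thirty p m",
                          "the meeting is at three thirty pm"))
```

In this example the alignment needs one substitution (p → pm) and one deletion (m) over eight reference words, giving a WER of 0.25. Because insertions are counted against the reference length, WER can exceed 1.0 when a hypothesis adds many extra words.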

Anthology ID: 2025.emnlp-industry.6
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2025
Address: Suzhou (China)
Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 77–85
URL: https://aclanthology.org/2025.emnlp-industry.6/
DOI: 10.18653/v1/2025.emnlp-industry.6
Cite (ACL): Michel Wong, Ali Alshehri, Sophia Kao, and Haotian He. 2025. PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 77–85, Suzhou (China). Association for Computational Linguistics.
Cite (Informal): PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech (Wong et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-industry.6.pdf
