Towards Accurate Text Verbalization for ASR Based on Audio Alignment

Diana Geneva, Georgi Shopov


Abstract
Verbalization of non-lexical linguistic units plays an important role in language modeling for automatic speech recognition (ASR) systems. Most verbalization methods require valuable resources such as ground truth, large training corpora and expert knowledge, which are often unavailable. On the other hand, a considerable amount of audio data along with its transcribed text is freely available on the Internet and could be utilized for the task of verbalization. This paper presents a methodology for accurate verbalization of audio transcriptions based on phone-level alignment between the transcriptions and their corresponding audio recordings. Comparing this approach to a more general rule-based verbalization method shows a significant improvement in the recognition of non-lexical units by ASR systems. In the process of evaluating this approach, we also expose the indirect influence of verbalization accuracy on the quality of acoustic models trained on automatically derived speech corpora.
Anthology ID:
R19-2007
Volume:
Proceedings of the Student Research Workshop Associated with RANLP 2019
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
39–47
URL:
https://aclanthology.org/R19-2007
DOI:
10.26615/issn.2603-2821.2019_007
Cite (ACL):
Diana Geneva and Georgi Shopov. 2019. Towards Accurate Text Verbalization for ASR Based on Audio Alignment. In Proceedings of the Student Research Workshop Associated with RANLP 2019, pages 39–47, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Towards Accurate Text Verbalization for ASR Based on Audio Alignment (Geneva & Shopov, RANLP 2019)
PDF:
https://aclanthology.org/R19-2007.pdf