Speech-Driven Editing System for Chinese ASR Errors

Sji-Jie Ding, Chia-Hui Chang, Zi-Xuan Jian


Abstract
Despite recent advances in AI, ASR systems still struggle with real-world errors arising from pronunciation and homophones. To address this issue, we propose a verbal-command-based correction system that lets users utter natural-language instructions to refine recognition outputs with minimal effort. The system consists of three modules: an input classifier, a command classifier, and a correction labeler. To support training and evaluation, we simulate potential ASR errors via TTS and ASR pipelines, and then generate verbal correction commands based on linguistic features or LLMs. Experiments show that the overall system achieves over 80% correction accuracy and delivers stable performance. Compared with manual correction, the system also achieves competitive correction speed, which indicates its feasibility for practical deployment.
Anthology ID:
2025.rocling-main.29
Volume:
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month:
November
Year:
2025
Address:
National Taiwan University, Taipei City, Taiwan
Editors:
Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue:
ROCLING
Publisher:
Association for Computational Linguistics
Pages:
277–285
URL:
https://aclanthology.org/2025.rocling-main.29/
Cite (ACL):
Sji-Jie Ding, Chia-Hui Chang, and Zi-Xuan Jian. 2025. Speech-Driven Editing System for Chinese ASR Errors. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 277–285, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Speech-Driven Editing System for Chinese ASR Errors (Ding et al., ROCLING 2025)
PDF:
https://aclanthology.org/2025.rocling-main.29.pdf