Sji-Jie Ding
2025
Speech-Driven Editing System for Chinese ASR Errors
Sji-Jie Ding | Chia-Hui Chang | Zi-Xuan Jian
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Despite recent advances in AI, automatic speech recognition (ASR) systems still struggle with real-world errors arising from pronunciation and homophones. To address this issue, we propose a verbal-command-based correction system that lets users speak natural-language instructions to refine recognition outputs with minimal effort. The system consists of three modules: an input classifier, a command classifier, and a correction labeler. To support training and evaluation, we simulate potential ASR errors using TTS and ASR pipelines, and then generate verbal correction commands based on linguistic features or LLMs. Experiments show that the overall system achieves over 80% correction accuracy with stable performance. Compared with manual correction, the system also achieves highly competitive correction speed, which suggests that it is feasible for practical deployment.
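
The three-module flow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the command set, the label scheme, or the models, so simple regular expressions stand in for the trained classifiers, and the names `classify_input`, `classify_command`, `label_correction`, and `handle` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical command patterns (assumed for illustration only).
REPLACE = re.compile(r"^把(.+?)改成(.+)$")  # "change X to Y"
DELETE = re.compile(r"^刪除(.+)$")          # "delete X"


@dataclass
class Command:
    kind: str              # "replace" or "delete"
    target: str            # text in the transcript to be edited
    replacement: str = ""  # empty for deletions


def classify_input(utterance: str) -> str:
    """Module 1 stand-in: is the utterance new dictation or a command?"""
    if REPLACE.match(utterance) or DELETE.match(utterance):
        return "command"
    return "dictation"


def classify_command(utterance: str) -> Optional[Command]:
    """Module 2 stand-in: decide the command type and extract its arguments."""
    m = REPLACE.match(utterance)
    if m:
        return Command("replace", m.group(1), m.group(2))
    m = DELETE.match(utterance)
    if m:
        return Command("delete", m.group(1))
    return None


def label_correction(transcript: str, cmd: Command) -> List[int]:
    """Module 3 stand-in: mark each character to edit with 1, others with 0.

    Only the first occurrence of the target is labeled.
    """
    labels = [0] * len(transcript)
    start = transcript.find(cmd.target)
    if start >= 0:
        for i in range(start, start + len(cmd.target)):
            labels[i] = 1
    return labels


def handle(transcript: str, utterance: str) -> str:
    """Route an utterance through the three stand-in modules."""
    if classify_input(utterance) == "dictation":
        return transcript + utterance
    cmd = classify_command(utterance)
    labels = label_correction(transcript, cmd)
    if 1 not in labels:
        return transcript  # target not found; leave the transcript unchanged
    start = labels.index(1)
    end = start + labels.count(1)
    return transcript[:start] + cmd.replacement + transcript[end:]


print(handle("我今天再家", "把再改成在"))
```

In this toy version the command is matched as exact text. In a real setting the command itself passes through ASR and may contain homophone errors of its own, which is one reason learned classifiers and a labeler are a more plausible fit than fixed patterns.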