Yu-Chi Chen
2025
A Whisper-Based System with Multi-Faceted Data Augmentation for Low-Resource Language
Pin-Cheng Chen | Yu-Chi Chen | Chia-Chun Liang | Cheng-Yu Lin | Ping-Juei Tsai | Wei-Yun Ma
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
This paper presents a comprehensive approach for the Formosa Speech Recognition Challenge 2025 (FSR-2025), targeting automatic speech recognition (ASR) for the under-resourced Dapu and Zhao’an dialects of Taiwanese Hakka. Our method integrates data augmentation and robustness techniques, including SpecAugment, dialect-aware special tokens, text-to-speech (TTS) augmentation, noise/reverberation mixing, and speed perturbation, to mitigate data scarcity and domain mismatch. Experiments on the official FSR-2025 datasets show consistent improvements in both character error rate (CER) and word error rate (WER). Extensive ablation studies further confirm that each component contributes positively. These results offer a practical path toward robust ASR for under-resourced Hakka dialects and suggest broader applicability to other low-resource languages.
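To make the waveform-level and spectrogram-level augmentations named in the abstract more concrete, the following is a minimal sketch of speed perturbation, noise mixing at a target signal-to-noise ratio, and SpecAugment-style masking. It is not the authors' implementation. The function names, the linear-interpolation resampler, the single-mask policy, and all parameter values are assumptions made for illustration; the abstract does not specify them. Audio is represented as a plain non-empty list of floats and a spectrogram as a time-major list of rows, so the example needs only the standard library.

```python
import math
import random


def speed_perturb(samples, factor):
    """Resample by linear interpolation so playback is `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = max(1, int(round(len(samples) / factor)))
    out = []
    for i in range(n_out):
        pos = i * factor          # fractional read position in the input
        lo = int(pos)
        if lo >= len(samples) - 1:
            out.append(samples[-1])
        else:
            frac = pos - lo
            out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


def mix_noise(speech, noise, snr_db):
    """Add noise scaled so the speech-to-noise power ratio equals snr_db."""
    # Tile the noise clip, then truncate it to the speech length.
    reps = len(speech) // len(noise) + 1
    noise = (list(noise) * reps)[:len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0.0:
        return list(speech)       # silent noise clip: nothing to mix
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def spec_mask(spec, max_time_width, max_freq_width, rng):
    """Zero one random time band and one random frequency band.

    This covers only the masking part of SpecAugment (no time warping).
    `spec` is a list of frames, each a list of frequency-bin values.
    """
    n_t, n_f = len(spec), len(spec[0])
    t_w = rng.randint(0, min(max_time_width, n_t))
    f_w = rng.randint(0, min(max_freq_width, n_f))
    t0 = rng.randint(0, n_t - t_w)
    f0 = rng.randint(0, n_f - f_w)
    return [
        [
            0.0 if (t0 <= t < t0 + t_w or f0 <= f < f0 + f_w) else v
            for f, v in enumerate(row)
        ]
        for t, row in enumerate(spec)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    speech = [math.sin(0.1 * i) for i in range(1000)]   # stand-in for an utterance
    noise = [rng.uniform(-1.0, 1.0) for _ in range(300)]  # stand-in for a noise clip

    faster = speed_perturb(speech, 1.1)   # hypothetical perturbation factor
    noisy = mix_noise(speech, noise, 10)  # hypothetical 10 dB SNR
    spec = [[1.0] * 80 for _ in range(100)]
    masked = spec_mask(spec, 20, 10, rng)
    print(len(speech), len(faster), len(noisy), len(masked), len(masked[0]))
```

In a training pipeline, transforms like these would typically be sampled randomly per utterance, with masking applied to log-Mel features rather than to the raw waveform. The factor and SNR values used here are placeholders, not settings reported in the paper.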