Chia-Chun Liang


2025

pdf bib
A Whisper-Based System with Multi-Faceted Data Augmentation for Low-Resource Language
Pin-Cheng Chen | Yu-Chi Chen | Chia-Chun Liang | Cheng-Yu Lin | Ping-Juei Tsai | Wei-Yun Ma
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

This paper presents a comprehensive approach for the Formosa Speech Recognition Challenge 2025 (FSR-2025), targeting automatic speech recognition (ASR) for the under-resourced Dapu and Zhao’an dialects of Taiwanese Hakka. Our method integrates data augmentation and robustness techniques, including SpecAugment, dialect-aware special tokens, text-to-speech (TTS) augmentation, noise/reverberation mixing, and speed perturbation, to mitigate data scarcity and domain mismatch. Experiments on the official FSR-2025 datasets show consistent improvements in both character error rate (CER) and word error rate (WER). Extensive ablation studies further confirm that each component contributes positively. These results offer a practical path toward robust ASR for under-resourced Hakka dialects and suggest broader applicability to other low-resource languages.