Applying Whisper Fine-tuning and Branchformer to Hakka Speech Recognition
Yu-Sheng Huang | Wei-Cheng Hong | Xin-Yu Chen | Szu-Yin Lin
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
This study addresses the FSR 2025 Hakka speech recognition task by comparing two strategies: fine-tuning large pre-trained models and training a model from scratch. For character (Hanzi) recognition, we fine-tuned Whisper models at five different scales, with large-v3-turbo achieving a 7.55% CER on the test set. For Pinyin recognition, a Branchformer model trained from scratch was compared against a Whisper-small model fine-tuned with LoRA, yielding test-set WERs of 4.7% and 6.5%, respectively. Speed perturbation was the primary data augmentation method in our pre-processing pipeline.
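As a rough illustration of the LoRA fine-tuning strategy mentioned in the abstract, the sketch below attaches low-rank adapters to a Whisper-small checkpoint using Hugging Face transformers and peft. The model name, LoRA rank, alpha, dropout, and target modules are illustrative assumptions, not the paper's reported configuration.

```python
# Minimal sketch: LoRA adapters on Whisper-small (hyperparameters are
# assumptions for illustration, not the paper's exact setup).
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import LoraConfig, get_peft_model

model_name = "openai/whisper-small"
processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)

# Insert low-rank adapters into the attention projections; only the small
# adapter matrices are trained while the base Whisper weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the small fraction of trainable weights
```

The wrapped model can then be trained with a standard sequence-to-sequence training loop (e.g., transformers' Seq2SeqTrainer) on the Hakka Pinyin transcripts.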