Szu-Yin Lin


2025

Applying Whisper Fine-tuning and Branchformer to Hakka Speech Recognition
Yu-Sheng Huang | Wei-Cheng Hong | Xin-Yu Chen | Szu-Yin Lin
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

This study addresses the FSR 2025 Hakka speech recognition task by comparing two strategies: fine-tuning large pre-trained models and training from scratch. For character (Hanzi) recognition, we fine-tuned the Whisper model at five different scales, with large-v3-turbo achieving a 7.55% CER on the test set. For Pinyin recognition, a Branchformer model was compared against a LoRA-fine-tuned Whisper-small, yielding WERs of 4.7% and 6.5% on the test set, respectively. Speed perturbation was the primary data augmentation method in our pre-processing pipeline.
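
As a rough illustration of the LoRA fine-tuning strategy mentioned in the abstract, the sketch below adapts Whisper-small with the Hugging Face peft library. The model name, rank, alpha, and target modules are illustrative assumptions, not the authors' reported configuration.

```python
# Minimal sketch: attaching LoRA adapters to Whisper-small with peft.
# Hyperparameters below are assumed for illustration, not taken from the paper.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Low-rank adapters are added to the attention projections; only these small
# matrices are trained while the pre-trained Whisper weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # shows the small fraction of trainable weights
```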