Applying Whisper Fine-tuning and Branchformer to Hakka Speech Recognition

Yu-Sheng Huang, Wei-Cheng Hong, Xin-Yu Chen, Szu-Yin Lin


Abstract
This study addresses the FSR 2025 Hakka speech recognition task by comparing two strategies: fine-tuning large pre-trained models and training models from scratch. For character (Hanzi) recognition, we fine-tuned five scales of the Whisper model, with large-v3-turbo achieving a 7.55% CER on the test set. For Pinyin recognition, a Branchformer model was compared against a LoRA-fine-tuned Whisper-small, yielding test-set WERs of 4.7% and 6.5%, respectively. Speed perturbation was the primary data-augmentation method in our pre-processing pipeline.
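The abstract names two concrete techniques: speed perturbation for data augmentation and LoRA fine-tuning of Whisper-small. As a rough illustration only (the paper's exact pipeline and hyperparameters are not reproduced here), the sketch below shows a common way to implement both with torchaudio and the peft library; the input file name, perturbation factors, LoRA rank, alpha, dropout, and target modules are assumptions, not values taken from the paper.

```python
# Minimal, illustrative sketch (not the paper's code): 3-way speed perturbation
# via resampling with torchaudio, i.e. treating a waveform as if it were
# recorded at sr * factor and resampling it back to sr.
import torch
import torchaudio
import torchaudio.functional as F

def speed_perturb(waveform: torch.Tensor, sr: int, factor: float) -> torch.Tensor:
    # factor > 1.0 speeds the utterance up (shorter); factor < 1.0 slows it down.
    if factor == 1.0:
        return waveform
    return F.resample(waveform, orig_freq=int(sr * factor), new_freq=sr)

# "hakka_utt.wav" is a hypothetical example file, not part of the released corpus.
wav, sr = torchaudio.load("hakka_utt.wav")
augmented = [speed_perturb(wav, sr, f) for f in (0.9, 1.0, 1.1)]

# LoRA adapters attached to Whisper-small with the peft library; the rank,
# alpha, dropout, and target modules below are illustrative assumptions.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights remain trainable
```

With this setup, only the low-rank adapter matrices are updated during fine-tuning, which keeps the memory footprint small relative to full fine-tuning of the Whisper encoder-decoder.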
Anthology ID:
2025.rocling-main.50
Volume:
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month:
November
Year:
2025
Address:
National Taiwan University, Taipei City, Taiwan
Editors:
Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue:
ROCLING
Publisher:
Association for Computational Linguistics
Pages:
441–445
URL:
https://aclanthology.org/2025.rocling-main.50/
Cite (ACL):
Yu-Sheng Huang, Wei-Cheng Hong, Xin-Yu Chen, and Szu-Yin Lin. 2025. Applying Whisper Fine-tuning and Branchformer to Hakka Speech Recognition. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 441–445, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Applying Whisper Fine-tuning and Branchformer to Hakka Speech Recognition (Huang et al., ROCLING 2025)
PDF:
https://aclanthology.org/2025.rocling-main.50.pdf