Challenges and Limitations of the Multilingual Pre-trained Model Whisper on Low-Resource Languages: A Case Study of Hakka Speech Recognition

Pei-Chi Lan, Hsin-Tien Chiang, Ting-Chun Lin, Ming-Hsiang Su


Abstract
This study investigates the practical performance and limitations of the multilingual pre-trained model Whisper in low-resource language settings, using a Hakka speech recognition challenge as a case study. In the preliminary phase, our team (Group G) achieved official scores of 75.58% in Character Error Rate (CER) and 100.97% in Syllable Error Rate (SER). However, in the final phase, both CER and Word Error Rate (WER) reached 100%. Through a retrospective analysis of system design and implementation, we identified three major sources of failure: (1) improper handling of long utterances, where only the first segment was decoded, causing content truncation; (2) inconsistent language prompting, fixed to “Chinese” instead of the Hakka target; and (3) lack of systematic verification in data alignment and submission generation, combined with inadequate evaluation setup. Based on these findings, we propose a set of practical guidelines covering long-utterance processing, language consistency checking, and data submission validation. The results highlight that in low-resource speech recognition tasks, poor data quality or flawed workflow design can cause severe degradation of model performance. This study underscores the importance of robust data and process management in ASR system development and provides concrete insights for future improvements and reproducibility.
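Failure mode (1) in the abstract, where only the first segment of a long utterance is decoded, typically arises because Whisper's encoder operates on fixed 30-second windows, so any audio beyond the first window must be segmented and decoded in turn. The sketch below is a minimal, model-free illustration of that segmentation step, assuming 16 kHz mono input (Whisper's expected sampling rate); the function and constant names are illustrative, not from the paper.

```python
# Minimal sketch: split a long waveform into consecutive 30-second windows
# so that every segment, not only the first, reaches the decoder.
# Assumes 16 kHz mono samples, matching Whisper's expected input rate.

SAMPLE_RATE = 16_000     # samples per second
WINDOW_SECONDS = 30      # Whisper's fixed audio context length


def split_into_windows(samples, sample_rate=SAMPLE_RATE,
                       window_seconds=WINDOW_SECONDS):
    """Return consecutive fixed-length windows covering the whole utterance.

    The final window may be shorter than window_seconds; in a real
    pipeline it would be zero-padded before feature extraction.
    """
    step = sample_rate * window_seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]


# Example: a 75-second utterance yields three windows (30 s, 30 s, 15 s).
dummy_audio = [0.0] * (75 * SAMPLE_RATE)
windows = split_into_windows(dummy_audio)
print([len(w) / SAMPLE_RATE for w in windows])  # [30.0, 30.0, 15.0]
```

Decoding each window in sequence (or using a library's built-in long-form transcription path, which performs equivalent sliding-window segmentation internally) avoids the truncation the paper reports, where content past the first 30 seconds was silently dropped.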
Anthology ID:
2025.rocling-main.62
Volume:
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month:
November
Year:
2025
Address:
National Taiwan University, Taipei City, Taiwan
Editors:
Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue:
ROCLING
Publisher:
Association for Computational Linguistics
Pages:
512–517
URL:
https://aclanthology.org/2025.rocling-main.62/
Cite (ACL):
Pei-Chi Lan, Hsin-Tien Chiang, Ting-Chun Lin, and Ming-Hsiang Su. 2025. Challenges and Limitations of the Multilingual Pre-trained Model Whisper on Low-Resource Languages: A Case Study of Hakka Speech Recognition. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 512–517, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Challenges and Limitations of the Multilingual Pre-trained Model Whisper on Low-Resource Languages: A Case Study of Hakka Speech Recognition (Lan et al., ROCLING 2025)
PDF:
https://aclanthology.org/2025.rocling-main.62.pdf