The AS-SLAM system for Formosa Speech Recognition Challenge 2025

Chih-Hsi Chen, Pei-Jun Liao, Chia-Hua Wu, Pang-Cheng Wu, Hsin-Min Wang


Abstract
In recent years, large-scale pre-trained speech models such as Whisper have been widely applied to speech recognition. While they achieve strong performance on high-resource languages such as English and Mandarin, dialects and other low-resource languages remain challenging because little training data is available. The government-led “Formosa Speech in the Wild (FSW) project” is an important cultural preservation initiative for Hakka, a regional dialect, and the development of Hakka ASR systems is a key technological milestone within it. Beyond model architecture, data processing and training strategies are also critical. In this paper, we explore data augmentation techniques for Hakka speech, including TTS and MUSAN-based approaches, and analyze different data combinations by fine-tuning the pre-trained Whisper model. We participated in the 2025 Hakka FSR ASR competition (student track) for the Dapu and Zhaoan varieties. In the pilot test, our system placed 7th in Hanzi recognition (CER: 15.92) and 3rd in Pinyin recognition (SER: 20.49). In the official finals, our system ranked 6th in Hanzi recognition (CER: 15.73) and 4th in Pinyin recognition (SER: 20.68). We believe that such data augmentation strategies can advance research on Hakka ASR and support the long-term preservation of Hakka culture.
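For readers unfamiliar with the Hanzi metric reported above, the following is a minimal sketch of how a character error rate (CER) is conventionally computed: the Levenshtein distance between reference and hypothesis characters, divided by the reference length. The function names `edit_distance` and `cer`, the whitespace stripping, and the percentage scaling are assumptions made for illustration; the challenge's official scoring script may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference symbols to match an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate in percent. Assumption: spaces are ignored before scoring."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Under this definition, one wrong character in a four-character reference gives a CER of 25, and the score can exceed 100 when the hypothesis contains many insertions.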
Anthology ID:
2025.rocling-main.61
Volume:
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month:
November
Year:
2025
Address:
National Taiwan University, Taipei City, Taiwan
Editors:
Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue:
ROCLING
Publisher:
Association for Computational Linguistics
Pages:
504–511
URL:
https://aclanthology.org/2025.rocling-main.61/
Cite (ACL):
Chih-Hsi Chen, Pei-Jun Liao, Chia-Hua Wu, Pang-Cheng Wu, and Hsin-Min Wang. 2025. The AS-SLAM system for Formosa Speech Recognition Challenge 2025. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 504–511, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal):
The AS-SLAM system for Formosa Speech Recognition Challenge 2025 (Chen et al., ROCLING 2025)
PDF:
https://aclanthology.org/2025.rocling-main.61.pdf