@inproceedings{huang-etal-2025-nptu,
title = "The {NPTU} {ASR} System for {FSR}2025 {H}akka Character/{P}inyin Recognition: Whisper with m{BART} Post-Editing and {RNNLM} Rescoring",
author = "Huang, Yi-Chin and
Chen, Yu-Heng and
Wang, Jian-Hua and
Wu, Hsiu-Chi and
Kuo, Chih-Chung and
Huang, Chao-Shih and
Liao, Yuan-Fu",
editor = "Chang, Kai-Wei and
Lu, Ke-Han and
Yang, Chih-Kai and
Tam, Zhi-Rui and
Chang, Wen-Yu and
Wang, Chung-Che",
booktitle = "Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)",
month = nov,
year = "2025",
address = "National Taiwan University, Taipei City, Taiwan",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.rocling-main.63/",
pages = "518--522",
ISBN = "979-8-89176-379-1",
abstract = "This paper presents our system for the FSR-2025 Hakka Automatic Speech Recognition (ASR) Challenge, which consists of two sub-tasks: (i) Hakka Characters and (ii) Hakka Pinyin. We propose a unified architecture built upon Whisper [1], a large weakly supervised ASR model, as the acoustic backbone, with optional LoRA (Low-Rank Adaptation [2]) for parameter-efficient fine-tuning. Data augmentation techniques include the MUSAN [3] corpus (music/speech/noise) and tempo/speed perturbation [4]. For the character task, mBART-50 [5,6], a multilingual sequence-to-sequence model, is applied for text correction, while both tasks employ an RNNLM [7] for N-best rescoring. Under the final evaluation setting of the character task, mBART-driven 10-best text correction combined with RNNLM rescoring achieved a CER (Character Error Rate) of 6.26{\%}, whereas the official leaderboard reported 22.5{\%}. For the Pinyin task, the Medium model proved more suitable than the Large model given the dataset size and accent distribution. With 10-best RNNLM rescoring, it achieved a SER (Syllable Error Rate) of 4.65{\%} on our internal warm-up test set, and the official final score (with tone information) was 14.81{\%}. Additionally, we analyze the contribution of LID (Language Identification) for accent recognition across different recording and media sources."
}