Taiwanese Hakka Across Taiwan Corpus and Formosa Speech Recognition Challenge 2025 – Dapu & Zhao’an Accents

Yuan-Fu Liao, Chih-Chung Kuo, Chao-Shih Huang, Yu-Siang Lan, Han-Chun Lai, Wen-Han Hsu


Abstract
To revive the endangered Hakka language in Taiwan, the first large-scale Hakka speech corpus covering all aspects of Taiwanese Hakka across Taiwan (HAT) was created. This paper introduces the second part of the HAT corpus: the Dapu and Zhao’an accents. Furthermore, to promote this newly constructed corpus and evaluate the performance of state-of-the-art Hakka ASR systems, the 2025 Formosa Speech Recognition Challenge (FSR-2025–Hakka ASR II) was held. Sixteen teams participated in two tracks: speech-to-Hakka-Hanzi and speech-to-Hakka-Pinyin. The best results were a Hanzi character error rate (CER) of 7.50% and a Pinyin syllable error rate (SER) of 14.81%.
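The two tracks are scored with CER (Hanzi) and SER (Pinyin). As a minimal sketch, assuming the standard edit-distance definition, error rate = (substitutions + deletions + insertions) / number of reference tokens, accumulated over the whole test set. The challenge's actual scoring rules (text normalization, punctuation, tone-mark handling) are not stated in this abstract, so the tokenization choices below (one Hanzi character per token, whitespace-separated Pinyin syllables) and all function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions) between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference token r
                curr[j - 1] + 1,         # insertion of hypothesis token h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def error_rate(refs, hyps):
    """Corpus-level error rate: total edits divided by total reference tokens."""
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference set contains no tokens")
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / total


def _hanzi_tokens(text):
    # Hypothetical Hanzi tokenization: every non-whitespace character is one token.
    return [c for c in text if not c.isspace()]


def cer(ref_texts, hyp_texts):
    """Character error rate for the speech-to-Hanzi track (sketch)."""
    return error_rate([_hanzi_tokens(t) for t in ref_texts],
                      [_hanzi_tokens(t) for t in hyp_texts])


def ser(ref_texts, hyp_texts):
    """Syllable error rate for the speech-to-Pinyin track (sketch)."""
    # Hypothetical Pinyin tokenization: whitespace-separated syllables.
    return error_rate([t.split() for t in ref_texts],
                      [t.split() for t in hyp_texts])


if __name__ == "__main__":
    # Toy strings for illustration only, not corpus data.
    print(cer(["客家人"], ["客家仁"]))             # one substitution out of 3 characters
    print(ser(["s1 s2 s3 s4"], ["s1 s2 s4"]))    # one deletion out of 4 syllables
```

Because insertions are counted, these rates are not bounded by 100%, and pooling edits over the whole set (rather than averaging per-utterance rates) weights long utterances more heavily.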
Anthology ID: 2025.rocling-main.48
Volume: Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month: November
Year: 2025
Address: National Taiwan University, Taipei City, Taiwan
Editors: Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue: ROCLING
Publisher: Association for Computational Linguistics
Pages: 427–434
URL: https://aclanthology.org/2025.rocling-main.48/
Cite (ACL): Yuan-Fu Liao, Chih-Chung Kuo, Chao-Shih Huang, Yu-Siang Lan, Han-Chun Lai, and Wen-Han Hsu. 2025. Taiwanese Hakka Across Taiwan Corpus and Formosa Speech Recognition Challenge 2025 – Dapu & Zhao’an Accents. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 427–434, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal): Taiwanese Hakka Across Taiwan Corpus and Formosa Speech Recognition Challenge 2025 – Dapu & Zhao’an Accents (Liao et al., ROCLING 2025)
PDF: https://aclanthology.org/2025.rocling-main.48.pdf