Yu-Siang Lan
2025
Taiwanese Hakka Across Taiwan Corpus and Formosa Speech Recognition Challenge 2025 – Dapu & Zhao’an Accents
Yuan-Fu Liao | Chih-Chung Kuo | Chao-Shih Huang | Yu-Siang Lan | Han-Chun Lai | Wen-Han Hsu
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
To help revive the endangered Hakka language in Taiwan, the first large-scale Hakka speech corpus covering all aspects of Taiwanese Hakka across Taiwan (HAT) was created. This paper introduces the second part of the HAT corpus, which covers the Dapu and Zhao’an accents. To promote this newly constructed corpus and to evaluate the performance of state-of-the-art Hakka ASR systems, the 2025 Formosa Speech Recognition Challenge, FSR-2025–Hakka ASR II, was held. Sixteen teams participated in two tracks: speech-to-Hakka-Hanzi and speech-to-Hakka-Pinyin. The best results were a Hanzi character error rate (CER) of 7.50% and a Pinyin syllable error rate (SER) of 14.81%.
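The two metrics reported above are both edit-distance error rates, differing only in the unit being counted. The following is a minimal sketch of how such rates are commonly computed, assuming the usual definition (substitutions + deletions + insertions, divided by the number of reference units) and assuming that Pinyin syllables are whitespace-separated. The function names and the sample strings are hypothetical and for illustration only; the challenge's actual scoring script and text normalization are not described here and may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of units."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps):
    """Total edit errors divided by total reference units, over a test set."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    units = sum(len(r) for r in refs)
    return errors / units


def cer(ref_texts, hyp_texts):
    """Character error rate: each Hanzi character is one unit (spaces ignored)."""
    split = lambda t: list(t.replace(" ", ""))
    return error_rate([split(t) for t in ref_texts],
                      [split(t) for t in hyp_texts])


def ser(ref_texts, hyp_texts):
    """Syllable error rate: assumes syllables are separated by whitespace."""
    return error_rate([t.split() for t in ref_texts],
                      [t.split() for t in hyp_texts])


# Illustrative inputs only, not taken from the HAT corpus.
hanzi_cer = cer(["客家人"], ["客加人"])              # one substitution in 3 characters
pinyin_ser = ser(["s1 s2 s3 s4 s5"], ["s1 s2 s4 s5"])  # one deletion in 5 syllables
```

Because the rate is pooled over the whole test set rather than averaged per utterance, long utterances contribute proportionally more to the final figure; this is a common convention, assumed here rather than confirmed by the source.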