Yuan-Fu Liao


2025

To revive the endangered Hakka language in Taiwan, the first large-scale Hakka speech corpus covering all aspects of Taiwanese Hakka across Taiwan (HAT) was created. This paper introduces the second part of the HAT corpus: the Dapu and Zhao’an accents. Furthermore, to promote this newly constructed corpus and evaluate the performance of the most advanced Hakka ASR systems, the 2025 Formosa Speech Recognition Challenge, FSR-2025–Hakka ASR II, was held. Sixteen teams participated in two tracks: speech-to-Hakka-Hanzi and speech-to-Hakka-Pinyin. The best results were a Hanzi character error rate (CER) of 7.50% and a Pinyin syllable error rate (SER) of 14.81%.
This paper presents our system for the FSR-2025 Hakka Automatic Speech Recognition (ASR) Challenge, which consists of two sub-tasks: (i) Hakka Characters and (ii) Hakka Pinyin. We propose a unified architecture built upon Whisper [1], a large weakly supervised ASR model, as the acoustic backbone, with optional LoRA (Low-Rank Adaptation [2]) for parameter-efficient fine-tuning. Data augmentation techniques include the MUSAN [3] corpus (music/speech/noise) and tempo/speed perturbation [4]. For the character task, mBART-50 [5,6], a multilingual sequence-to-sequence model, is applied for text correction, while both tasks employ an RNNLM [7] for N-best rescoring. Under the final evaluation setting of the character task, mBART-driven 10-best text correction combined with RNNLM rescoring achieved a CER (Character Error Rate) of 6.26%, whereas the official leaderboard reported 22.5%. For the Pinyin task, the Medium model proved more suitable than the Large model given the dataset size and accent distribution. With 10-best RNNLM rescoring, it achieved a SER (Syllable Error Rate) of 4.65% on our internal warm-up test set, and the official final score (with tone information) was 14.81%. Additionally, we analyze the contribution of LID (Language Identification) for accent recognition across different recording and media sources.
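The two metrics quoted above, CER and SER, are both edit-distance error rates that differ only in the unit being counted (characters versus syllables). The following is a minimal sketch of how such scores are commonly computed, not the challenge's official scoring script: the function names, the whitespace handling, and the assumption that Pinyin syllables are space-separated are all illustrative choices.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalised by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def cer(ref_text: str, hyp_text: str) -> float:
    """Character error rate; spaces are dropped (an assumption of this sketch)."""
    return error_rate(list(ref_text.replace(" ", "")),
                      list(hyp_text.replace(" ", "")))


def ser(ref_text: str, hyp_text: str) -> float:
    """Syllable error rate; assumes syllables are space-separated, tone digit attached."""
    return error_rate(ref_text.split(), hyp_text.split())
```

Under these assumptions, a tone error on a single syllable counts as one substitution in SER, which is one reason a score "with tone information" can be noticeably higher than a toneless one.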

2023

2022

2020

2019

2017

In this paper, a deep phrase embedding approach using bi-directional long short-term memory (Bi-LSTM) is proposed to predict the valence-arousal ratings of Chinese words and phrases. It adopts a Chinese word segmentation frontend, local order-aware word and global phrase embedding representations, and a deep regression neural network (DRNN) model. The performance of the proposed method was benchmarked on the IJCNLP 2017 shared task 2. According to the official evaluation results, our best system achieved a mean rank of 6.5 among all 24 submissions.

2016

This paper discusses how to adapt two new word embedding features to build a more efficient Chinese Grammatical Error Diagnosis (CGED) system to assist learners of Chinese as a foreign language (CFLs) in improving their written essays. The major idea is to apply word-order-sensitive Word2Vec approaches, namely (1) structured skip-gram and (2) continuous window (CWindow) models, because they are more suitable for solving syntax-based problems. The proposed new features were evaluated on the Test of Chinese as a Foreign Language (TOCFL) learner database provided by the NLP-TEA-3&CGED shared task. Experimental results showed that the new features worked better than the traditional word-order-insensitive Word2Vec approaches. Moreover, according to the official evaluation results, our system achieved the lowest false positive (FA) rate (0.1362) and the highest precision rates in all three measurements.

2015

2014

2013

2012

2011

2009

2006

2005

1997