Nissan D. Macale


2024

Bangsamoro languages are among the under-resourced languages in the Mindanao region in the Philippines. Moreover, there is no currently publicly available data for children’s speech on most of these languages. BK3AT children’s speech corpus is a corpus designed for creating speech technologies that could help facilitators and teachers in K-3 education. The corpus consists of 122 hours of children speech data across 10 languages: Bahasa Sug, Chavacano, English, Filipino, Iranun, Maguindanaon, Meranaw, Sinama, Teduray, and Yakan. Preliminary experiments using Wav2Vec-XLSR architecture have been done in fine-tuning the Tagalog and L2 English corpus subsets to develop automatic speech recognition backend for literacy assessment. Results from the experiments show low word error rates (WERs) for small-vocabulary and targeted domains.