Carlos H. C. Ribeiro


2026

Automatic assessment of reading in children who are learning to read is challenging because data are scarce and children's speech is highly variable. This work investigates how to improve Automatic Speech Recognition (ASR) models for analyzing decoding in the reading of isolated words in Brazilian Portuguese. We propose a methodology based on fine-tuning Wav2Vec2.0 models, shifting the transcription target from orthographic to phonemic. Using a novel corpus of 5,400 audio samples of words read by children in the 2nd and 3rd grades of Elementary School, we compare Portuguese and multilingual pre-trained models. Results show that the phonemic approach, combined with fine-tuning strategies, data augmentation, and adapted tokenization, substantially reduces the Phoneme Error Rate (PER). This approach overcomes limitations of commercial tools and supports the use of ASR for detailed diagnosis of decoding errors and of phonological acquisition.
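PER is the evaluation metric referred to above. The sketch below is a minimal illustration, not this work's evaluation code. It assumes the standard definition: the Levenshtein distance (substitutions + deletions + insertions) between the reference and hypothesis phoneme sequences, divided by the number of reference phonemes. It also assumes that corpus-level PER sums edits and reference lengths across all utterances, which is one common convention. The phoneme symbols and the word "casa" are illustrative, and stress marks are omitted.

```python
from typing import Sequence


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    n = len(hyp)
    prev = list(range(n + 1))  # distance from an empty ref prefix to each hyp prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[n]


def phoneme_error_rate(refs: list[list[str]], hyps: list[list[str]]) -> float:
    """Corpus-level PER: total edit operations over total reference phonemes (assumed convention)."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    total_ref = sum(len(r) for r in refs)
    return total_edits / total_ref


# Illustrative decoding error: "casa" read with /s/ instead of intervocalic /z/.
reference = [["k", "a", "z", "a"]]
hypothesis = [["k", "a", "s", "a"]]
print(phoneme_error_rate(reference, hypothesis))  # one substitution over 4 phonemes: 0.25
```

Because a phonemic target records what the child actually produced, the edit operations themselves (here a /z/ → /s/ substitution) can be inspected as candidate decoding errors. A word-level orthographic score would only mark the reading as wrong.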