Training a Chinese Listenability Model Using Word2Vec to Predict the Difficulty of Spoken Texts
Yen-Hsiang Chien | Hou-Chiang Tseng | Kuan-Yu Chen | Yao-Ting Sung
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
With the proliferation of digital learning, an increasing number of learners are engaging with audio-visual materials. For preschool and lower elementary students, whose literacy skills are still limited, knowledge acquisition relies more heavily on spoken and visual content. Traditional readability models were developed primarily for written texts, so their applicability to spoken materials remains uncertain. To address this issue, this study investigates how different word segmentation tools and language models affect the performance of automatic grade classification models for Chinese spoken materials. A Support Vector Machine was employed for grade prediction, with the aim of automatically determining the appropriate grade level of learning resources and assisting learners in selecting suitable materials. The results show that language models with higher-dimensional word embeddings achieved better classification performance, with an accuracy of up to 61% and an adjacent accuracy of 76%. These findings may contribute to future digital learning platforms or educational resource recommendation systems by automatically providing students with appropriate listening materials to enhance learning outcomes.
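The abstract describes a pipeline of word segmentation, Word2Vec embeddings, and SVM-based grade classification evaluated with accuracy and adjacent accuracy. The sketch below is not the authors' code; it only illustrates that general pipeline under assumptions of my own: jieba as a stand-in segmentation tool, a 300-dimensional Word2Vec model, mean word vectors as document features, and placeholder transcripts and grade labels.

```python
# Minimal sketch of a Word2Vec + SVM listenability/grade classifier.
# All data, tools, and hyperparameters are illustrative placeholders.
import numpy as np
import jieba                                  # assumed segmentation tool
from gensim.models import Word2Vec
from sklearn.svm import SVC

# Placeholder spoken-text transcripts with grade labels (1-6).
texts = ["小朋友今天我們來聽故事", "這一課要介紹動物的生活習性",
         "本課說明光合作用的基本原理", "接下來討論水循環與天氣變化"]
grades = np.array([1, 2, 4, 4])

# 1. Word segmentation.
segmented = [list(jieba.cut(t)) for t in texts]

# 2. Train word embeddings (the paper reports that higher-dimensional
#    embeddings performed better; 300 is an assumed setting here).
w2v = Word2Vec(sentences=segmented, vector_size=300, window=5,
               min_count=1, epochs=10)

# 3. Represent each transcript as the mean of its word vectors.
def doc_vector(tokens, model):
    vecs = [model.wv[w] for w in tokens if w in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X = np.vstack([doc_vector(toks, w2v) for toks in segmented])

# 4. SVM grade classifier (in practice, evaluate on held-out data).
clf = SVC(kernel="rbf").fit(X, grades)
pred = clf.predict(X)

# 5. Exact accuracy and adjacent accuracy (prediction within one grade).
acc = np.mean(pred == grades)
adj_acc = np.mean(np.abs(pred - grades) <= 1)
print(f"accuracy={acc:.2f}, adjacent accuracy={adj_acc:.2f}")
```

In this setup, swapping the segmentation tool or the embedding dimensionality only changes steps 1 and 2, which mirrors the kind of comparison the study reports.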