Training a Chinese Listenability Model Using Word2Vec to Predict the Difficulty of Spoken Texts

Yen-Hsiang Chien, Hou-Chiang Tseng, Kuan-Yu Chen, Yao-Ting Sung


Abstract
With the proliferation of digital learning, a growing number of learners engage with audio-visual materials. Preschool and lower elementary students, whose literacy skills are still limited, rely more heavily on spoken and visual content to acquire knowledge. Traditional readability models were developed primarily for written texts, and their applicability to spoken materials remains uncertain. To address this issue, this study investigates how different word segmentation tools and language models affect the performance of automatic grade classification models for Chinese spoken materials. Support Vector Machines were used for grade prediction, with the aim of automatically determining the appropriate grade level of learning resources and helping learners select suitable materials. The results show that language models with higher-dimensional word embeddings achieved better classification performance, with an accuracy of up to 61% and an adjacent accuracy of 76%. These findings may contribute to future digital learning platforms or educational resource recommendation systems that automatically provide students with appropriate listening materials to enhance learning outcomes.
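
The abstract reports two metrics, accuracy and adjacent accuracy, without defining the latter. The sketch below illustrates one common reading: a prediction counts as adjacent-correct when it falls within one grade level of the true grade. That tolerance, the function names, and the toy grade lists are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def accuracy(true_grades: Sequence[int], predicted_grades: Sequence[int]) -> float:
    """Fraction of texts whose predicted grade equals the true grade."""
    if len(true_grades) != len(predicted_grades) or len(true_grades) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_grades, predicted_grades))
    return hits / len(true_grades)


def adjacent_accuracy(
    true_grades: Sequence[int],
    predicted_grades: Sequence[int],
    tolerance: int = 1,  # assumed: "adjacent" means off by at most one grade
) -> float:
    """Fraction of texts predicted within `tolerance` grades of the true grade."""
    if len(true_grades) != len(predicted_grades) or len(true_grades) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(t - p) <= tolerance for t, p in zip(true_grades, predicted_grades))
    return hits / len(true_grades)


# Hypothetical toy data: true and predicted grade levels for six spoken texts.
true_grades = [1, 2, 3, 4, 5, 6]
predicted_grades = [1, 3, 3, 6, 4, 6]

exact = accuracy(true_grades, predicted_grades)
adjacent = adjacent_accuracy(true_grades, predicted_grades)
print(f"accuracy={exact:.2f}, adjacent accuracy={adjacent:.2f}")
```

Under this definition adjacent accuracy is never lower than exact accuracy, since every exact match is also within one grade, which is consistent with the 61% and 76% figures reported above.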
Anthology ID:
2025.rocling-main.1
Volume:
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month:
November
Year:
2025
Address:
National Taiwan University, Taipei City, Taiwan
Editors:
Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue:
ROCLING
Publisher:
Association for Computational Linguistics
Pages:
1–10
URL:
https://aclanthology.org/2025.rocling-main.1/
Cite (ACL):
Yen-Hsiang Chien, Hou-Chiang Tseng, Kuan-Yu Chen, and Yao-Ting Sung. 2025. Training a Chinese Listenability Model Using Word2Vec to Predict the Difficulty of Spoken Texts. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 1–10, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Training a Chinese Listenability Model Using Word2Vec to Predict the Difficulty of Spoken Texts (Chien et al., ROCLING 2025)
PDF:
https://aclanthology.org/2025.rocling-main.1.pdf