Jincai Yang


2025

pdf bib
Chinese Automatic Readability Assessment Using Adaptive Pre-training and Linguistic Feature Fusion
Xusheng Yang | Jincai Yang | Xiao Li
Proceedings of the 31st International Conference on Computational Linguistics

Chinese Automatic Readability Assessment (ARA) aims to classify the reading difficulty of Chinese texts. To address the issues of insufficient high-quality training data and underutilization of linguistic features in existing methods, we propose a method that combines adaptive pre-training with feature fusion based on an interactive attention mechanism. First, we enhance the model’s ability to capture different text difficulties through domain- and task-specific adaptive pre-training. Then, we propose an Adaptive Task-guided Corpus Filtering (ATCF) method, utilizing embeddings generated by the pre-trained model and applying nearest-neighbor search along with a sample balancing mechanism to ensure comprehensive learning across various difficulty levels. Finally, we propose an Interactive Attention-Driven Feature Fusion method that integrates linguistic and deep features, providing rich difficulty information to the model. Experiments on Chinese textbook dataset demonstrate that our method achieves state-of-the-art (SOTA) performance. Transfer learning experiments further indicate that our approach generalizes well to extracurricular reading and Chinese as a Foreign Language (CFL) ARA tasks.