Chinese Automatic Readability Assessment Using Adaptive Pre-training and Linguistic Feature Fusion

Xusheng Yang, Jincai Yang, Xiao Li


Abstract
Chinese Automatic Readability Assessment (ARA) aims to classify the reading difficulty of Chinese texts. To address the issues of insufficient high-quality training data and underutilization of linguistic features in existing methods, we propose a method that combines adaptive pre-training with feature fusion based on an interactive attention mechanism. First, we enhance the model’s ability to capture different text difficulties through domain- and task-specific adaptive pre-training. Then, we propose an Adaptive Task-guided Corpus Filtering (ATCF) method, utilizing embeddings generated by the pre-trained model and applying nearest-neighbor search along with a sample balancing mechanism to ensure comprehensive learning across various difficulty levels. Finally, we propose an Interactive Attention-Driven Feature Fusion method that integrates linguistic and deep features, providing rich difficulty information to the model. Experiments on a Chinese textbook dataset demonstrate that our method achieves state-of-the-art (SOTA) performance. Transfer learning experiments further indicate that our approach generalizes well to extracurricular reading and Chinese as a Foreign Language (CFL) ARA tasks.
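The abstract describes corpus filtering only at a high level: embeddings, nearest-neighbor search, and balancing across difficulty levels. The following is a minimal sketch of that general idea, not the authors' ATCF implementation. The function names (`cosine`, `filter_corpus`), the parameters `k` and `per_level`, the use of cosine similarity, and the toy 2-D vectors are all assumptions made for illustration. In practice the embeddings would come from the pre-trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def filter_corpus(task_emb, task_labels, pool_emb, k=2, per_level=None):
    """Pick pool items near labelled task examples, capped per difficulty level.

    Hypothetical sketch: each task example retrieves its k most similar pool
    items; candidates are grouped by the retrieving example's level, ranked by
    their best similarity, and truncated to the same size for every level.
    """
    best = {}  # level -> {pool index: best similarity seen for that level}
    for vec, level in zip(task_emb, task_labels):
        sims = [(cosine(vec, cand), j) for j, cand in enumerate(pool_emb)]
        sims.sort(key=lambda s: -s[0])  # most similar first
        level_best = best.setdefault(level, {})
        for sim, j in sims[:k]:
            level_best[j] = max(sim, level_best.get(j, -1.0))

    # Rank each level's candidates by descending similarity.
    ranked = {lvl: sorted(c, key=lambda j: -c[j]) for lvl, c in best.items()}

    # Balance: by default, cap every level at the size of the smallest one.
    cap = per_level if per_level is not None else min(len(r) for r in ranked.values())
    return {lvl: r[:cap] for lvl, r in ranked.items()}

# Toy data: two labelled task examples (levels 0 and 1) and four pool texts.
task_emb = [[1.0, 0.0], [0.0, 1.0]]
task_labels = [0, 1]
pool_emb = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [-1.0, 0.0]]

selected = filter_corpus(task_emb, task_labels, pool_emb, k=2)
top_one = filter_corpus(task_emb, task_labels, pool_emb, k=2, per_level=1)
```

In this sketch a pool item may be selected for more than one level, and the brute-force search is quadratic in corpus size; a real pipeline would likely need an approximate nearest-neighbor index and a rule for resolving such overlaps.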
Anthology ID:
2025.coling-main.605
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
9013–9024
URL:
https://aclanthology.org/2025.coling-main.605/
Cite (ACL):
Xusheng Yang, Jincai Yang, and Xiao Li. 2025. Chinese Automatic Readability Assessment Using Adaptive Pre-training and Linguistic Feature Fusion. In Proceedings of the 31st International Conference on Computational Linguistics, pages 9013–9024, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Chinese Automatic Readability Assessment Using Adaptive Pre-training and Linguistic Feature Fusion (Yang et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.605.pdf