Chinese Automatic Readability Assessment Using Adaptive Pre-training and Linguistic Feature Fusion

Xusheng Yang; Jincai Yang; Xiao Li

Abstract

Chinese Automatic Readability Assessment (ARA) aims to classify the reading difficulty of Chinese texts. To address the issues of insufficient high-quality training data and underutilization of linguistic features in existing methods, we propose a method that combines adaptive pre-training with feature fusion based on an interactive attention mechanism. First, we enhance the model’s ability to capture different text difficulties through domain- and task-specific adaptive pre-training. Then, we propose an Adaptive Task-guided Corpus Filtering (ATCF) method, which uses embeddings generated by the pre-trained model and applies nearest-neighbor search along with a sample balancing mechanism to ensure comprehensive learning across the various difficulty levels. Finally, we propose an Interactive Attention-Driven Feature Fusion method that integrates linguistic and deep features, providing rich difficulty information to the model. Experiments on a Chinese textbook dataset demonstrate that our method achieves state-of-the-art (SOTA) performance. Transfer learning experiments further indicate that our approach generalizes well to extracurricular reading and Chinese as a Foreign Language (CFL) ARA tasks.
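
The corpus filtering idea in the abstract (nearest-neighbor search over model embeddings, followed by balancing across difficulty levels) can be illustrated with a minimal sketch. This is not the authors' ATCF implementation: the function name `filter_corpus`, the parameters `k` and `per_level_cap`, the use of cosine similarity, and the rule of capping every level at the size of the smallest one are all assumptions made for illustration. Embeddings are assumed to be precomputed by some encoder and passed in as NumPy arrays.

```python
import numpy as np

def filter_corpus(task_emb, task_labels, cand_emb, k=2, per_level_cap=None):
    """Hypothetical sketch: pick unlabeled candidates close to labeled task
    samples, then keep the same number of candidates per difficulty level."""
    # Cosine similarity between every task sample and every candidate.
    t = task_emb / np.linalg.norm(task_emb, axis=1, keepdims=True)
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sims = t @ c.T  # shape: (n_task, n_candidates)

    # Indices of the k most similar candidates for each task sample.
    nearest = np.argsort(-sims, axis=1)[:, :k]

    # Each retrieved candidate inherits the level of its most similar task sample.
    best = {}  # candidate index -> (similarity, level)
    for i, row in enumerate(nearest):
        for j in row:
            j = int(j)
            s = float(sims[i, j])
            if j not in best or s > best[j][0]:
                best[j] = (s, int(task_labels[i]))

    # Group by level, then cap every level at the same size (assumed balancing rule).
    by_level = {}
    for j, (s, lvl) in best.items():
        by_level.setdefault(lvl, []).append((s, j))
    cap = per_level_cap or min(len(v) for v in by_level.values())

    selected = {}
    for lvl, items in by_level.items():
        items.sort(reverse=True)  # most similar candidates first
        selected[lvl] = [j for _, j in items[:cap]]
    return selected
```

Calling `filter_corpus(task_emb, task_labels, cand_emb)` returns a dictionary mapping each difficulty level to the indices of the retained candidates. Capping at the smallest level keeps frequent levels from dominating the filtered corpus; a real system would likely use an approximate nearest-neighbor index rather than a dense similarity matrix.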

Anthology ID: 2025.coling-main.605
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 9013–9024
URL: https://aclanthology.org/2025.coling-main.605/
Cite (ACL): Xusheng Yang, Jincai Yang, and Xiao Li. 2025. Chinese Automatic Readability Assessment Using Adaptive Pre-training and Linguistic Feature Fusion. In Proceedings of the 31st International Conference on Computational Linguistics, pages 9013–9024, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Chinese Automatic Readability Assessment Using Adaptive Pre-training and Linguistic Feature Fusion (Yang et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.605.pdf
