Qiu Xinying


2024

pdf bib
Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese
Zhang Jingshen | Chen Xinglu | Qiu Xinying | Wang Zhimin | Feng Wenhe
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“Chinese sentence simplification faces challenges due to the lack of large-scale labeledparallel corpora and the prevalence of idioms. To address these challenges, we pro-pose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel frameworkthat combines data augmentation techniques. RISS introduces two key components: (1)Readability-guided Paraphrase Selection (RPS), a method for mining high-quality sen-tence pairs, and (2) Idiom-aware Simplification (IAS), a model that enhances the compre-hension and simplification of idiomatic expressions. By integrating RPS and IAS usingmulti-stage and multi-task learning strategies, RISS outperforms previous state-of-the-artmethods on two Chinese sentence simplification datasets. Furthermore, RISS achievesadditional improvements when fine-tuned on a small labeled dataset. Our approachdemonstrates the potential for more effective and accessible Chinese text simplification.”

pdf bib
Multi-Error Modeling and Fluency-Targeted Pre-training for Chinese Essay Evaluation
Zhang Jingshen | Yang Xiangyu | Su Xinkai | Chen Xinglu | Huang Tianyou | Qiu Xinying
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

“This system report presents our approaches and results for the Chinese Essay Fluency Evaluation (CEFE) task at CCL-2024. For Track 1, we optimized predictions for challenging fine-grained error types using binary classification models and trained coarse-grained models on the Chinese Learner 4W corpus. In Track 2, we enhanced performance by constructing a pseudo-dataset with multiple error types per sentence. For Track 3, where we achieved first place, we generated fluency-rated pseudo-data via back-translation for pretraining and used an NSP-based strategy with Symmetric Cross Entropy loss to capture context and mitigate long dependencies. Our methods effectively address key challenges in Chinese Essay Fluency Evaluation.”