Towards Explainable Chinese Native Learner Essay Fluency Assessment: Dataset, Tasks, and Method

Xinshu Shen, Hongyi Wu, Yadong Zhang, Man Lan, Xiaopeng Bai, Shaoguang Mao, Yuanbin Wu, Xinlin Zhuang, Li Cai


Abstract
Grammatical Error Correction (GEC) is a crucial technique in Automated Essay Scoring (AES) for evaluating the fluency of essays. However, existing Chinese GEC datasets often fail to consider the importance of specific grammatical error types within compositional scenarios, rarely draw on data collected from native Chinese speakers, and largely overlook cross-sentence grammatical errors. Furthermore, the measurement of an essay's overall fluency is often neglected. To address these issues, we present CEFA (Chinese Essay Fluency Assessment), an extensive corpus derived from essays written by native Chinese-speaking primary and secondary students. It includes essay fluency scores along with both coarse-grained and fine-grained grammatical error types and corrections. Experiments with various benchmark models on CEFA confirm that the dataset is challenging. Our findings further highlight the significance of fine-grained annotations in fluency assessment and the mutually beneficial relationship between error types and corrections.
Anthology ID:
2024.findings-emnlp.910
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15515–15528
URL:
https://aclanthology.org/2024.findings-emnlp.910
Cite (ACL):
Xinshu Shen, Hongyi Wu, Yadong Zhang, Man Lan, Xiaopeng Bai, Shaoguang Mao, Yuanbin Wu, Xinlin Zhuang, and Li Cai. 2024. Towards Explainable Chinese Native Learner Essay Fluency Assessment: Dataset, Tasks, and Method. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 15515–15528, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Towards Explainable Chinese Native Learner Essay Fluency Assessment: Dataset, Tasks, and Method (Shen et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.910.pdf