String Editing Based Chinese Grammatical Error Diagnosis

Haihua Xie, Xiaoqing Lyu, Xuefei Chen


Abstract
Chinese Grammatical Error Diagnosis (CGED) suffers from two problems: the large number of grammatical error types and insufficient training data. In this paper, we propose a string editing based CGED model that requires less training data by using a unified workflow to handle various types of grammatical errors. Two measures are proposed in our model to enhance the performance of CGED. First, the detection and correction of grammatical errors are divided into separate stages. In the error detection stage, the model outputs only the types of grammatical errors, so the tag vocabulary is significantly smaller than in other string editing based models. Second, the correction of some grammatical errors is converted to the task of masked character inference, which has plenty of training data and mature solutions. Experiments on the NLPTEA-CGED datasets demonstrate that our model outperforms other CGED models in many aspects.
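The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The tag set ("O", "R", "S", "M"), the function names `build_masked_template` and `fill_masks`, and the `infer` callback are all assumptions made for illustration. Stage one is assumed to have already produced one error-type tag per character; stage two turns those tags into string edits and leaves mask slots for a masked character inference model to fill.

```python
MASK = "[MASK]"

def build_masked_template(chars, tags):
    """Apply type-only tags as string edits, leaving mask slots to be filled.

    Hypothetical tag scheme (an assumption, not taken from the paper):
      "O" keep the character
      "R" redundant character: delete it
      "S" wrong character: replace it with a mask slot
      "M" something is missing: insert a mask slot before the character
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    out = []
    for ch, tag in zip(chars, tags):
        if tag == "O":
            out.append(ch)
        elif tag == "R":
            continue
        elif tag == "S":
            out.append(MASK)
        elif tag == "M":
            out.extend([MASK, ch])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return out

def fill_masks(template, infer):
    """Fill every mask slot with infer(template, index), a stand-in for a
    masked character inference model supplied by the caller."""
    filled = []
    for i, tok in enumerate(template):
        filled.append(infer(template, i) if tok == MASK else tok)
    return "".join(filled)

# Toy run: delete a duplicated character, no model needed for this case.
template = build_masked_template(list("我吃了了饭"), ["O", "O", "O", "R", "O"])
corrected = fill_masks(template, lambda t, i: "?")
```

Because the tags name only the error type and never the replacement character, the tag vocabulary in this sketch has four entries regardless of the size of the character set.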
Anthology ID:
2022.coling-1.474
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5335–5344
URL:
https://aclanthology.org/2022.coling-1.474
Cite (ACL):
Haihua Xie, Xiaoqing Lyu, and Xuefei Chen. 2022. String Editing Based Chinese Grammatical Error Diagnosis. In Proceedings of the 29th International Conference on Computational Linguistics, pages 5335–5344, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
String Editing Based Chinese Grammatical Error Diagnosis (Xie et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.474.pdf