Chinese Grammatical Error Diagnosis with Long Short-Term Memory Networks

Bo Zheng, Wanxiang Che, Jiang Guo, Ting Liu


Abstract
Grammatical error diagnosis is an important task in natural language processing. This paper introduces our Chinese Grammatical Error Diagnosis (CGED) system for the NLP-TEA-3 shared task on CGED. The system diagnoses four types of grammatical errors: redundant words (R), missing words (M), bad word selection (S), and disordered words (W). We treat CGED as a sequence labeling task and describe three models: a CRF-based model, an LSTM-based model, and an ensemble model using stacking. We also show in detail how we build and train the models. Evaluation is carried out at three levels: detection, identification, and position. On the CGED-HSK dataset of the NLP-TEA-3 shared task, our system achieves the best F1-scores at all three levels and the best recall at the identification and position levels.
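To make the sequence labeling framing concrete, the sketch below turns span-level error annotations into one tag per character. It is an illustration only, not the authors' preprocessing: the BIO tag scheme, character-level tokens, 1-based inclusive offsets, the function name spans_to_tags, and the example sentence are all assumptions that the abstract does not state.

```python
from typing import List, Tuple

# The four error types named in the abstract.
ERROR_TYPES = {"R", "M", "S", "W"}


def spans_to_tags(sentence: str, errors: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, type) error spans into per-character BIO tags.

    Assumptions (not taken from the paper): offsets are 1-based and
    inclusive, one tag per character, and spans do not overlap.
    """
    tags = ["O"] * len(sentence)
    for start, end, etype in errors:
        if etype not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {etype}")
        if not (1 <= start <= end <= len(sentence)):
            raise ValueError(f"span out of range: ({start}, {end})")
        tags[start - 1] = "B-" + etype      # first character of the span
        for i in range(start, end):         # remaining characters, 0-based
            tags[i] = "I-" + etype
    return tags


# Made-up example: the second "是" is marked as a redundant word (R).
example_tags = spans_to_tags("我是是学生", [(3, 3, "R")])
print(example_tags)
```

A CRF or LSTM tagger would then be trained to predict such a tag sequence from the input characters or words; error spans can be read back off the predicted tags by grouping each B- tag with the I- tags that follow it.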
Anthology ID: W16-4907
Volume: Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Hsin-Hsi Chen, Yuen-Hsien Tseng, Vincent Ng, Xiaofei Lu
Venue: NLP-TEA
Publisher: The COLING 2016 Organizing Committee
Pages: 49–56
URL: https://aclanthology.org/W16-4907/
Cite (ACL): Bo Zheng, Wanxiang Che, Jiang Guo, and Ting Liu. 2016. Chinese Grammatical Error Diagnosis with Long Short-Term Memory Networks. In Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016), pages 49–56, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): Chinese Grammatical Error Diagnosis with Long Short-Term Memory Networks (Zheng et al., NLP-TEA 2016)
PDF: https://aclanthology.org/W16-4907.pdf