Yingjie Yan


2020

Chinese Grammatical Error Diagnosis Based on RoBERTa-BiLSTM-CRF Model
Yingjie Han | Yingjie Yan | Yangchao Han | Rui Chao | Hongying Zan
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications

Chinese Grammatical Error Diagnosis (CGED) is a natural language processing task for the NLPTEA6 workshop. The goal of this task is to automatically diagnose grammatical errors in Chinese sentences written by L2 learners. This paper proposes a RoBERTa-BiLSTM-CRF model to detect grammatical errors in sentences. First, the RoBERTa model is used to obtain word vectors. Second, the word vectors are fed into a BiLSTM layer to learn contextual features. Finally, a CRF layer, using no hand-crafted features, processes the BiLSTM output. The globally optimal label sequence is obtained from the state transition matrix of the CRF and the adjacent labels in the training data. In the experiments, the results of the RoBERTa-CRF and ERNIE-BiLSTM-CRF models are compared, and the impact of the model parameters and the test datasets is analyzed. In the evaluation results, the recall of our RoBERTa-BiLSTM-CRF model ranks fourth at the detection level.
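
The abstract above says the CRF layer picks a globally optimal label sequence using its transition matrix. This kind of decoding is commonly done with the Viterbi algorithm; the abstract does not state which procedure the authors used, so the following is only a minimal sketch under that assumption. The label set (`O`, `B-ERR`, `I-ERR`), all score values, and the function name `viterbi_decode` are hypothetical and chosen for illustration.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label path and its score.

    emissions[t][j]:   score of label j at position t (e.g. from a BiLSTM layer).
    transitions[i][j]: score of moving from label i to label j.
    """
    num_labels = len(transitions)
    # best[j] is the best score of any path that ends in label j so far.
    best = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_best, pointers = [], []
        for j in range(num_labels):
            # Choose the previous label that maximizes the score of entering j.
            prev = max(range(num_labels), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emit[j])
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label.
    last = max(range(num_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical toy setup: three labels and a three-token sentence.
labels = ["O", "B-ERR", "I-ERR"]
emissions = [
    [2.0, 0.5, 0.1],
    [0.4, 0.3, 1.5],  # taken alone, this position prefers I-ERR
    [1.0, 0.2, 0.8],
]
transitions = [
    [0.5, 0.0, -100.0],  # from O: entering I-ERR directly is heavily penalized
    [0.0, -1.0, 1.0],    # from B-ERR
    [0.0, -1.0, 0.5],    # from I-ERR
]

path, score = viterbi_decode(emissions, transitions)
decoded = [labels[i] for i in path]
greedy = [labels[max(range(3), key=lambda j: row[j])] for row in emissions]
print(decoded)  # ['O', 'O', 'O']
print(greedy)   # ['O', 'I-ERR', 'O']
```

In this toy case, picking the best label at each position independently yields `O, I-ERR, O`, which starts an error span without a `B-ERR` tag. Decoding with the transition scores avoids that sequence, which is the usual motivation for placing a CRF on top of per-token scores.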

Chinese Grammatical Errors Diagnosis System Based on BERT at NLPTEA-2020 CGED Shared Task
Hongying Zan | Yangchao Han | Haotian Huang | Yingjie Yan | Yuke Wang | Yingjie Han
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications

In the process of learning Chinese, second language learners may make various grammatical errors due to negative transfer from their native language. This paper describes our submission to the NLPTEA 2020 shared task on CGED. We present a hybrid system that comprises a detection stage and a correction stage. The detection stage is a sequence labelling model based on BiLSTM-CRF and BERT contextual word representations. The correction stage is a hybrid model based on n-gram and Seq2Seq models. Without additional features or external data, the BERT contextual word representations effectively improve the performance metrics of Chinese grammatical error detection and correction.
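
The abstract above mentions an n-gram component in the correction stage but gives no details. One common way to use an n-gram model for correction is to score candidate rewrites and keep the most probable one. The sketch below illustrates that idea with an add-one smoothed bigram model; it is an assumption, not the authors' method, and the Seq2Seq component is not shown. The tiny corpus, the candidate sentences, and the function names `train_bigram_counts`, `log_prob` and `pick_best` are all hypothetical.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count context tokens and bigrams over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
    return contexts, bigrams


def log_prob(tokens, contexts, bigrams, vocab_size):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


def pick_best(candidates, contexts, bigrams, vocab_size):
    """Return the candidate sentence with the highest bigram log-probability."""
    return max(candidates, key=lambda c: log_prob(c, contexts, bigrams, vocab_size))


# Hypothetical toy corpus of pre-tokenized sentences.
corpus = [
    ["我", "在", "学习", "中文"],
    ["他", "在", "学习", "英文"],
    ["我", "喜欢", "中文"],
]
contexts, bigrams = train_bigram_counts(corpus)
vocab_size = len({tok for sent in corpus for tok in sent} | {"<s>", "</s>"})

# Two same-length candidates that differ in a single word.
candidates = [
    ["我", "再", "学习", "中文"],
    ["我", "在", "学习", "中文"],
]
best = pick_best(candidates, contexts, bigrams, vocab_size)
print(" ".join(best))  # 我 在 学习 中文
```

Both candidates have the same length, so their log-probabilities are directly comparable. Comparing candidates of different lengths would normally need some form of length normalization, which this sketch omits.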