BERT Enhanced Neural Machine Translation and Sequence Tagging Model for Chinese Grammatical Error Diagnosis

Deng Liang, Chen Zheng, Lei Guo, Xin Cui, Xiuzhang Xiong, Hengqiao Rong, Jinpeng Dong


Abstract
This paper presents the UNIPUS-Flaubert team's hybrid system for the NLPTEA 2020 shared task of Chinese Grammatical Error Diagnosis (CGED). As a challenging NLP task, CGED has attracted increasing attention recently but has not yet fully benefited from powerful pre-trained BERT-based models. We explore this direction by experimenting with three types of models. The position-tagging models and correction-tagging models are sequence tagging models fine-tuned on pre-trained BERT-based models: the former focus on detecting, positioning, and classifying errors, while the latter aim at correcting errors. We also exploit the rich representations of BERT-based models by transferring BERT-fused models to the correction task, and further improve performance by pre-training on a large amount of unsupervised synthetic data. To the best of our knowledge, we are the first to introduce the BERT-fused NMT model and sequence tagging model to the Chinese Grammatical Error Correction field. Our system achieved the second highest F1 score for error detection, the best F1 score on the top-1 correction subtask, and the second highest F1 score on the top-3 correction subtask.
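As a minimal sketch of the position-tagging idea described above: a sequence tagging model emits one tag per token, and those tags must then be decoded into the span-level error annotations (start, end, error type) that CGED scoring expects. The BIO tag scheme with the four CGED error types (R: redundant, M: missing, S: selection, W: word ordering) is standard for this task, but the function below is illustrative and not taken from the paper.

```python
def decode_spans(tags):
    """Decode per-token BIO tags (e.g. B-R, I-R, O) into CGED-style
    (start, end, error_type) spans, 1-indexed and inclusive.

    Illustrative helper, not the authors' code: assumes the sequence
    tagging model outputs tags over the four CGED error types
    R (redundant), M (missing), S (selection), W (word ordering).
    """
    spans = []
    start = None   # 1-based start position of the span being built
    etype = None   # error type of the span being built
    for i, tag in enumerate(tags, start=1):
        if tag.startswith("B-"):
            # A new span begins; close any span in progress first.
            if start is not None:
                spans.append((start, i - 1, etype))
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == etype:
            # Continuation of the current span; keep extending it.
            continue
        else:
            # "O" tag or an inconsistent "I-" tag ends the current span.
            if start is not None:
                spans.append((start, i - 1, etype))
                start, etype = None, None
    if start is not None:
        # Close a span that runs to the end of the sentence.
        spans.append((start, len(tags), etype))
    return spans
```

For example, the tag sequence `["O", "B-R", "I-R", "O", "B-S"]` decodes to a redundant-word error over positions 2-3 and a word-selection error at position 5.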
Anthology ID:
2020.nlptea-1.8
Volume:
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications
Month:
December
Year:
2020
Address:
Suzhou, China
Venues:
AACL | NLP-TEA
Publisher:
Association for Computational Linguistics
Pages:
57–66
URL:
https://aclanthology.org/2020.nlptea-1.8
PDF:
https://aclanthology.org/2020.nlptea-1.8.pdf