Improving Chinese Grammatical Error Detection via Data augmentation by Conditional Error Generation

Tianchi Yue, Shulin Liu, Huihui Cai, Tao Yang, Shengkang Song, TingHao Yu


Abstract
Chinese Grammatical Error Detection (CGED) aims to detect grammatical errors in Chinese texts. One of the main challenges for CGED is the lack of annotated data. To alleviate this problem, previous studies have proposed various methods for automatically generating more training samples, which can be roughly categorized into rule-based and model-based methods. Rule-based methods construct erroneous sentences by directly introducing noise into original sentences. However, the introduced noise is usually context-independent and therefore quite different from the errors humans make. Model-based methods use generative models to imitate human errors. However, a generative model may change the original sentence too much and produce semantically ambiguous sentences, in which grammatical errors are difficult to detect. In addition, generated sentences may be error-free and thus become noisy data. To handle these problems, we propose CNEG, a novel Conditional Non-Autoregressive Error Generation model for generating Chinese grammatical errors. Specifically, to generate a context-dependent error, we first mask a span in a correct text and then predict an erroneous span conditioned on both the masked text and the correct span. Furthermore, we filter out error-free spans by measuring their perplexities in the original sentences. Experimental results show that our proposed method outperforms all compared data augmentation methods on the CGED-2018 and CGED-2020 benchmarks.
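The two mechanisms named in the abstract, conditioning on a masked text plus the correct span and filtering by perplexity, can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. The names `build_conditional_input`, `CharBigramLM`, and `keep_generated_span`, the `[MASK]`/`[SEP]` input layout, the one-mask-per-character assumption, and the rule that a corrupted sentence must be more perplexing than the original are all assumptions. A toy character bigram model stands in for the pretrained language model a real system would use, and the non-autoregressive generator itself (which would fill all `[MASK]` slots in parallel) is omitted.

```python
import math
from collections import Counter

MASK, SEP = "[MASK]", "[SEP]"  # hypothetical special tokens


def build_conditional_input(sentence, start, end):
    """Mask sentence[start:end] and append the correct span after [SEP].

    Assumptions (not from the paper): characters are tokens, and one [MASK]
    slot is reserved per character of the correct span, so a non-autoregressive
    generator would fill every slot in parallel.
    """
    if not 0 <= start < end <= len(sentence):
        raise ValueError("span must satisfy 0 <= start < end <= len(sentence)")
    chars = list(sentence)
    masked = chars[:start] + [MASK] * (end - start) + chars[end:]
    return masked + [SEP] + chars[start:end]


class CharBigramLM:
    """Toy add-one-smoothed character bigram LM standing in for a real LM."""

    def __init__(self, corpus):
        self.unigrams = Counter()  # counts of each token as a bigram history
        self.bigrams = Counter()
        vocab = set()
        for sent in corpus:
            toks = ["<s>"] + list(sent) + ["</s>"]
            vocab.update(toks)
            self.unigrams.update(toks[:-1])
            self.bigrams.update(zip(toks[:-1], toks[1:]))
        self.vocab_size = len(vocab)

    def perplexity(self, sentence):
        toks = ["<s>"] + list(sentence) + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(toks[:-1], toks[1:]):
            # Laplace smoothing keeps unseen bigrams at a small nonzero probability.
            p = (self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + self.vocab_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(toks) - 1))


def keep_generated_span(lm, sentence, start, end, generated, margin=1.0):
    """Return True if substituting `generated` makes the sentence less fluent.

    Hypothetical criterion: the corrupted sentence must have a perplexity greater
    than `margin` times that of the original; otherwise the span is treated as
    error-free (identical or an acceptable alternative) and discarded.
    """
    if generated == sentence[start:end]:
        return False
    corrupted = sentence[:start] + generated + sentence[end:]
    return lm.perplexity(corrupted) > margin * lm.perplexity(sentence)
```

As a usage example, with a toy corpus of `"我喜欢吃苹果"`, `"我喜欢看书"`, `"他喜欢吃饭"`, and `"我们喜欢学习"`, swapping `喜欢` to `欢喜` in `"我喜欢吃苹果"` raises perplexity and is kept as a plausible word-order error, whereas replacing `苹果` with `饭` yields the fluent `"我喜欢吃饭"` and is filtered out as error-free. The filter only compares fluency, so its quality depends entirely on the language model plugged in.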
Anthology ID:
2022.findings-acl.233
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2966–2975
URL:
https://aclanthology.org/2022.findings-acl.233
DOI:
10.18653/v1/2022.findings-acl.233
Cite (ACL):
Tianchi Yue, Shulin Liu, Huihui Cai, Tao Yang, Shengkang Song, and TingHao Yu. 2022. Improving Chinese Grammatical Error Detection via Data augmentation by Conditional Error Generation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2966–2975, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Improving Chinese Grammatical Error Detection via Data augmentation by Conditional Error Generation (Yue et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-acl.233.pdf