Improving Grammatical Error Correction via Contextual Data Augmentation

Yixuan Wang, Baoxin Wang, Yijun Liu, Qingfu Zhu, Dayong Wu, Wanxiang Che


Abstract
Nowadays, data augmentation through synthetic data has been widely used in the field of Grammatical Error Correction (GEC) to alleviate the problem of data scarcity. However, these synthetic data are mainly used in the pre-training phase rather than the data-limited fine tuning phase due to inconsistent error distribution and noisy labels. In this paper, we propose a synthetic data construction method based on contextual augmentation, which can ensure an efficient augmentation of the original data with a more consistent error distribution. Specifically, we combine rule-based substitution with model-based generation, using the generation model to generate a richer context for the extracted error patterns. Besides, we also propose a relabeling-based data cleaning method to mitigate the effects of noisy labels in synthetic data. Experiments on CoNLL14 and BEA19-Test show that our proposed augmentation method consistently and substantially outperforms strong baselines and achieves the state-of-the-art level with only a few synthetic data.
Anthology ID:
2024.findings-acl.647
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
10898–10910
Language:
URL:
https://aclanthology.org/2024.findings-acl.647
DOI:
Bibkey:
Cite (ACL):
Yixuan Wang, Baoxin Wang, Yijun Liu, Qingfu Zhu, Dayong Wu, and Wanxiang Che. 2024. Improving Grammatical Error Correction via Contextual Data Augmentation. In Findings of the Association for Computational Linguistics ACL 2024, pages 10898–10910, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Improving Grammatical Error Correction via Contextual Data Augmentation (Wang et al., Findings 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.findings-acl.647.pdf