TransGEC: Improving Grammatical Error Correction with Translationese

Tao Fang, Xuebo Liu, Derek F. Wong, Runzhe Zhan, Liang Ding, Lidia S. Chao, Dacheng Tao, Min Zhang


Abstract
Data augmentation is an effective way to improve the performance of grammatical error correction (GEC) models. This paper identifies a critical side effect of GEC data augmentation that stems from the style discrepancy between the data used in GEC tasks (i.e., texts produced by non-native speakers) and the data used for augmentation (i.e., native texts). To alleviate this issue, we propose using an alternative data source, translationese (i.e., human-translated texts), as input for GEC data augmentation; translationese 1) is easier to obtain and usually of better quality than non-native texts, and 2) is closer in style to non-native texts. Experimental results on the CoNLL14 and BEA19 English, NLPCC18 Chinese, Falko-MERLIN German, and RULEC-GEC Russian GEC benchmarks show that our approach consistently improves correction accuracy over strong baselines. Further analyses reveal that our approach helps overcome mainstream correction difficulties, such as correcting frequent words, missing words, and substitution errors. Data, code, models, and scripts are freely available at https://github.com/NLP2CT/TransGEC.
Anthology ID:
2023.findings-acl.223
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3614–3633
URL:
https://aclanthology.org/2023.findings-acl.223
DOI:
10.18653/v1/2023.findings-acl.223
Cite (ACL):
Tao Fang, Xuebo Liu, Derek F. Wong, Runzhe Zhan, Liang Ding, Lidia S. Chao, Dacheng Tao, and Min Zhang. 2023. TransGEC: Improving Grammatical Error Correction with Translationese. In Findings of the Association for Computational Linguistics: ACL 2023, pages 3614–3633, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
TransGEC: Improving Grammatical Error Correction with Translationese (Fang et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.223.pdf