A Comparative Study of Synthetic Data Generation Methods for Grammatical Error Correction

Max White, Alla Rozovskaya


Abstract
Grammatical Error Correction (GEC) is concerned with correcting grammatical errors in written text. Current GEC systems, namely those leveraging statistical and neural machine translation, require large quantities of annotated training data, which can be expensive or impractical to obtain. This research compares the synthetic data generation techniques used by the two highest-scoring submissions to the restricted and low-resource tracks in the BEA-2019 Shared Task on Grammatical Error Correction.
Anthology ID:
2020.bea-1.21
Volume:
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
July
Year:
2020
Address:
Seattle, WA, USA → Online
Editors:
Jill Burstein, Ekaterina Kochmar, Claudia Leacock, Nitin Madnani, Ildikó Pilán, Helen Yannakoudakis, Torsten Zesch
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
198–208
URL:
https://aclanthology.org/2020.bea-1.21
DOI:
10.18653/v1/2020.bea-1.21
Cite (ACL):
Max White and Alla Rozovskaya. 2020. A Comparative Study of Synthetic Data Generation Methods for Grammatical Error Correction. In Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 198–208, Seattle, WA, USA → Online. Association for Computational Linguistics.
Cite (Informal):
A Comparative Study of Synthetic Data Generation Methods for Grammatical Error Correction (White & Rozovskaya, BEA 2020)
PDF:
https://aclanthology.org/2020.bea-1.21.pdf