Minimally-Augmented Grammatical Error Correction

Roman Grundkiewicz, Marcin Junczys-Dowmunt


Abstract
There has been increased interest in low-resource approaches to automatic grammatical error correction (GEC). We introduce Minimally-Augmented Grammatical Error Correction (MAGEC), which does not require any error-labelled data. Our unsupervised approach relies on a simple but effective synthetic error generation method that uses confusion sets from inverted spell-checkers. In low-resource settings, we outperform the current state-of-the-art results for German and Russian GEC tasks by a large margin without using any real error-annotated training data. When combined with labelled data, our method can serve as an efficient pre-training technique.
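
The idea of generating synthetic errors from confusion sets can be illustrated with a minimal sketch. This is not the authors' implementation: the `CONFUSION_SETS` dictionary is a hand-written toy stand-in for sets that would be derived from spell-checker suggestions, and the names `corrupt` and `make_pair` and the `error_rate` default are assumptions made for illustration only.

```python
import random

# Toy confusion sets (hypothetical). In the setting the abstract describes,
# these would come from spell-checker suggestions, not be written by hand.
CONFUSION_SETS = {
    "their": ["there", "they're"],
    "than": ["then"],
    "affect": ["effect"],
}


def corrupt(tokens, confusion_sets, error_rate, rng):
    """Return a noisy copy of tokens, swapping in confusable words at random."""
    noisy = []
    for tok in tokens:
        candidates = confusion_sets.get(tok)
        # Replace only tokens that have a confusion set, with probability error_rate.
        if candidates and rng.random() < error_rate:
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(tok)
    return noisy


def make_pair(sentence, confusion_sets=CONFUSION_SETS, error_rate=0.15, seed=0):
    """Build one (noisy source, clean target) training pair from a clean sentence."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    noisy = corrupt(sentence.split(), confusion_sets, error_rate, rng)
    return " ".join(noisy), sentence


if __name__ == "__main__":
    source, target = make_pair("their plan is better than ours", error_rate=1.0)
    print(source)
    print(target)
```

Pairs produced this way have the corrupted sentence as the source and the original clean sentence as the target, so a correction model could be trained on them without any error-annotated text.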
Anthology ID:
D19-5546
Volume:
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue:
WNUT
Publisher:
Association for Computational Linguistics
Pages:
357–363
URL:
https://aclanthology.org/D19-5546
DOI:
10.18653/v1/D19-5546
Cite (ACL):
Roman Grundkiewicz and Marcin Junczys-Dowmunt. 2019. Minimally-Augmented Grammatical Error Correction. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 357–363, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Minimally-Augmented Grammatical Error Correction (Grundkiewicz & Junczys-Dowmunt, WNUT 2019)
PDF:
https://aclanthology.org/D19-5546.pdf