%0 Conference Proceedings
%T A Benchmark Corpus of English Misspellings and a Minimally-supervised Model for Spelling Correction
%A Flor, Michael
%A Fried, Michael
%A Rozovskaya, Alla
%Y Yannakoudakis, Helen
%Y Kochmar, Ekaterina
%Y Leacock, Claudia
%Y Madnani, Nitin
%Y Pilán, Ildikó
%Y Zesch, Torsten
%S Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
%D 2019
%8 August
%I Association for Computational Linguistics
%C Florence, Italy
%F flor-etal-2019-benchmark
%X Spelling correction has attracted a lot of attention in the NLP community. However, models have been usually evaluated on artificiallycreated or proprietary corpora. A publiclyavailable corpus of authentic misspellings, annotated in context, is still lacking. To address this, we present and release an annotated data set of 6,121 spelling errors in context, based on a corpus of essays written by English language learners. We also develop a minimallysupervised context-aware approach to spelling correction. It achieves strong results on our data: 88.12% accuracy. This approach can also train with a minimal amount of annotated data (performance reduced by less than 1%). Furthermore, this approach allows easy portability to new domains. We evaluate our model on data from a medical domain and demonstrate that it rivals the performance of a model trained and tuned on in-domain data.
%R 10.18653/v1/W19-4407
%U https://aclanthology.org/W19-4407
%U https://doi.org/10.18653/v1/W19-4407
%P 76-86