Automatic Spelling Correction for Resource-Scarce Languages using Deep Learning

Pravallika Etoori, Manoj Chinnakotla, Radhika Mamidi


Abstract
Spelling correction is a well-known task in Natural Language Processing (NLP). Automatic spelling correction is important for many NLP applications, such as web search engines, text summarization, and sentiment analysis. Most approaches train on parallel data that maps noisy words to their correct forms, collected from various sources. Indic languages are resource-scarce and lack such parallel data because of the low volume of queries and the absence of prior implementations. In this paper, we show how to build an automatic spelling corrector for resource-scarce languages. We propose a sequence-to-sequence deep learning model that is trained end-to-end. We perform experiments on synthetic datasets created for two Indic languages, Hindi and Telugu, by introducing character-level spelling mistakes. A comparative evaluation shows that our model is competitive with existing spell-checking and correction techniques for Indic languages.
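The abstract does not say how the character-level mistakes were generated, so the following is only a minimal sketch of one common way to build synthetic (noisy, correct) word pairs. The four edit operations, the function names `inject_noise` and `make_pairs`, and the choice of one edit per word are all assumptions for illustration, not the authors' procedure.

```python
import random

def inject_noise(word, rng, alphabet):
    """Apply one random character-level edit to `word` (hypothetical helper).

    The edit is an insertion, deletion, substitution, or adjacent swap.
    The result can equal the input, e.g. when a substitution picks the
    same character or when a deletion is drawn for a one-character word.
    """
    if not word:
        return word
    chars = list(word)  # Python strings iterate by Unicode code point
    op = rng.choice(["insert", "delete", "substitute", "swap"])
    i = rng.randrange(len(chars))
    if op == "insert":
        chars.insert(i, rng.choice(alphabet))
    elif op == "delete" and len(chars) > 1:
        del chars[i]
    elif op == "substitute":
        chars[i] = rng.choice(alphabet)
    elif op == "swap" and len(chars) > 1:
        j = i + 1 if i + 1 < len(chars) else i - 1
        chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)

def make_pairs(words, seed=0):
    """Return (noisy, correct) pairs, one per input word (assumed format)."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    # Draw inserted and substituted characters from the vocabulary itself.
    alphabet = sorted({c for w in words for c in w})
    return [(inject_noise(w, rng, alphabet), w) for w in words]

if __name__ == "__main__":
    for noisy, correct in make_pairs(["नमस्ते", "తెలుగు"]):
        print(noisy, "->", correct)
```

Pairs of this shape could serve as source and target sequences for a character-level sequence-to-sequence model. Note that this sketch edits individual code points, which in Devanagari and Telugu can separate a combining vowel sign from its base consonant; a more realistic generator might work on grapheme clusters or model typical typing errors instead.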
Anthology ID:
P18-3021
Volume:
Proceedings of ACL 2018, Student Research Workshop
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Vered Shwartz, Jeniya Tabassum, Rob Voigt, Wanxiang Che, Marie-Catherine de Marneffe, Malvina Nissim
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
146–152
URL:
https://aclanthology.org/P18-3021/
DOI:
10.18653/v1/P18-3021
Cite (ACL):
Pravallika Etoori, Manoj Chinnakotla, and Radhika Mamidi. 2018. Automatic Spelling Correction for Resource-Scarce Languages using Deep Learning. In Proceedings of ACL 2018, Student Research Workshop, pages 146–152, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Automatic Spelling Correction for Resource-Scarce Languages using Deep Learning (Etoori et al., ACL 2018)
PDF:
https://aclanthology.org/P18-3021.pdf