Mohammad Hoseyn Sheykholeslam


2012

pdf bib
A Framework for Spelling Correction in Persian Language Using Noisy Channel Model
Mohammad Hoseyn Sheykholeslam | Behrouz Minaei-Bidgoli | Hossein Juzi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

There are several methods offered for spelling correction in Farsi (Persian) Language. Unfortunately no powerful framework has been implemented because of lack of a large training set in Farsi as an accurate model. A training set consisting of erroneous and related correction string pairs have been obtained from a large number of instances of the books each of which were typed two times in Computer Research Center of Islamic Sciences. We trained our error model using this huge set. In testing part after finding erroneous words in sample text, our program proposes some candidates for related correction. The paper focuses on describing the method of ranking related corrections. This method is customized version of Noisy Channel Spelling Correction for Farsi. This ranking method attempts to find intended correction c from a typo t, that maximizes P(c) P(t | c). In this paper different methods are described and analyzed to obtain a wide overview of the field. Our evaluation results show that Noisy Channel Model using our corpus and training set in this framework works more accurately and improves efficiently in comparison with other methods.