A Spell Checker for a World Language: The New Microsoft’s Spanish Spell Checker

Flora Ramírez Bustamante, Alfredo Arnaiz, Mar Ginés


Abstract
This paper reports work carried out to develop a speller for Spanish at Microsoft Corporation, discusses the technique for isolated-word error correction used by the speller, provides general descriptions of the error data collection and error typology, and surveys a variety of linguistic considerations relevant when dealing with a world language spread over several countries and exposed to different language influences. We show that even though it has been claimed that the state of the art for practical applications based on isolated word error correction does not offer always a sensible set of ranked candidates for the misspelling, the introduction of a finer-grained categorization of errors and the use of their relative frequency has had a positive impact in the speller application developed for Spanish (the corresponding evaluation data is presented).
Anthology ID:
L06-1007
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/12_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Flora Ramírez Bustamante, Alfredo Arnaiz, and Mar Ginés. 2006. A Spell Checker for a World Language: The New Microsoft’s Spanish Spell Checker. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
A Spell Checker for a World Language: The New Microsoft’s Spanish Spell Checker (Bustamante et al., LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/12_pdf.pdf