Flora Ramírez Bustamante
2006
A Spell Checker for a World Language: The New Microsoft’s Spanish Spell Checker
Flora Ramírez Bustamante | Alfredo Arnaiz | Mar Ginés
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper reports work carried out to develop a speller for Spanish at Microsoft Corporation, discusses the technique for isolated-word error correction used by the speller, provides general descriptions of the error data collection and error typology, and surveys a variety of linguistic considerations relevant when dealing with a world language spread over several countries and exposed to different language influences. We show that even though it has been claimed that the state of the art for practical applications based on isolated-word error correction does not always offer a sensible set of ranked candidates for the misspelling, the introduction of a finer-grained categorization of errors and the use of their relative frequency have had a positive impact on the speller application developed for Spanish (the corresponding evaluation data are presented).
Spelling Error Patterns in Spanish for Word Processing Applications
Flora Ramírez Bustamante | Enrique López Díaz
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper reports findings from the elaboration of a typology of spelling errors for Spanish. It also discusses previous generalizations about spelling error patterns found in other studies and offers new insights on them. The typology is based on the analysis of around 76K misspellings found in real-life texts produced by humans. The main goal of the elaboration of the typology was to help in the implementation of a spell checker that detects context-independent misspellings in general unrestricted texts with the most common confusion pairs (i.e. error/correction pairs) to improve the set of ranked correction candidates for misspellings. We found that spelling errors are language dependent and are closely related to the orthographic rules of each language. The statistical data we provide on spelling error patterns in Spanish and their comparison with other data in other related works are the novel contribution of this paper. In this line, this paper shows that some of the general statements found in the literature about spelling error patterns apply mainly to English and cannot be extrapolated to other languages.
1996
GramCheck: A Grammar and Style Checker
Flora Ramírez Bustamante | Fernando Sánchez León
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics