Spelling Correction for Estonian Learner Language

Kais Allkivi-Metsoja, Jaagup Kippar


Abstract
Second and foreign language (L2) learners often make spelling errors that differ from those of native speakers. Because they are context-sensitive, language-independent spell-checking algorithms that rely on n-gram models can offer a simple way to improve the detection and correction of learner errors. As the open-source speller previously available for Estonian is rule-based, our aim was to evaluate the performance of bigram- and trigram-based statistical spelling correctors on an error-tagged set of A2–C1-level texts written by L2 learners of Estonian. The newly trained spell-checking models were compared with existing open-source and commercial correction tools. Then, the best-performing corrector, Jamspell, was trained on various datasets to analyse their effect on the correction results.
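To make the context-sensitivity argument concrete, here is a minimal Python sketch of a bigram-based corrector. It is an illustration under stated assumptions, not the authors' trained models and not Jamspell's implementation. Candidates are all known words within one edit (deletion, transposition, substitution or insertion) of the typed token, and each candidate is ranked by add-one-smoothed bigram probabilities of its left and right neighbours. The class name `BigramSpeller`, the `edit_penalty` parameter and the six-sentence lowercase training corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log

# Letters tried for insertions and substitutions: Latin a to z plus
# Estonian õ, ä, ö, ü, š, ž (assumption; a real system would derive
# the alphabet from its training data).
ALPHABET = "abcdefghijklmnopqrsšzžtuvwõäöüxy"
BOS, EOS = "<s>", "</s>"


class BigramSpeller:
    """Toy context-sensitive corrector (hypothetical): edit-distance-1
    candidates ranked by smoothed bigram probabilities of the context."""

    def __init__(self, sentences, edit_penalty=0.1):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in sentences:
            tokens = [BOS] + list(sent) + [EOS]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = set(self.unigrams) - {BOS, EOS}
        self.V = len(self.unigrams)
        # Hypothetical knob: scales down any candidate that differs from
        # the typed word, so correct words are not over-corrected.
        self.edit_penalty = edit_penalty

    def prob(self, w1, w2):
        # P(w2 | w1) with Laplace (add-one) smoothing.
        return (self.bigrams[(w1, w2)] + 1) / (self.unigrams[w1] + self.V)

    @staticmethod
    def edits1(word):
        # All strings one deletion, transposition, substitution or insertion away.
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
        inserts = [a + c + b for a, b in splits for c in ALPHABET]
        return set(deletes + transposes + replaces + inserts)

    def candidates(self, word):
        # Known words one edit away, plus the word itself if it is known;
        # fall back to the word unchanged when nothing matches.
        cands = {c for c in self.edits1(word) if c in self.vocab}
        if word in self.vocab:
            cands.add(word)
        return cands or {word}

    def score(self, cand, word, prev, nxt):
        # Log-probability of the candidate given left and right neighbours.
        s = log(self.prob(prev, cand)) + log(self.prob(cand, nxt))
        if cand != word:
            s += log(self.edit_penalty)
        return s

    def correct(self, tokens):
        # Greedy left-to-right pass: already-corrected left context,
        # original right context.
        padded = [BOS] + list(tokens) + [EOS]
        out = [BOS]
        for i in range(1, len(padded) - 1):
            word = padded[i]
            best = max(self.candidates(word),
                       key=lambda c: self.score(c, word, out[-1], padded[i + 1]))
            out.append(best)
        return out[1:]


if __name__ == "__main__":
    corpus = [
        "ma elan tallinnas", "ma elan tartus", "ta elab tallinnas",
        "ma lähen koju", "me läheme kooli", "ma olen kodus",
    ]
    speller = BigramSpeller([s.split() for s in corpus])
    # The same misspelling "ela" is resolved differently depending on the subject.
    print(speller.correct("ma ela tallinnas".split()))  # ['ma', 'elan', 'tallinnas']
    print(speller.correct("ta ela tallinnas".split()))  # ['ta', 'elab', 'tallinnas']
```

In the demo, the non-word *ela* has two one-edit candidates, *elan* ('I live') and *elab* ('(s)he lives'). The left context (*ma* vs *ta*) decides between them, which a context-free dictionary lookup could not do. A trigram model could apply the same scoring with two words of context instead of one.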
Anthology ID: 2023.nodalida-1.79
Volume: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month: May
Year: 2023
Address: Tórshavn, Faroe Islands
Editors: Tanel Alumäe, Mark Fishel
Venue: NoDaLiDa
Publisher: University of Tartu Library
Pages: 782–788
URL: https://aclanthology.org/2023.nodalida-1.79
Cite (ACL): Kais Allkivi-Metsoja and Jaagup Kippar. 2023. Spelling Correction for Estonian Learner Language. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 782–788, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal): Spelling Correction for Estonian Learner Language (Allkivi-Metsoja & Kippar, NoDaLiDa 2023)
PDF: https://aclanthology.org/2023.nodalida-1.79.pdf