Playing by the Rules: A Benchmark Set for Standardized Icelandic Orthography

Bjarki Ármannsson, Hinrik Hafsteinsson, Jóhannes B. Sigtryggsson, Atli Jasonarson, Einar Freyr Sigurðsson, Steinþór Steingrímsson


Abstract
We present the Icelandic Standardization Benchmark Set: Spelling and Punctuation (IceStaBS:SP), a dataset designed to provide standardized text examples for Icelandic orthography. The dataset includes non-standard orthography examples and their standardized counterparts, along with detailed explanations based on official Icelandic spelling rules. IceStaBS:SP aims to support the development and evaluation of automatic spell and grammar checkers, particularly in educational settings. We evaluate various spell and grammar checkers using IceStaBS:SP, demonstrating its utility as a benchmarking tool and highlighting areas for future improvement.
Anthology ID:
2025.nodalida-1.4
Volume:
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
Month:
March
Year:
2025
Address:
Tallinn, Estonia
Editors:
Richard Johansson, Sara Stymne
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
28–36
URL:
https://aclanthology.org/2025.nodalida-1.4/
Cite (ACL):
Bjarki Ármannsson, Hinrik Hafsteinsson, Jóhannes B. Sigtryggsson, Atli Jasonarson, Einar Freyr Sigurðsson, and Steinþór Steingrímsson. 2025. Playing by the Rules: A Benchmark Set for Standardized Icelandic Orthography. In Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), pages 28–36, Tallinn, Estonia. University of Tartu Library.
Cite (Informal):
Playing by the Rules: A Benchmark Set for Standardized Icelandic Orthography (Ármannsson et al., NoDaLiDa 2025)
PDF:
https://aclanthology.org/2025.nodalida-1.4.pdf