Is Part-of-Speech Tagging a Solved Problem for Icelandic?

Örvar Kárason, Hrafn Loftsson


Abstract
We train and evaluate four Part-of-Speech tagging models for Icelandic. Three are older models that obtained the highest accuracy for Icelandic when they were introduced. The fourth model is of a type that currently reaches state-of-the-art accuracy. We use the most recent version of the MIM-GOLD training/testing corpus, its newest tagset, and augmentation data to obtain results that are comparable between the various models. We examine the accuracy improvements with each model and analyse the errors produced by our transformer model, which is based on a previously published ConvBERT model. For the set of errors that all the models make, and for which they predict the same tag, we extract a random subset for manual inspection. Extrapolating from this subset, we obtain a lower bound estimate on annotation errors in the corpus as well as on some unsolvable tagging errors. We argue that further tagging accuracy gains for Icelandic can still be obtained by fixing the errors in MIM-GOLD and, furthermore, that it should still be possible to squeeze out some small gains from our transformer model.
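The error-sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: the function names, the toy tags, and the simple proportional scaling. One plausible reading of why the estimate is a lower bound is that only errors shared by all models are inspected, so annotation errors outside that set are not counted.

```python
import random

def shared_errors(gold, predictions):
    """Indices where every model is wrong and all models predict the same tag."""
    shared = []
    for i, gold_tag in enumerate(gold):
        tags = {model_tags[i] for model_tags in predictions.values()}
        # One distinct predicted tag that differs from the gold tag.
        if len(tags) == 1 and gold_tag not in tags:
            shared.append(i)
    return shared

def extrapolate(n_shared, n_sampled, n_confirmed):
    """Scale the count confirmed in the inspected sample up to the whole shared set."""
    return round(n_shared * n_confirmed / n_sampled)

# Hypothetical toy data: gold tags and the output of three models.
gold = ["n", "v", "a", "n", "v", "c"]
predictions = {
    "model_a": ["n", "n", "a", "v", "v", "c"],
    "model_b": ["n", "n", "a", "a", "v", "c"],
    "model_c": ["n", "n", "n", "v", "v", "c"],
}

shared = shared_errors(gold, predictions)

# Draw a reproducible random subset for manual inspection (sample size is assumed).
rng = random.Random(0)
sample = rng.sample(shared, k=min(1, len(shared)))
```

With real data, an annotator would label each sampled token as a corpus annotation error, a genuine tagging error, or an undecidable case, and `extrapolate` would then be called once per category.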
Anthology ID:
2023.nodalida-1.8
Volume:
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May
Year:
2023
Address:
Tórshavn, Faroe Islands
Editors:
Tanel Alumäe, Mark Fishel
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
71–79
URL:
https://aclanthology.org/2023.nodalida-1.8
Cite (ACL):
Örvar Kárason and Hrafn Loftsson. 2023. Is Part-of-Speech Tagging a Solved Problem for Icelandic? In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 71–79, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal):
Is Part-of-Speech Tagging a Solved Problem for Icelandic? (Kárason & Loftsson, NoDaLiDa 2023)
PDF:
https://aclanthology.org/2023.nodalida-1.8.pdf