WikiGoldSK: Annotated Dataset, Baselines and Few-Shot Learning Experiments for Slovak Named Entity Recognition

David Suba, Marek Suppa, Jozef Kubik, Endre Hamerlik, Martin Takac


Abstract
Named Entity Recognition (NER) is a fundamental NLP task with a wide range of practical applications. The performance of state-of-the-art NER methods depends on high-quality manually annotated datasets, which still do not exist for some languages. In this work, we aim to remedy this situation in Slovak by introducing WikiGoldSK, the first sizable human-labelled Slovak NER dataset. We benchmark it by evaluating state-of-the-art multilingual Pretrained Language Models and comparing it to the existing silver-standard Slovak NER dataset. We also conduct few-shot experiments and show that training on a silver-standard dataset yields better results. To enable future work that can be based on Slovak NER, we release the dataset, code, as well as the trained models publicly under permissible licensing terms at https://github.com/NaiveNeuron/WikiGoldSK
Anthology ID:
2023.bsnlp-1.16
Volume:
Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Jakub Piskorski, Michał Marcińczuk, Preslav Nakov, Maciej Ogrodniczuk, Senja Pollak, Pavel Přibáň, Piotr Rybak, Josef Steinberger, Roman Yangarber
Venue:
BSNLP
Publisher:
Association for Computational Linguistics
Pages:
138–145
URL:
https://aclanthology.org/2023.bsnlp-1.16
DOI:
10.18653/v1/2023.bsnlp-1.16
Cite (ACL):
David Suba, Marek Suppa, Jozef Kubik, Endre Hamerlik, and Martin Takac. 2023. WikiGoldSK: Annotated Dataset, Baselines and Few-Shot Learning Experiments for Slovak Named Entity Recognition. In Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023), pages 138–145, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
WikiGoldSK: Annotated Dataset, Baselines and Few-Shot Learning Experiments for Slovak Named Entity Recognition (Suba et al., BSNLP 2023)
PDF:
https://aclanthology.org/2023.bsnlp-1.16.pdf
Video:
https://aclanthology.org/2023.bsnlp-1.16.mp4