Fine-grained Named Entity Annotation for Finnish

Jouni Luoma, Li-Hsin Chang, Filip Ginter, Sampo Pyysalo


Abstract
We introduce a corpus with fine-grained named entity annotation for Finnish, following the OntoNotes guidelines to create a resource that is cross-lingually compatible with existing annotations for other languages. We combine and extend two NER corpora recently introduced for Finnish and revise their custom annotation scheme through a combination of automatic and manual processing steps. The resulting corpus consists of nearly 500,000 tokens annotated for over 50,000 mentions categorized into the 18 OntoNotes name and numeric entity types. We evaluate this resource and demonstrate its compatibility with the English OntoNotes annotations by training state-of-the-art mono-, bi- and multilingual deep learning models, finding both that the corpus allows highly accurate recognition of OntoNotes types at 93% F-score and that a comparable level of tagging accuracy can be achieved by a bilingual Finnish-English NER model.
Anthology ID:
2021.nodalida-main.14
Volume:
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
31 May–2 June
Year:
2021
Address:
Reykjavik, Iceland (Online)
Editors:
Simon Dobnik, Lilja Øvrelid
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press, Sweden
Pages:
135–144
URL:
https://aclanthology.org/2021.nodalida-main.14
Cite (ACL):
Jouni Luoma, Li-Hsin Chang, Filip Ginter, and Sampo Pyysalo. 2021. Fine-grained Named Entity Annotation for Finnish. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 135–144, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Cite (Informal):
Fine-grained Named Entity Annotation for Finnish (Luoma et al., NoDaLiDa 2021)
PDF:
https://aclanthology.org/2021.nodalida-main.14.pdf
Data:
Finer, OntoNotes 5.0