OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages

Chester Palen-Michel; Maxwell Pickering; Maya Kruse; Jonne Sälevä; Constantine Lignos

doi:10.18653/v1/2025.emnlp-main.1708

Abstract

We present OpenNER 1.0, a standardized collection of openly-available named entity recognition (NER) datasets. OpenNER contains 36 NER corpora that span 52 languages, human-annotated in varying named entity ontologies. We correct annotation format issues, standardize the original datasets into a uniform representation with consistent entity type names across corpora, and provide the collection in a structure that enables research in multilingual and multi-ontology NER. We provide baseline results using three pretrained multilingual language models and two large language models to compare the performance of recent models and facilitate future research in NER. We find that no single model is best in all languages and that significant work remains to obtain high performance from LLMs on the NER task. OpenNER is released at https://github.com/bltlab/open-ner.

Anthology ID: 2025.emnlp-main.1708
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 33649–33674
URL: https://aclanthology.org/2025.emnlp-main.1708/
DOI: 10.18653/v1/2025.emnlp-main.1708
Cite (ACL): Chester Palen-Michel, Maxwell Pickering, Maya Kruse, Jonne Sälevä, and Constantine Lignos. 2025. OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 33649–33674, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages (Palen-Michel et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1708.pdf
Checklist: 2025.emnlp-main.1708.checklist.pdf
