Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Suppa, Hila Gonen, Joseph Marvin Imperial, Börje Karlsson, Peiqin Lin, Nikola Ljubešić, Lester James Miranda, Barbara Plank, Arij Riabi, Yuval Pinter


Abstract
We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 19 datasets annotated with named entities in a cross-lingually consistent schema across 13 diverse languages. In this paper, we detail the dataset creation and composition of UNER; we also provide initial modeling baselines in both in-language and cross-lingual learning settings. We will release the data, code, and fitted models to the public.
Anthology ID:
2024.naacl-long.243
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
4322–4337
URL:
https://aclanthology.org/2024.naacl-long.243
Cite (ACL):
Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Suppa, Hila Gonen, Joseph Marvin Imperial, Börje Karlsson, Peiqin Lin, Nikola Ljubešić, Lester James Miranda, Barbara Plank, Arij Riabi, and Yuval Pinter. 2024. Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4322–4337, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark (Mayhew et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.243.pdf
Copyright:
2024.naacl-long.243.copyright.pdf