Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task

Marek Suppa, Ondrej Jariabka


Abstract
In this paper we describe TraSpaS, a submission to the third shared task on named entity recognition hosted as part of the Balto-Slavic Natural Language Processing (BSNLP) Workshop. In it, we evaluate various pre-trained language models on the NER task using three open-source NLP toolkits: a character-level language model with Stanza, language-specific BERT-style models with SpaCy, and an Adapter-enabled XLM-R with Trankit. Our results show that the Trankit-based models outperformed those based on the other two toolkits, even when trained on smaller amounts of data. Our code is available at https://github.com/NaiveNeuron/slavner-2021.
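As a rough illustration of the toolkit-based setup the abstract describes, the minimal sketch below runs off-the-shelf NER with Stanza on Ukrainian text (Ukrainian being one of the shared-task languages). It assumes Stanza's standard pipeline API and that a pre-trained Ukrainian NER model is available for download; the actual TraSpaS submissions were trained on the BSNLP shared-task data rather than using stock models as-is.

    import stanza

    # Fetch the pre-trained Ukrainian models (tokenizer + NER) once.
    # Any other shared-task language code could be substituted here.
    stanza.download("uk")

    # Build a pipeline that tokenizes the input and tags named entities.
    nlp = stanza.Pipeline(lang="uk", processors="tokenize,ner")

    doc = nlp("Тарас Шевченко народився в Україні.")

    # Each detected entity exposes its surface form and predicted type.
    for ent in doc.ents:
        print(ent.text, ent.type)

The SpaCy- and Trankit-based runs follow the same pattern of loading a pre-trained pipeline and reading off entity spans, differing mainly in the underlying pre-trained language model.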
Anthology ID: 2021.bsnlp-1.13
Volume: Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing
Month: April
Year: 2021
Address: Kyiv, Ukraine
Editors: Bogdan Babych, Olga Kanishcheva, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Vasyl Starko, Josef Steinberger, Roman Yangarber, Michał Marcińczuk, Senja Pollak, Pavel Přibáň, Marko Robnik-Šikonja
Venue: BSNLP
SIG: SIGSLAV
Publisher: Association for Computational Linguistics
Pages: 105–114
URL: https://aclanthology.org/2021.bsnlp-1.13
Cite (ACL): Marek Suppa and Ondrej Jariabka. 2021. Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task. In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, pages 105–114, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal): Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task (Suppa & Jariabka, BSNLP 2021)
PDF: https://aclanthology.org/2021.bsnlp-1.13.pdf
Code: naiveneuron/slavner-2021