A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition

Yuxuan Chen, Jonas Mikkelsen, Arne Binder, Christoph Alt, Leonhard Hennig


Abstract
Pre-trained language models (PLMs) are effective components of few-shot named entity recognition (NER) approaches when augmented with continued pre-training on task-specific out-of-domain data or fine-tuning on in-domain data. However, their performance in low-resource scenarios, where such data is not available, remains an open question. We introduce an encoder evaluation framework, and use it to systematically compare the performance of state-of-the-art pre-trained representations on the task of low-resource NER. We analyze a wide range of encoders pre-trained with different strategies, model architectures, intermediate-task fine-tuning, and contrastive learning. Our experimental results across ten benchmark NER datasets in English and German show that encoder performance varies significantly, suggesting that the choice of encoder for a specific low-resource scenario needs to be carefully evaluated.
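The sketch below illustrates the general idea of probing a frozen pre-trained encoder for few-shot NER with a nearest-centroid classifier over token representations. It is not the authors' fewie framework; the model checkpoint, label set, and support examples are illustrative assumptions only.

```python
# Minimal sketch (NOT the paper's evaluation framework): probe a frozen
# pre-trained encoder for few-shot NER via nearest-centroid classification
# over token representations. Checkpoint and labels are hypothetical choices.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-cased"  # assumption: any HF encoder checkpoint could be compared here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

@torch.no_grad()
def token_embeddings(sentence: str):
    """Return sub-word tokens and their last-layer hidden states."""
    enc = tokenizer(sentence, return_tensors="pt")
    hidden = encoder(**enc).last_hidden_state[0]  # shape: (seq_len, hidden_dim)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return tokens, hidden

# Toy "support set": a few labelled tokens per entity class (hypothetical data).
support = {
    "PER": ["Angela", "Merkel", "Obama"],
    "LOC": ["Dublin", "Ireland", "Berlin"],
}

# Build one centroid per class from the frozen encoder's support-token representations.
centroids = {}
for label, words in support.items():
    vecs = [token_embeddings(w)[1].mean(dim=0) for w in words]
    centroids[label] = torch.stack(vecs).mean(dim=0)

def classify_token(vec: torch.Tensor) -> str:
    """Assign the label of the nearest class centroid by cosine similarity."""
    sims = {lab: torch.cosine_similarity(vec, c, dim=0).item() for lab, c in centroids.items()}
    return max(sims, key=sims.get)

tokens, hidden = token_embeddings("Leonhard Hennig works in Berlin .")
print([(tok, classify_token(vec)) for tok, vec in zip(tokens, hidden)])
```

Swapping MODEL_NAME for another checkpoint while keeping the probe fixed is one simple way to compare encoders under identical low-resource conditions, which is the kind of comparison the paper performs at scale.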
Anthology ID:
2022.repl4nlp-1.6
Volume:
Proceedings of the 7th Workshop on Representation Learning for NLP
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Spandana Gella, He He, Bodhisattwa Prasad Majumder, Burcu Can, Eleonora Giunchiglia, Samuel Cahyawijaya, Sewon Min, Maximilian Mozes, Xiang Lorraine Li, Isabelle Augenstein, Anna Rogers, Kyunghyun Cho, Edward Grefenstette, Laura Rimell, Chris Dyer
Venue:
RepL4NLP
Publisher:
Association for Computational Linguistics
Pages:
46–59
URL:
https://aclanthology.org/2022.repl4nlp-1.6
DOI:
10.18653/v1/2022.repl4nlp-1.6
Cite (ACL):
Yuxuan Chen, Jonas Mikkelsen, Arne Binder, Christoph Alt, and Leonhard Hennig. 2022. A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition. In Proceedings of the 7th Workshop on Representation Learning for NLP, pages 46–59, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition (Chen et al., RepL4NLP 2022)
PDF:
https://aclanthology.org/2022.repl4nlp-1.6.pdf
Video:
https://aclanthology.org/2022.repl4nlp-1.6.mp4
Code:
dfki-nlp/fewie
Data:
CoNLL 2003, Few-NERD, WNUT 2017