@inproceedings{chia-etal-2025-investigating,
title = "Investigating the effectiveness of {Data Augmentation} and {Contrastive Learning} for {Named Entity Recognition}",
author = "Chia, Noel and
Rehbein, Ines and
Ponzetto, Simone Paolo",
editor = "Johansson, Richard and
Stymne, Sara",
booktitle = "Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)",
month = mar,
year = "2025",
address = "Tallinn, Estonia",
publisher = "University of Tartu Library",
url = "https://aclanthology.org/2025.nodalida-1.8/",
pages = "66--79",
isbn = "978-9908-53-109-0",
abstract = "Data Augmentation (DA) and Contrastive Learning (CL) are widely used in NLP, but their potential for NER has not yet been investigated in detail. Existing work is mostly limited to zero- and few-shot scenarios where improvements over the baseline are easy to obtain. In this paper, we address this research gap by presenting a systematic evaluation of DA for NER on small, medium-sized and large datasets with coarse and fine-grained labels. We report results for a) DA only, b) DA in combination with supervised contrastive learning, and c) DA with transfer learning. Our results show that DA on its own fails to improve results over the baseline and that supervised CL works better on larger datasets while transfer learning is beneficial if the target dataset is very small. Finally, we investigate how contrastive learning affects the learned representations, based on dimensionality reduction and visualisation techniques, and show that CL mostly helps to separate named entities from non-entities."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="chia-etal-2025-investigating">
<titleInfo>
<title>Investigating the effectiveness of Data Augmentation and Contrastive Learning for Named Entity Recognition</title>
</titleInfo>
<name type="personal">
<namePart type="given">Noel</namePart>
<namePart type="family">Chia</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ines</namePart>
<namePart type="family">Rehbein</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Simone</namePart>
<namePart type="given">Paolo</namePart>
<namePart type="family">Ponzetto</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-03</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Richard</namePart>
<namePart type="family">Johansson</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sara</namePart>
<namePart type="family">Stymne</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>University of Tartu Library</publisher>
<place>
<placeTerm type="text">Tallinn, Estonia</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">978-9908-53-109-0</identifier>
</relatedItem>
<abstract>Data Augmentation (DA) and Contrastive Learning (CL) are widely used in NLP, but their potential for NER has not yet been investigated in detail. Existing work is mostly limited to zero- and few-shot scenarios where improvements over the baseline are easy to obtain. In this paper, we address this research gap by presenting a systematic evaluation of DA for NER on small, medium-sized and large datasets with coarse and fine-grained labels. We report results for a) DA only, b) DA in combination with supervised contrastive learning, and c) DA with transfer learning. Our results show that DA on its own fails to improve results over the baseline and that supervised CL works better on larger datasets while transfer learning is beneficial if the target dataset is very small. Finally, we investigate how contrastive learning affects the learned representations, based on dimensionality reduction and visualisation techniques, and show that CL mostly helps to separate named entities from non-entities.</abstract>
<identifier type="citekey">chia-etal-2025-investigating</identifier>
<location>
<url>https://aclanthology.org/2025.nodalida-1.8/</url>
</location>
<part>
<date>2025-03</date>
<extent unit="page">
<start>66</start>
<end>79</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Investigating the effectiveness of Data Augmentation and Contrastive Learning for Named Entity Recognition
%A Chia, Noel
%A Rehbein, Ines
%A Ponzetto, Simone Paolo
%Y Johansson, Richard
%Y Stymne, Sara
%S Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
%D 2025
%8 March
%I University of Tartu Library
%C Tallinn, Estonia
%@ 978-9908-53-109-0
%F chia-etal-2025-investigating
%X Data Augmentation (DA) and Contrastive Learning (CL) are widely used in NLP, but their potential for NER has not yet been investigated in detail. Existing work is mostly limited to zero- and few-shot scenarios where improvements over the baseline are easy to obtain. In this paper, we address this research gap by presenting a systematic evaluation of DA for NER on small, medium-sized and large datasets with coarse and fine-grained labels. We report results for a) DA only, b) DA in combination with supervised contrastive learning, and c) DA with transfer learning. Our results show that DA on its own fails to improve results over the baseline and that supervised CL works better on larger datasets while transfer learning is beneficial if the target dataset is very small. Finally, we investigate how contrastive learning affects the learned representations, based on dimensionality reduction and visualisation techniques, and show that CL mostly helps to separate named entities from non-entities.
%U https://aclanthology.org/2025.nodalida-1.8/
%P 66-79
Markdown (Informal)
[Investigating the effectiveness of Data Augmentation and Contrastive Learning for Named Entity Recognition](https://aclanthology.org/2025.nodalida-1.8/) (Chia et al., NoDaLiDa 2025)
ACL