An Analysis of Simple Data Augmentation for Named Entity Recognition

Xiang Dai, Heike Adel


Abstract
Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.
Anthology ID:
2020.coling-main.343
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3861–3867
URL:
https://aclanthology.org/2020.coling-main.343
DOI:
10.18653/v1/2020.coling-main.343
PDF:
https://aclanthology.org/2020.coling-main.343.pdf