An Analysis of Simple Data Augmentation for Named Entity Recognition

Xiang Dai, Heike Adel


Abstract
Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.
Anthology ID:
2020.coling-main.343
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3861–3867
URL:
https://aclanthology.org/2020.coling-main.343
DOI:
10.18653/v1/2020.coling-main.343
Cite (ACL):
Xiang Dai and Heike Adel. 2020. An Analysis of Simple Data Augmentation for Named Entity Recognition. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3861–3867, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
An Analysis of Simple Data Augmentation for Named Entity Recognition (Dai & Adel, COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.343.pdf