Simple Semantic-based Data Augmentation for Named Entity Recognition in Biomedical Texts

Uyen Phan, Nhung Nguyen


Abstract
Data augmentation is important for addressing data sparsity and low-resource settings in NLP. Unlike data augmentation for other tasks, such as sentence-level and sentence-pair tasks, data augmentation for named entity recognition (NER) requires preserving the semantics of entities. To that end, we propose a simple semantic-based data augmentation method for biomedical NER. Our method leverages semantic information from pre-trained language models at both the entity level and the sentence level. Experimental results on two datasets, i2b2-2010 (English) and VietBioNER (Vietnamese), show that the proposed method can improve NER performance.
Anthology ID:
2022.bionlp-1.12
Volume:
Proceedings of the 21st Workshop on Biomedical Language Processing
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
Publisher:
Association for Computational Linguistics
Pages:
123–129
URL:
https://aclanthology.org/2022.bionlp-1.12
DOI:
10.18653/v1/2022.bionlp-1.12
Cite (ACL):
Uyen Phan and Nhung Nguyen. 2022. Simple Semantic-based Data Augmentation for Named Entity Recognition in Biomedical Texts. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 123–129, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Simple Semantic-based Data Augmentation for Named Entity Recognition in Biomedical Texts (Phan & Nguyen, BioNLP 2022)
PDF:
https://aclanthology.org/2022.bionlp-1.12.pdf
Video:
https://aclanthology.org/2022.bionlp-1.12.mp4