BanglaBioMed: A Biomedical Named-Entity Annotated Corpus for Bangla (Bengali)

Salim Sazzed

doi:10.18653/v1/2022.bionlp-1.31


Abstract

Recognizing biomedical entities in text is important for biomedical and health science research, as it benefits many downstream tasks, including entity linking, relation extraction, and entity resolution. While English and a few other widely used languages enjoy ample resources for automatic biomedical entity recognition, this is not the case for Bangla, a low-resource language. In this paper, we therefore introduce BanglaBioMed, a Bangla biomedical named entity (NE) annotated dataset in standard IOB format, the first of its kind, consisting of over 12,000 tokens annotated with biomedical entities. The corpus was created by collecting Bangla text from a list of health articles and annotating it with four distinct types of entities: Anatomy (AN), Chemical and Drugs (CD), Disease and Symptom (DS), and Medical Procedure (MP). We provide the details of the entire data collection and annotation procedure and present various statistics of the created corpus. The corpus is a much-needed addition to Bangla NLP resources that will facilitate biomedical NLP research in Bangla.
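As an illustration of the IOB scheme mentioned in the abstract, the following minimal sketch converts a sequence of IOB tags into entity spans. The tag strings (`B-DS`, `I-DS`, and so on) are an assumption built from the four abbreviations in the abstract, the function name `iob_to_spans` is hypothetical, and the tokens are English placeholders rather than sentences from the corpus; the actual file layout of BanglaBioMed may differ.

```python
from typing import List, Tuple

# Entity-type abbreviations as listed in the abstract.
ENTITY_TYPES = {"AN", "CD", "DS", "MP"}


def iob_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert IOB tags into (start, end_exclusive, entity_type) spans."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and tag[2:] == current
        if not continues and current is not None:
            # "O", a new "B-", or a mismatched "I-" closes the open span.
            spans.append((start, i, current))
            start, current = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and current is None):
            # A stray "I-" with no open span is leniently treated as a start.
            start, current = i, tag[2:]
    if current is not None:
        spans.append((start, len(tags), current))
    return spans


# Placeholder tokens (not from the corpus) with assumed tag names.
tokens = ["paracetamol", "eases", "muscle", "pain", "in", "legs"]
tags = ["B-CD", "O", "B-DS", "I-DS", "O", "B-AN"]

for start, end, etype in iob_to_spans(tags):
    assert etype in ENTITY_TYPES
    print(etype, " ".join(tokens[start:end]))
```

In this scheme a `B-` tag opens an entity, `I-` tags continue it, and `O` marks tokens outside any entity, so multi-token entities such as "muscle pain" are recovered as a single span.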

Anthology ID: 2022.bionlp-1.31
Volume: Proceedings of the 21st Workshop on Biomedical Language Processing
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue: BioNLP
Publisher: Association for Computational Linguistics
Pages: 323–329
URL: https://aclanthology.org/2022.bionlp-1.31/
DOI: 10.18653/v1/2022.bionlp-1.31
Cite (ACL): Salim Sazzed. 2022. BanglaBioMed: A Biomedical Named-Entity Annotated Corpus for Bangla (Bengali). In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 323–329, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): BanglaBioMed: A Biomedical Named-Entity Annotated Corpus for Bangla (Bengali) (Sazzed, BioNLP 2022)
PDF: https://aclanthology.org/2022.bionlp-1.31.pdf
Video: https://aclanthology.org/2022.bionlp-1.31.mp4
