MiDRED: An Annotated Corpus for Microbiome Knowledge Base Construction

William Hogan; Andrew Bartko; Jingbo Shang; Chun-Nan Hsu

doi:10.18653/v1/2024.bionlp-1.31

Abstract

The interplay between microbiota and diseases has emerged as a significant area of research, facilitated by the proliferation of cost-effective and precise sequencing technologies. To keep track of the many findings, domain experts manually review publications to extract reported microbe-disease associations and compile them into knowledge bases. However, manual curation efforts struggle to keep up with the pace of publications. Relation extraction has demonstrated remarkable success in other domains, yet the availability of datasets supporting such methods within the domain of microbiome research remains limited. To bridge this gap, we introduce the Microbe-Disease Relation Extraction Dataset (MiDRED), a human-annotated dataset containing 3,116 annotations of fine-grained relationships between microbes and diseases. We hope this dataset will help address the scarcity of data in this crucial domain and facilitate the development of advanced text-mining solutions to automate the creation and maintenance of microbiome knowledge bases.

Anthology ID: 2024.bionlp-1.31
Volume: Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
SIG: SIGBIOMED
Publisher: Association for Computational Linguistics
Pages: 398–408
URL: https://aclanthology.org/2024.bionlp-1.31/
DOI: 10.18653/v1/2024.bionlp-1.31
Cite (ACL): William Hogan, Andrew Bartko, Jingbo Shang, and Chun-Nan Hsu. 2024. MiDRED: An Annotated Corpus for Microbiome Knowledge Base Construction. In Proceedings of the 23rd Workshop on Biomedical Natural Language Processing, pages 398–408, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): MiDRED: An Annotated Corpus for Microbiome Knowledge Base Construction (Hogan et al., BioNLP 2024)
PDF: https://aclanthology.org/2024.bionlp-1.31.pdf
