BiomedCurator: Data Curation for Biomedical Literature

Mohammad Golam Sohrab, Khoa N.A. Duong, Ikeda Masami, Goran Topić, Yayoi Natsume-Kitatani, Masakata Kuroda, Mari Nogami Itoh, Hiroya Takamura


Abstract
We present BiomedCurator, a web application that extracts structured data from scientific articles in PubMed and ClinicalTrials.gov. BiomedCurator uses state-of-the-art natural language processing techniques to fill fields pre-selected by domain experts in the relevant biomedical area. The web application includes a text-generation-based model for relation extraction, entity detection and recognition, a text classification model for extracting several fields, information retrieval from an external knowledge base to retrieve IDs, and a pattern-based extraction approach that extracts several fields using regular expressions over the PubMed and ClinicalTrials.gov datasets. Evaluation results show that the different approaches in BiomedCurator are effective for automatic data curation in the biomedical domain.
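To make the pattern-based component concrete, the following is a minimal Python sketch of regular-expression field extraction. It is not the authors' implementation: the field names, both patterns, and the `extract_fields` helper are hypothetical and chosen only for illustration. The one external fact assumed is that ClinicalTrials.gov identifiers take the form `NCT` followed by eight digits.

```python
import re

# Hypothetical patterns for illustration; the expressions used by
# BiomedCurator are not given in this abstract.
NCT_ID = re.compile(r"\bNCT\d{8}\b")
ENROLLMENT = re.compile(
    r"\b(\d[\d,]*)\s+(?:patients|participants|subjects)\b", re.IGNORECASE
)


def extract_fields(text: str) -> dict:
    """Return the fields that the illustrative patterns can find in text."""
    # Deduplicate and sort trial identifiers for a stable result.
    nct_ids = sorted(set(NCT_ID.findall(text)))
    # Take the first enrollment-like number and drop thousands separators.
    match = ENROLLMENT.search(text)
    enrollment = int(match.group(1).replace(",", "")) if match else None
    return {"nct_ids": nct_ids, "enrollment": enrollment}
```

Fields with a rigid surface form, such as trial identifiers, suit this kind of rule; fields that depend on context are the ones the abstract assigns to the learned models.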
Anthology ID:
2022.aacl-demo.8
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations
Month:
November
Year:
2022
Address:
Taipei, Taiwan
Editors:
Wray Buntine, Maria Liakata
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
63–71
URL:
https://aclanthology.org/2022.aacl-demo.8
Cite (ACL):
Mohammad Golam Sohrab, Khoa N.A. Duong, Ikeda Masami, Goran Topić, Yayoi Natsume-Kitatani, Masakata Kuroda, Mari Nogami Itoh, and Hiroya Takamura. 2022. BiomedCurator: Data Curation for Biomedical Literature. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations, pages 63–71, Taipei, Taiwan. Association for Computational Linguistics.
Cite (Informal):
BiomedCurator: Data Curation for Biomedical Literature (Sohrab et al., AACL-IJCNLP 2022)
PDF:
https://aclanthology.org/2022.aacl-demo.8.pdf