Khoa N.A. Duong
2022
BiomedCurator: Data Curation for Biomedical Literature
Mohammad Golam Sohrab | Khoa N.A. Duong | Ikeda Masami | Goran Topić | Yayoi Natsume-Kitatani | Masakata Kuroda | Mari Nogami Itoh | Hiroya Takamura
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations
We present BiomedCurator, a web application that extracts structured data from scientific articles in PubMed and ClinicalTrials.gov. BiomedCurator uses state-of-the-art natural language processing techniques to fill fields pre-selected by domain experts in the relevant biomedical area. The web application includes: a text-generation-based model for relation extraction; entity detection and recognition; a text classification model for extracting several fields; information retrieval from an external knowledge base to retrieve IDs; and a pattern-based approach that extracts several fields using regular expressions over the PubMed and ClinicalTrials.gov datasets. Evaluation results show that the different approaches in the BiomedCurator system are effective for automatic data curation in the biomedical domain.
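To make the pattern-based component more concrete, the following is a minimal sketch of regular-expression field extraction in Python. It is not BiomedCurator's implementation: the field names (`nct_id`, `pmid`, `sample_size`), the patterns, and the `extract_fields` helper are all hypothetical and chosen only for illustration. The one external fact it relies on is that ClinicalTrials.gov identifiers take the form `NCT` followed by eight digits.

```python
import re

# Hypothetical field patterns for illustration; not taken from BiomedCurator.
FIELD_PATTERNS = {
    # ClinicalTrials.gov identifier: "NCT" followed by eight digits.
    "nct_id": re.compile(r"\bNCT\d{8}\b"),
    # PubMed identifier written inline as "PMID: 12345678".
    "pmid": re.compile(r"\bPMID:\s*(\d{1,8})\b"),
    # Enrolment count such as "1,204 patients" (assumed phrasing).
    "sample_size": re.compile(
        r"\b(\d[\d,]*)\s+(?:patients|participants|subjects)\b",
        re.IGNORECASE,
    ),
}


def extract_fields(text):
    """Return the first match for each field, or None when nothing matches."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            result[name] = None
        else:
            # Use the first capture group when the pattern defines one,
            # otherwise fall back to the whole match.
            result[name] = match.group(1) if pattern.groups else match.group(0)
    return result


example = "In trial NCT01234567 (PMID: 12345678), 1,204 patients were enrolled."
fields = extract_fields(example)
```

A real curation system would likely need many more patterns and some normalization (for example, stripping the thousands separator from `sample_size`); fields that resist fixed patterns are the ones the abstract assigns to the learned models.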