Towards Automatic Curation of Antibiotic Resistance Genes via Statement Extraction from Scientific Papers: A Benchmark Dataset and Models

Sidhant Chandak, Liqing Zhang, Connor Brown, Lifu Huang


Abstract
Antibiotic resistance has become a growing worldwide concern as new resistance mechanisms are emerging and spreading globally, and thus detecting and collecting the cause – Antibiotic Resistance Genes (ARGs), have been more critical than ever. In this work, we aim to automate the curation of ARGs by extracting ARG-related assertive statements from scientific papers. To support the research towards this direction, we build SciARG, a new benchmark dataset containing 2,000 manually annotated statements as the evaluation set and 12,516 silver-standard training statements that are automatically created from scientific papers by a set of rules. To set up the baseline performance on SciARG, we exploit three state-of-the-art neural architectures based on pre-trained language models and prompt tuning, and further ensemble them to attain the highest 77.0% F-score. To the best of our knowledge, we are the first to leverage natural language processing techniques to curate all validated ARGs from scientific papers. Both the code and data are publicly available at https://github.com/VT-NLP/SciARG.
Anthology ID:
2022.bionlp-1.40
Volume:
Proceedings of the 21st Workshop on Biomedical Language Processing
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
402–411
Language:
URL:
https://aclanthology.org/2022.bionlp-1.40
DOI:
10.18653/v1/2022.bionlp-1.40
Bibkey:
Cite (ACL):
Sidhant Chandak, Liqing Zhang, Connor Brown, and Lifu Huang. 2022. Towards Automatic Curation of Antibiotic Resistance Genes via Statement Extraction from Scientific Papers: A Benchmark Dataset and Models. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 402–411, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Towards Automatic Curation of Antibiotic Resistance Genes via Statement Extraction from Scientific Papers: A Benchmark Dataset and Models (Chandak et al., BioNLP 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.bionlp-1.40.pdf
Video:
 https://aclanthology.org/2022.bionlp-1.40.mp4
Code
 vt-nlp/sciarg