DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature

Anjani Dhrangadhariya, Henning Müller


Abstract
PICO recognition is an information extraction task that identifies participant, intervention, comparator, and outcome information in clinical literature. Manually identifying PICO information is the most time-consuming step in conducting systematic reviews (SRs), which are already labor-intensive. The lack of large, diverse annotated corpora restricts innovation in and adoption of automated PICO recognition systems. The largest available PICO entity/span corpus is manually annotated, an approach that is too expensive for most of the scientific community. To break through this bottleneck, we propose DISTANT-CTO, a novel distantly supervised PICO entity extraction approach that uses the clinical trials literature to generate a massive weakly labeled dataset with more than a million ‘Intervention’ and ‘Comparator’ entity annotations. We train distant NER (named-entity recognition) models on this weakly labeled dataset and demonstrate that they outperform even sophisticated models trained on the manually annotated dataset, with a 2% F1 improvement on the Intervention entity of the PICO benchmark and more than 5% improvement when the weakly labeled data is combined with the manually annotated dataset. We investigate the generalizability of our approach and obtain an impressive F1 score on another domain-specific PICO benchmark. The approach is not only zero-cost but also scalable, providing a constant stream of PICO entity annotations.
Anthology ID: 2022.bionlp-1.34
Volume: Proceedings of the 21st Workshop on Biomedical Language Processing
Month: May
Year: 2022
Address: Dublin, Ireland
Venues: ACL | BioNLP
Publisher: Association for Computational Linguistics
Pages: 345–358
URL: https://aclanthology.org/2022.bionlp-1.34
DOI: 10.18653/v1/2022.bionlp-1.34
Cite (ACL): Anjani Dhrangadhariya and Henning Müller. 2022. DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 345–358, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature (Dhrangadhariya & Müller, BioNLP 2022)
PDF: https://aclanthology.org/2022.bionlp-1.34.pdf