Rafael Garcia-Andujar


Intermediate Domain Finetuning for Weakly Supervised Domain-adaptive Clinical NER
Shilpa Suresh | Nazgol Tavabi | Shahriar Golchin | Leah Gilreath | Rafael Garcia-Andujar | Alexander Kim | Joseph Murray | Blake Bacevich | Ata Kiapour
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

Accurate human-annotated data for real-world use cases can be scarce and expensive to obtain. In the clinical domain, obtaining such data is even more difficult due to privacy concerns which not only restrict open access to quality data but also require that the annotation be done by domain experts. In this paper, we propose a novel framework - InterDAPT - that leverages Intermediate Domain Finetuning to allow language models to adapt to narrow domains with small, noisy datasets. By making use of peripherally-related, unlabeled datasets, this framework circumvents domain-specific data scarcity issues. Our results show that this weakly supervised framework provides performance improvements in downstream clinical named entity recognition tasks.