Shilpa Suresh


2023

pdf bib
Intermediate Domain Finetuning for Weakly Supervised Domain-adaptive Clinical NER
Shilpa Suresh | Nazgol Tavabi | Shahriar Golchin | Leah Gilreath | Rafael Garcia-Andujar | Alexander Kim | Joseph Murray | Blake Bacevich | Ata Kiapour
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

Accurate human-annotated data for real-worlduse cases can be scarce and expensive to obtain. In the clinical domain, obtaining such data is evenmore difficult due to privacy concerns which notonly restrict open access to quality data but also require that the annotation be done by domain experts. In this paper, we propose a novel framework - InterDAPT - that leverages Intermediate Domain Finetuning to allow language models to adapt to narrow domains with small, noisy datasets. By making use of peripherally-related, unlabeled datasets,this framework circumvents domain-specific datascarcity issues. Our results show that this weaklysupervised framework provides performance improvements in downstream clinical named entityrecognition tasks.

2021

pdf bib
Improved Latent Tree Induction with Distant Supervision via Span Constraints
Zhiyang Xu | Andrew Drozdov | Jay Yoon Lee | Tim O’Gorman | Subendhu Rongali | Dylan Finkbeiner | Shilpa Suresh | Mohit Iyyer | Andrew McCallum
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

For over thirty years, researchers have developed and analyzed methods for latent tree induction as an approach for unsupervised syntactic parsing. Nonetheless, modern systems still do not perform well enough compared to their supervised counterparts to have any practical use as structural annotation of text. In this work, we present a technique that uses distant supervision in the form of span constraints (i.e. phrase bracketing) to improve performance in unsupervised constituency parsing. Using a relatively small number of span constraints we can substantially improve the output from DIORA, an already competitive unsupervised parsing system. Compared with full parse tree annotation, span constraints can be acquired with minimal effort, such as with a lexicon derived from Wikipedia, to find exact text matches. Our experiments show span constraints based on entities improves constituency parsing on English WSJ Penn Treebank by more than 5 F1. Furthermore, our method extends to any domain where span constraints are easily attainable, and as a case study we demonstrate its effectiveness by parsing biomedical text from the CRAFT dataset.