Vivian Villegas


2024

pdf bib
CareCorpus+: Expanding and Augmenting Caregiver Strategy Data to Support Pediatric Rehabilitation
Shahla Farzana | Ivana Lucero | Vivian Villegas | Vera C Kaelin | Mary Khetani | Natalie Parde
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Caregiver strategy classification in pediatric rehabilitation contexts is strongly motivated by real-world clinical constraints but highly under-resourced and seldom studied in natural language processing settings. We introduce a large dataset of 4,037 caregiver strategies in this setting, a five-fold increase over the nearest contemporary dataset. These strategies are manually categorized into clinically established constructs with high agreement (𝜅=0.68-0.89). We also propose two techniques to further address identified data constraints. First, we manually supplement target task data with publicly relevant data from online child health forums. Next, we propose a novel data augmentation technique to generate synthetic caregiver strategies with high downstream task utility. Extensive experiments showcase the quality of our dataset. They also establish evidence that both the publicly available data and the synthetic strategies result in large performance gains, with relative F1 increases of 22.6% and 50.9%, respectively.