NU_DeepHealthNLP at #SMM4H-HeaRD 2026: Entity-Conditioned Generation and a Four-Stage Pipeline for Automated SOAP Note Generation

Thanya Mysore Santhosh; Deahan Yu

Abstract

We describe two system submissions to Task 4 of the SMM4H-HeaRD 2026 Shared Task on automated SOAP note generation from doctor–patient dialogues. Our first submission is a standalone entity-conditioned generation model: Mistral-7B-Instruct-v0.1 fine-tuned with QLoRA on 8,529 MedSynth training dialogues, where both training and inference prompts include clinical entities extracted and grouped by SOAP section. Our second submission is a four-stage modular pipeline that additionally incorporates a hybrid retrieval stage and a rule-based verification stage. The key finding of this work is that incorporating structured clinical domain knowledge, in the form of NER entities grouped by SOAP section, directly into the generation prompt produces consistent and reliable improvements over dialogue-only generation. Our four-stage pipeline submission achieved an average score of 0.54 on the official test set, ranking first on the shared task leaderboard.
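The entity-conditioned prompting idea in the abstract can be illustrated with a minimal sketch. It assumes the entities have already been extracted and tagged with a SOAP section by some upstream NER step. The function names (`group_entities`, `build_prompt`), the instruction wording, and the prompt layout are hypothetical and chosen for illustration only; the abstract does not specify the authors' actual prompt template.

```python
from collections import defaultdict

# The four SOAP sections, in the order they appear in a note.
SOAP_SECTIONS = ("Subjective", "Objective", "Assessment", "Plan")


def group_entities(entities):
    """Group (entity_text, section) pairs by SOAP section.

    Keeps first-seen order, drops duplicates, and ignores any pair whose
    section is not one of the four SOAP sections.
    """
    grouped = defaultdict(list)
    for text, section in entities:
        if section in SOAP_SECTIONS and text not in grouped[section]:
            grouped[section].append(text)
    return grouped


def build_prompt(dialogue, entities):
    """Build a generation prompt that lists entities per section before the dialogue.

    The layout below is an assumption, not the authors' template.
    """
    grouped = group_entities(entities)
    lines = [
        "Write a SOAP note for the dialogue below.",
        "",
        "Clinical entities by section:",
    ]
    for section in SOAP_SECTIONS:
        items = ", ".join(grouped.get(section, [])) or "none"
        lines.append(f"{section}: {items}")
    lines += ["", "Dialogue:", dialogue.strip()]
    return "\n".join(lines)


if __name__ == "__main__":
    example_dialogue = "Doctor: How are you?\nPatient: I have a headache."
    example_entities = [("headache", "Subjective"), ("ibuprofen", "Plan")]
    print(build_prompt(example_dialogue, example_entities))
```

Under this sketch the same prompt builder would be used at both training and inference time, so that the model sees entity lists in an identical layout in both settings, which matches the abstract's statement that both prompt types include the grouped entities.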

Anthology ID: 2026.smm4h-1.17
Volume: Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
Month: July
Year: 2026
Address: San Diego, United States
Editors: Guillermo Lopez-Garcia, Graciela Gonzalez-Hernandez
Venues: SMM4H | WS
Publisher: Association for Computational Linguistics
Pages: 103–107
URL: https://aclanthology.org/2026.smm4h-1.17/
Cite (ACL): Thanya Mysore Santhosh and Deahan Yu. 2026. NU_DeepHealthNLP at #SMM4H-HeaRD 2026: Entity-Conditioned Generation and a Four-Stage Pipeline for Automated SOAP Note Generation. In Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks, pages 103–107, San Diego, United States. Association for Computational Linguistics.
Cite (Informal): NU_DeepHealthNLP at #SMM4H-HeaRD 2026: Entity-Conditioned Generation and a Four-Stage Pipeline for Automated SOAP Note Generation (Santhosh & Yu, SMM4H 2026)
PDF: https://aclanthology.org/2026.smm4h-1.17.pdf