In-Domain Pre-Training Improves Clinical Note Generation from Doctor-Patient Conversations

Colin Grambow, Longxiang Zhang, Thomas Schaaf


Abstract
Summarization of doctor-patient conversations into clinical notes by medical scribes is an essential process for effective clinical care. Pre-trained transformer models have shown considerable success in this area, but the domain shift from standard NLP tasks to the medical domain continues to present challenges. We build upon several recent works to show that additional pre-training with in-domain medical conversations leads to performance gains for clinical summarization. In addition to conventional evaluation metrics, we also explore a clinical named entity recognition model for concept-based evaluation. Finally, we contrast long-sequence transformers with a common transformer model, BART. Overall, our findings corroborate research in non-medical domains and suggest that combining in-domain pre-training with long-sequence transformers is an effective strategy for summarizing clinical encounters.
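
The sketch below is a rough illustration of the concept-based evaluation mentioned in the abstract, assuming that concepts extracted from the reference and generated notes are compared as sets with an entity-level F1. The extract_concepts function is a hypothetical placeholder standing in for the clinical NER model; none of this reflects the authors' actual implementation.

# Minimal sketch of concept-based evaluation (assumptions noted above).
from typing import Set


def extract_concepts(note: str) -> Set[str]:
    """Placeholder concept extractor; in practice a clinical NER model
    would produce these mentions. Toy heuristic for illustration only."""
    return {token.strip(".,;").lower() for token in note.split() if token}


def concept_f1(reference_note: str, generated_note: str) -> float:
    """F1 over the overlap between reference and generated concept sets."""
    ref = extract_concepts(reference_note)
    gen = extract_concepts(generated_note)
    if not ref or not gen:
        return 0.0
    overlap = ref & gen
    precision = len(overlap) / len(gen)
    recall = len(overlap) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = "Patient reports chest pain and shortness of breath."
    gen = "Patient has chest pain; denies shortness of breath."
    print(f"Concept F1: {concept_f1(ref, gen):.2f}")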
Anthology ID: 2022.nlg4health-1.2
Volume: Proceedings of the First Workshop on Natural Language Generation in Healthcare
Month: July
Year: 2022
Address: Waterville, Maine, USA and virtual meeting
Editors: Emiel Krahmer, Kathy McCoy, Ehud Reiter
Venue: NLG4Health
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 9–22
URL: https://aclanthology.org/2022.nlg4health-1.2
Cite (ACL): Colin Grambow, Longxiang Zhang, and Thomas Schaaf. 2022. In-Domain Pre-Training Improves Clinical Note Generation from Doctor-Patient Conversations. In Proceedings of the First Workshop on Natural Language Generation in Healthcare, pages 9–22, Waterville, Maine, USA and virtual meeting. Association for Computational Linguistics.
Cite (Informal): In-Domain Pre-Training Improves Clinical Note Generation from Doctor-Patient Conversations (Grambow et al., NLG4Health 2022)
PDF: https://aclanthology.org/2022.nlg4health-1.2.pdf