@inproceedings{ghosh-etal-2024-healthalignsumm,
title = "{H}ealth{A}lign{S}umm : Utilizing Alignment for Multimodal Summarization of Code-Mixed Healthcare Dialogues",
author = "Ghosh, Akash and
Acharya, Arkadeep and
Saha, Sriparna and
Pandey, Gaurav and
Raghu, Dinesh and
Sinha, Setu",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.675",
pages = "11546--11560",
abstract = "As generative AI progresses, collaboration be-tween doctors and AI scientists is leading to thedevelopment of personalized models to stream-line healthcare tasks and improve productivity.Summarizing doctor-patient dialogues has be-come important, helping doctors understandconversations faster and improving patient care.While previous research has mostly focused ontext data, incorporating visual cues from pa-tient interactions allows doctors to gain deeperinsights into medical conditions. Most of thisresearch has centered on English datasets, butreal-world conversations often mix languagesfor better communication. To address the lackof resources for multimodal summarization ofcode-mixed dialogues in healthcare, we devel-oped the MCDH dataset. Additionally, we cre-ated HealthAlignSumm, a new model that in-tegrates visual components with the BART ar-chitecture. This represents a key advancementin multimodal fusion, applied within both theencoder and decoder of the BART model. Ourwork is the first to use alignment techniques,including state-of-the-art algorithms like DirectPreference Optimization, on encoder-decodermodels with synthetic datasets for multimodalsummarization. Through extensive experi-ments, we demonstrated the superior perfor-mance of HealthAlignSumm across severalmetrics validated by both automated assess-ments and human evaluations. The datasetMCDH and our proposed model HealthAlign-Summ will be available in this GitHub accounthttps://github.com/AkashGhosh/HealthAlignSumm-Utilizing-Alignment-for-Multimodal-Summarization-of-Code-Mixed-Healthcare-DialoguesDisclaimer: This work involves medical im-agery based on the subject matter of the topic.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ghosh-etal-2024-healthalignsumm">
<titleInfo>
<title>HealthAlignSumm : Utilizing Alignment for Multimodal Summarization of Code-Mixed Healthcare Dialogues</title>
</titleInfo>
<name type="personal">
<namePart type="given">Akash</namePart>
<namePart type="family">Ghosh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Arkadeep</namePart>
<namePart type="family">Acharya</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sriparna</namePart>
<namePart type="family">Saha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Gaurav</namePart>
<namePart type="family">Pandey</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dinesh</namePart>
<namePart type="family">Raghu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Setu</namePart>
<namePart type="family">Sinha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EMNLP 2024</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yaser</namePart>
<namePart type="family">Al-Onaizan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohit</namePart>
<namePart type="family">Bansal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yun-Nung</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Miami, Florida, USA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
  <abstract>As generative AI progresses, collaboration between doctors and AI scientists is leading to the development of personalized models to streamline healthcare tasks and improve productivity. Summarizing doctor-patient dialogues has become important, helping doctors understand conversations faster and improving patient care. While previous research has mostly focused on text data, incorporating visual cues from patient interactions allows doctors to gain deeper insights into medical conditions. Most of this research has centered on English datasets, but real-world conversations often mix languages for better communication. To address the lack of resources for multimodal summarization of code-mixed dialogues in healthcare, we developed the MCDH dataset. Additionally, we created HealthAlignSumm, a new model that integrates visual components with the BART architecture. This represents a key advancement in multimodal fusion, applied within both the encoder and decoder of the BART model. Our work is the first to use alignment techniques, including state-of-the-art algorithms like Direct Preference Optimization, on encoder-decoder models with synthetic datasets for multimodal summarization. Through extensive experiments, we demonstrated the superior performance of HealthAlignSumm across several metrics, validated by both automated assessments and human evaluations. The dataset MCDH and our proposed model HealthAlignSumm will be available in this GitHub account: https://github.com/AkashGhosh/HealthAlignSumm-Utilizing-Alignment-for-Multimodal-Summarization-of-Code-Mixed-Healthcare-Dialogues Disclaimer: This work involves medical imagery based on the subject matter of the topic.</abstract>
<identifier type="citekey">ghosh-etal-2024-healthalignsumm</identifier>
<location>
<url>https://aclanthology.org/2024.findings-emnlp.675</url>
</location>
<part>
<date>2024-11</date>
<extent unit="page">
<start>11546</start>
<end>11560</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T HealthAlignSumm : Utilizing Alignment for Multimodal Summarization of Code-Mixed Healthcare Dialogues
%A Ghosh, Akash
%A Acharya, Arkadeep
%A Saha, Sriparna
%A Pandey, Gaurav
%A Raghu, Dinesh
%A Sinha, Setu
%Y Al-Onaizan, Yaser
%Y Bansal, Mohit
%Y Chen, Yun-Nung
%S Findings of the Association for Computational Linguistics: EMNLP 2024
%D 2024
%8 November
%I Association for Computational Linguistics
%C Miami, Florida, USA
%F ghosh-etal-2024-healthalignsumm
%X As generative AI progresses, collaboration between doctors and AI scientists is leading to the development of personalized models to streamline healthcare tasks and improve productivity. Summarizing doctor-patient dialogues has become important, helping doctors understand conversations faster and improving patient care. While previous research has mostly focused on text data, incorporating visual cues from patient interactions allows doctors to gain deeper insights into medical conditions. Most of this research has centered on English datasets, but real-world conversations often mix languages for better communication. To address the lack of resources for multimodal summarization of code-mixed dialogues in healthcare, we developed the MCDH dataset. Additionally, we created HealthAlignSumm, a new model that integrates visual components with the BART architecture. This represents a key advancement in multimodal fusion, applied within both the encoder and decoder of the BART model. Our work is the first to use alignment techniques, including state-of-the-art algorithms like Direct Preference Optimization, on encoder-decoder models with synthetic datasets for multimodal summarization. Through extensive experiments, we demonstrated the superior performance of HealthAlignSumm across several metrics, validated by both automated assessments and human evaluations. The dataset MCDH and our proposed model HealthAlignSumm will be available in this GitHub account: https://github.com/AkashGhosh/HealthAlignSumm-Utilizing-Alignment-for-Multimodal-Summarization-of-Code-Mixed-Healthcare-Dialogues Disclaimer: This work involves medical imagery based on the subject matter of the topic.
%U https://aclanthology.org/2024.findings-emnlp.675
%P 11546-11560
Markdown (Informal)
[HealthAlignSumm : Utilizing Alignment for Multimodal Summarization of Code-Mixed Healthcare Dialogues](https://aclanthology.org/2024.findings-emnlp.675) (Ghosh et al., Findings 2024)
ACL
Akash Ghosh, Arkadeep Acharya, Sriparna Saha, Gaurav Pandey, Dinesh Raghu, and Setu Sinha. 2024. [HealthAlignSumm : Utilizing Alignment for Multimodal Summarization of Code-Mixed Healthcare Dialogues](https://aclanthology.org/2024.findings-emnlp.675). In *Findings of the Association for Computational Linguistics: EMNLP 2024*, pages 11546–11560, Miami, Florida, USA. Association for Computational Linguistics.