2024
pdf
bib
abs
HealthAlignSumm : Utilizing Alignment for Multimodal Summarization of Code-Mixed Healthcare Dialogues
Akash Ghosh
|
Arkadeep Acharya
|
Sriparna Saha
|
Gaurav Pandey
|
Dinesh Raghu
|
Setu Sinha
Findings of the Association for Computational Linguistics: EMNLP 2024
As generative AI progresses, collaboration be-tween doctors and AI scientists is leading to thedevelopment of personalized models to stream-line healthcare tasks and improve productivity.Summarizing doctor-patient dialogues has be-come important, helping doctors understandconversations faster and improving patient care.While previous research has mostly focused ontext data, incorporating visual cues from pa-tient interactions allows doctors to gain deeperinsights into medical conditions. Most of thisresearch has centered on English datasets, butreal-world conversations often mix languagesfor better communication. To address the lackof resources for multimodal summarization ofcode-mixed dialogues in healthcare, we devel-oped the MCDH dataset. Additionally, we cre-ated HealthAlignSumm, a new model that in-tegrates visual components with the BART ar-chitecture. This represents a key advancementin multimodal fusion, applied within both theencoder and decoder of the BART model. Ourwork is the first to use alignment techniques,including state-of-the-art algorithms like DirectPreference Optimization, on encoder-decodermodels with synthetic datasets for multimodalsummarization. Through extensive experi-ments, we demonstrated the superior perfor-mance of HealthAlignSumm across severalmetrics validated by both automated assess-ments and human evaluations. The datasetMCDH and our proposed model HealthAlign-Summ will be available in this GitHub accounthttps://github.com/AkashGhosh/HealthAlignSumm-Utilizing-Alignment-for-Multimodal-Summarization-of-Code-Mixed-Healthcare-DialoguesDisclaimer: This work involves medical im-agery based on the subject matter of the topic.
2022
pdf
bib
abs
Gaining Insights into Unrecognized User Utterances in Task-Oriented Dialog Systems
Ella Rabinovich
|
Matan Vetzler
|
David Boaz
|
Vineet Kumar
|
Gaurav Pandey
|
Ateret Anaby Tavor
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
The rapidly growing market demand for automatic dialogue agents capable of goal-oriented behavior has caused many tech-industry leaders to invest considerable efforts into task-oriented dialog systems. The success of these systems is highly dependent on the accuracy of their intent identification – the process of deducing the goal or meaning of the user’s request and mapping it to one of the known intents for further processing. Gaining insights into unrecognized utterances – user requests the systems fails to attribute to a known intent – is therefore a key process in continuous improvement of goal-oriented dialog systems. We present an end-to-end pipeline for processing unrecognized user utterances, deployed in a real-world, commercial task-oriented dialog system, including a specifically-tailored clustering algorithm, a novel approach to cluster representative extraction, and cluster naming. We evaluated the proposed components, demonstrating their benefits in the analysis of unrecognized user requests.
pdf
bib
abs
Mix-and-Match: Scalable Dialog Response Retrieval using Gaussian Mixture Embeddings
Gaurav Pandey
|
Danish Contractor
|
Sachindra Joshi
Findings of the Association for Computational Linguistics: EMNLP 2022
Embedding-based approaches for dialog response retrieval embed the context-response pairs as points in the embedding space. These approaches are scalable, but fail to account for the complex, many-to-many relationships that exist between context-response pairs. On the other end of the spectrum, there are approaches that feed the context-response pairs jointly through multiple layers of neural networks. These approaches can model the complex relationships between context-response pairs, but fail to scale when the set of responses is moderately large (>1000). In this paper, we propose a scalable model that can learn complex relationships between context-response pairs. Specifically, the model maps the contexts as well as responses to probability distributions over the embedding space. We train the models by optimizing the Kullback-Leibler divergence between the distributions induced by context-response pairs in the training data. We show that the resultant model achieves better performance as compared to other embedding-based approaches on publicly available conversation data.
2021
pdf
bib
abs
Simulated Chats for Building Dialog Systems: Learning to Generate Conversations from Instructions
Biswesh Mohapatra
|
Gaurav Pandey
|
Danish Contractor
|
Sachindra Joshi
Findings of the Association for Computational Linguistics: EMNLP 2021
Popular dialog datasets such as MultiWOZ are created by providing crowd workers an instruction, expressed in natural language, that describes the task to be accomplished. Crowd workers play the role of a user and an agent to generate dialogs to accomplish tasks involving booking restaurant tables, calling a taxi etc. In this paper, we present a data creation strategy that uses the pre-trained language model, GPT2, to simulate the interaction between crowd workers by creating a user bot and an agent bot. We train the simulators using a smaller percentage of actual crowd-generated conversations and their corresponding instructions. We demonstrate that by using the simulated data, we achieve significant improvements in low-resource settings on two publicly available datasets - MultiWOZ dataset and the Persona chat dataset.
2020
pdf
bib
abs
Agent Assist through Conversation Analysis
Kshitij Fadnis
|
Nathaniel Mills
|
Jatin Ganhotra
|
Haggai Roitman
|
Gaurav Pandey
|
Doron Cohen
|
Yosi Mass
|
Shai Erera
|
Chulaka Gunasekara
|
Danish Contractor
|
Siva Patel
|
Q. Vera Liao
|
Sachindra Joshi
|
Luis Lastras
|
David Konopnicki
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Customer support agents play a crucial role as an interface between an organization and its end-users. We propose CAIRAA: Conversational Approach to Information Retrieval for Agent Assistance, to reduce the cognitive workload of support agents who engage with users through conversation systems. CAIRAA monitors an evolving conversation and recommends both responses and URLs of documents the agent can use in replies to their client. We combine traditional information retrieval (IR) approaches with more recent Deep Learning (DL) models to ensure high accuracy and efficient run-time performance in the deployed system. Here, we describe the CAIRAA system and demonstrate its effectiveness in a pilot study via a short video.
2018
pdf
bib
abs
Exemplar Encoder-Decoder for Neural Conversation Generation
Gaurav Pandey
|
Danish Contractor
|
Vineet Kumar
|
Sachindra Joshi
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In this paper we present the Exemplar Encoder-Decoder network (EED), a novel conversation model that learns to utilize similar examples from training data to generate responses. Similar conversation examples (context-response pairs) from training data are retrieved using a traditional TF-IDF based retrieval model and the corresponding responses are used by our decoder to generate the ground truth response. The contribution of each retrieved response is weighed by the similarity of corresponding context with the input context. As a result, our model learns to assign higher similarity scores to those retrieved contexts whose responses are crucial for generating the final response. We present detailed experiments on two large data sets and we find that our method out-performs state of the art sequence to sequence generative models on several recently proposed evaluation metrics.