Evaluating Robustness of Open Dialogue Summarization Models in the Presence of Naturally Occurring Variations

Ankita Gupta, Chulaka Gunasekara, Hui Wan, Jatin Ganhotra, Sachindra Joshi, Marina Danilevsky


Abstract
Dialogue summarization involves summarizing long conversations while preserving the most salient information. Real-life dialogues often contain naturally occurring variations (e.g., repetitions, hesitations). In this study, we systematically investigate the impact of such variations on state-of-the-art open dialogue summarization models whose details are publicly known (e.g., architectures, weights, and training corpora). To simulate real-life variations, we introduce two types of perturbations: utterance-level perturbations, which modify individual utterances with errors and language variations, and dialogue-level perturbations, which add non-informative exchanges (e.g., repetitions, greetings). We perform our analysis along three dimensions of robustness: consistency, saliency, and faithfulness, each capturing a different aspect of a summarization model's performance. We find that both fine-tuned and instruction-tuned models are affected by input variations, with the latter being more susceptible, particularly to dialogue-level perturbations. We also validate our findings via human evaluation. Finally, we investigate whether the robustness of fine-tuned models can be improved by training them with a fraction of perturbed data. We find that this approach does not yield consistent performance gains, warranting further research. Overall, our work highlights robustness challenges in current open encoder-decoder summarization models and provides insights for future research.
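As a minimal illustration of the dialogue-level perturbations the abstract describes, the sketch below prepends a non-informative greeting exchange to a dialogue represented as (speaker, utterance) pairs. The function name, greeting templates, and data format are hypothetical choices for this example, not taken from the paper.

```python
import random

def add_greeting_perturbation(dialogue, rng=None):
    """Prepend a non-informative greeting exchange to a dialogue.

    `dialogue` is a list of (speaker, utterance) pairs. The greeting
    templates here are illustrative; the paper's actual perturbation
    templates may differ.
    """
    rng = rng or random.Random(0)
    # Take the first two distinct speakers, preserving order of appearance.
    speakers = list(dict.fromkeys(s for s, _ in dialogue))[:2]
    greetings = ["Hi!", "Hello!", "Hey, how are you?", "Good, thanks!"]
    # One short exchange: each of the two speakers says a greeting.
    exchange = [(speakers[i % len(speakers)], rng.choice(greetings))
                for i in range(2)]
    return exchange + dialogue

dialogue = [("Amanda", "I baked cookies. Do you want some?"),
            ("Jerry", "Sure!")]
perturbed = add_greeting_perturbation(dialogue)
```

A robustness evaluation would then compare the summary of `perturbed` against the summary of the original `dialogue`, since the added exchange carries no salient content.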
Anthology ID:
2024.nlp4convai-1.4
Original:
2024.nlp4convai-1.4v1
Version 2:
2024.nlp4convai-1.4v2
Volume:
Proceedings of the 6th Workshop on NLP for Conversational AI (NLP4ConvAI 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Elnaz Nouri, Abhinav Rastogi, Georgios Spithourakis, Bing Liu, Yun-Nung Chen, Yu Li, Alon Albalak, Hiromi Wakaki, Alexandros Papangelis
Venues:
NLP4ConvAI | WS
Publisher:
Association for Computational Linguistics
Pages:
56–72
URL:
https://aclanthology.org/2024.nlp4convai-1.4
Cite (ACL):
Ankita Gupta, Chulaka Gunasekara, Hui Wan, Jatin Ganhotra, Sachindra Joshi, and Marina Danilevsky. 2024. Evaluating Robustness of Open Dialogue Summarization Models in the Presence of Naturally Occurring Variations. In Proceedings of the 6th Workshop on NLP for Conversational AI (NLP4ConvAI 2024), pages 56–72, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Evaluating Robustness of Open Dialogue Summarization Models in the Presence of Naturally Occurring Variations (Gupta et al., NLP4ConvAI-WS 2024)
PDF:
https://aclanthology.org/2024.nlp4convai-1.4.pdf