Investigating the Role and Impact of Disfluency on Summarization

Varun Nathan, Ayush Kumar, Jithendra Vepa


Abstract
Contact centers handle both chat and voice calls for the same domain. As part of their workflow, it is standard practice to summarize conversations once they conclude. A significant distinction between chat and voice communication lies in the presence of disfluencies in voice calls, such as repetitions, restarts, and replacements. These disfluencies are generally considered noise for downstream natural language understanding (NLU) tasks. While a separate summarization model for voice calls can be trained in addition to a chat-specific model for the same domain, doing so requires manual annotations for both channels and adds the complexity of maintaining two models. It is therefore crucial to investigate whether a model trained on fluent data can handle disfluent data effectively. While previous research has explored the impact of disfluency on question answering and intent detection, its influence on summarization remains inadequately studied. Our experiments reveal up to a 6.99-point degradation in Rouge-L score, along with reduced fluency, consistency, and relevance, when a fluent-trained model handles disfluent data. Replacement disfluencies have the highest negative impact. To mitigate this, we examine Fused Fine-Tuning, training the model with a combination of fluent and disfluent data, which yields improved performance on both public and real-life datasets. Our work highlights the significance of incorporating disfluency in training summarization models and its advantages in an industrial setting.
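To make the fused fine-tuning idea concrete, the following is a minimal Python sketch of how such a mixed training set might be assembled: synthetic repetition, restart, and replacement disfluencies are injected into fluent transcripts, and fluent and disfluent copies are pooled with the same target summaries. The helper names, filler words, and mixing ratio are illustrative assumptions, not the authors' exact recipe.

    import random

    random.seed(0)

    FILLERS = ["um", "uh", "you know"]


    def repetition(tokens):
        """Repeat a randomly chosen token: 'I want to' -> 'I I want to'."""
        i = random.randrange(len(tokens))
        return tokens[: i + 1] + tokens[i:]


    def restart(tokens):
        """Abandon a prefix and start over:
        'I want to cancel' -> 'I want uh I want to cancel'."""
        i = random.randrange(1, len(tokens))
        return tokens[:i] + [random.choice(FILLERS)] + tokens


    def replacement(tokens):
        """Utter a wrong word, then self-correct:
        'cancel my order' -> 'cancel thing I mean my order'."""
        i = random.randrange(len(tokens))
        return tokens[:i] + ["thing", "I", "mean"] + tokens[i:]


    def make_disfluent(utterance):
        """Apply one randomly chosen disfluency type to a fluent utterance."""
        tokens = utterance.split()
        op = random.choice([repetition, restart, replacement])
        return " ".join(op(tokens))


    def fused_training_set(fluent_pairs, disfluent_fraction=0.5):
        """Mix fluent (transcript, summary) pairs with synthetically
        disfluent copies that share the same target summary.
        The 50/50 default split is an assumption, not the paper's ratio."""
        fused = []
        for transcript, summary in fluent_pairs:
            fused.append((transcript, summary))
            if random.random() < disfluent_fraction:
                fused.append((make_disfluent(transcript), summary))
        return fused


    if __name__ == "__main__":
        pairs = [("I want to cancel my order",
                  "Customer requests order cancellation.")]
        for src, tgt in fused_training_set(pairs, disfluent_fraction=1.0):
            print(f"{src!r} -> {tgt!r}")

The key design point this sketch illustrates is that the summary target stays unchanged while the input transcript varies in fluency, so the summarizer learns to produce the same output regardless of disfluent noise in the source.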
Anthology ID:
2023.emnlp-industry.52
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2023
Address:
Singapore
Editors:
Mingxuan Wang, Imed Zitouni
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
541–551
URL:
https://aclanthology.org/2023.emnlp-industry.52
DOI:
10.18653/v1/2023.emnlp-industry.52
Cite (ACL):
Varun Nathan, Ayush Kumar, and Jithendra Vepa. 2023. Investigating the Role and Impact of Disfluency on Summarization. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 541–551, Singapore. Association for Computational Linguistics.
Cite (Informal):
Investigating the Role and Impact of Disfluency on Summarization (Nathan et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-industry.52.pdf
Video:
https://aclanthology.org/2023.emnlp-industry.52.mp4