Disfluency Generation for More Robust Dialogue Systems

Benjamin Marie


Abstract
Disfluencies in user utterances can trigger a chain of errors impacting all the modules of a dialogue system: natural language understanding, dialogue state tracking, and response generation. In this work, we first analyze existing dialogue datasets commonly used in research and show that they only contain a marginal number of disfluent utterances. Due to this relative absence of disfluencies in their training data, dialogue systems may then critically fail when exposed to disfluent utterances. Following this observation, we propose to augment existing datasets with disfluent user utterances by paraphrasing fluent utterances into disfluent ones. Relying on a pre-trained language model, our few-shot disfluent paraphraser guided by a disfluency classifier can generate useful disfluent utterances for training better dialogue systems. We report on improvements for both dialogue state tracking and response generation when the dialogue systems are trained on datasets augmented with our disfluent utterances.
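The paper's generator is a few-shot pre-trained language model guided by a disfluency classifier; that pipeline is not reproduced here. As a minimal rule-based illustration of the *kind* of augmentation involved — turning a fluent utterance into a disfluent one via filler insertion and word repetition — a toy sketch (all names and the rule set are assumptions for illustration, not the paper's method):

```python
import random

# Common English filler words (illustrative list, not from the paper).
FILLERS = ["uh", "um", "you know", "I mean"]

def make_disfluent(utterance: str, seed: int = 0) -> str:
    """Inject simple disfluencies into a fluent utterance.

    Toy rule-based stand-in for the paper's few-shot LM paraphraser:
    repeats one random token and inserts one filler word.
    """
    rng = random.Random(seed)
    tokens = utterance.split()
    if not tokens:
        return utterance
    # Repetition disfluency: duplicate a randomly chosen token.
    i = rng.randrange(len(tokens))
    tokens.insert(i, tokens[i])
    # Filler disfluency: insert a filler at a random position.
    j = rng.randrange(len(tokens) + 1)
    tokens.insert(j, rng.choice(FILLERS))
    return " ".join(tokens)
```

A real implementation, as described in the abstract, would instead prompt a pre-trained language model with a few fluent/disfluent pairs and filter or guide its outputs with a trained disfluency classifier, yielding more natural paraphrases than fixed rules.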
Anthology ID:
2023.findings-acl.728
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11479–11488
URL:
https://aclanthology.org/2023.findings-acl.728
DOI:
10.18653/v1/2023.findings-acl.728
Bibkey:
Cite (ACL):
Benjamin Marie. 2023. Disfluency Generation for More Robust Dialogue Systems. In Findings of the Association for Computational Linguistics: ACL 2023, pages 11479–11488, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Disfluency Generation for More Robust Dialogue Systems (Marie, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.728.pdf
Video:
https://aclanthology.org/2023.findings-acl.728.mp4