Controlled Data Augmentation for Training Task-Oriented Dialog Systems with Low Resource Data

Sebastian Steindl, Ulrich Schäfer, Bernd Ludwig


Abstract
Modern dialog systems rely on deep learning to train transformer-based model architectures, which notoriously require large amounts of training data. However, collecting conversational data is often a tedious and costly process. This is especially true for Task-Oriented Dialogs, where the system ought to help the user achieve specific tasks, such as making reservations. We investigate a controlled strategy for dialog synthesis. Our method generates utterances based on dialog annotations in a sequence-to-sequence manner. Besides exploring the viability of the approach itself, we also examine the effect of constrained beam search on the generation capabilities. Moreover, we analyze the effectiveness of the proposed method as a data augmentation technique by studying the impact the synthetic dialogs have on training dialog systems. We perform the experiments in multiple settings, simulating various amounts of ground-truth data. Our work shows that a controlled generation approach is a viable method to synthesize Task-Oriented Dialogs, which can in turn be used to train dialog systems. We were able to improve this process by utilizing constrained beam search.
Anthology ID:
2023.pandl-1.9
Volume:
Proceedings of the 2nd Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning
Month:
December
Year:
2023
Address:
Singapore
Editors:
Mihai Surdeanu, Ellen Riloff, Laura Chiticariu, Dayne Frietag, Gus Hahn-Powell, Clayton T. Morrison, Enrique Noriega-Atala, Rebecca Sharp, Marco Valenzuela-Escarcega
Venues:
PANDL | WS
Publisher:
Association for Computational Linguistics
Pages:
92–102
URL:
https://aclanthology.org/2023.pandl-1.9
DOI:
10.18653/v1/2023.pandl-1.9
Cite (ACL):
Sebastian Steindl, Ulrich Schäfer, and Bernd Ludwig. 2023. Controlled Data Augmentation for Training Task-Oriented Dialog Systems with Low Resource Data. In Proceedings of the 2nd Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning, pages 92–102, Singapore. Association for Computational Linguistics.
Cite (Informal):
Controlled Data Augmentation for Training Task-Oriented Dialog Systems with Low Resource Data (Steindl et al., PANDL-WS 2023)
PDF:
https://aclanthology.org/2023.pandl-1.9.pdf
Video:
https://aclanthology.org/2023.pandl-1.9.mp4