Data Augmentation for Low-Resource Dialogue Summarization

Yongtai Liu, Joshua Maynez, Gonçalo Simões, Shashi Narayan


Abstract
We present DADS, a novel Data Augmentation technique for low-resource Dialogue Summarization. Our method generates synthetic examples by replacing sections of text in both the input dialogue and the summary, while ensuring that the augmented summary remains a viable summary of the augmented dialogue. We use pretrained language models that produce highly likely dialogue alternatives while still being free to generate diverse alternatives. We apply our data augmentation method to the SAMSum dataset in low-resource scenarios, mimicking real-world problems such as chat, thread, and meeting summarization, where large-scale supervised datasets with human-written summaries are scarce. Through both automatic and human evaluations, we show that DADS yields strong improvements in low-resource scenarios while generating topically diverse summaries without introducing additional hallucinations into the summaries.
Anthology ID:
2022.findings-naacl.53
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
703–710
URL:
https://aclanthology.org/2022.findings-naacl.53
DOI:
10.18653/v1/2022.findings-naacl.53
Cite (ACL):
Yongtai Liu, Joshua Maynez, Gonçalo Simões, and Shashi Narayan. 2022. Data Augmentation for Low-Resource Dialogue Summarization. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 703–710, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Data Augmentation for Low-Resource Dialogue Summarization (Liu et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-naacl.53.pdf
Video:
https://aclanthology.org/2022.findings-naacl.53.mp4