Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining

Yicheng Zou, Bolin Zhu, Xingwu Hu, Tao Gui, Qi Zhang


Abstract
With the rapid increase in the volume of dialogue data from daily life, there is a growing demand for dialogue summarization. Unfortunately, training a large summarization model is generally infeasible due to the inadequacy of dialogue data with annotated summaries. Most existing works for low-resource dialogue summarization directly pretrain models in other domains, e.g., the news domain, but they generally neglect the huge difference between dialogues and conventional articles. To bridge the gap between out-of-domain pretraining and in-domain fine-tuning, in this work, we propose a multi-source pretraining paradigm to better leverage the external summary data. Specifically, we exploit large-scale in-domain non-summary data to separately pretrain the dialogue encoder and the summary decoder. The combined encoder-decoder model is then pretrained on the out-of-domain summary data using adversarial critics, aiming to facilitate domain-agnostic summarization. The experimental results on two public datasets show that with only limited training data, our approach achieves competitive performance and generalizes well in different dialogue scenarios.
Anthology ID:
2021.emnlp-main.7
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
80–91
URL:
https://aclanthology.org/2021.emnlp-main.7
DOI:
10.18653/v1/2021.emnlp-main.7
Cite (ACL):
Yicheng Zou, Bolin Zhu, Xingwu Hu, Tao Gui, and Qi Zhang. 2021. Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 80–91, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining (Zou et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.7.pdf
Video:
https://aclanthology.org/2021.emnlp-main.7.mp4
Code:
rowitzou/dams
Data:
BookCorpus, MS COCO, Reddit Conversation Corpus, SAMSum