A Thorough Evaluation of Task-Specific Pretraining for Summarization

Sascha Rothe, Joshua Maynez, Shashi Narayan


Abstract
Task-agnostic pretraining objectives like masked language models or corrupted span prediction are applicable to a wide range of NLP downstream tasks (Raffel et al., 2019), but are outperformed by task-specific pretraining objectives like predicting extracted gap sentences on summarization (Zhang et al., 2020). We compare three summarization-specific pretraining objectives with the task-agnostic corrupted span prediction pretraining in a controlled study. We also extend our study to a low-resource and zero-shot setup, to understand how many training examples are needed in order to ablate the task-specific pretraining without quality loss. Our results show that task-agnostic pretraining is sufficient for most cases, which hopefully reduces the need for costly task-specific pretraining. We also report new state-of-the-art numbers for two summarization tasks using a T5 model with 11 billion parameters and an optimal beam search length penalty.
Anthology ID:
2021.emnlp-main.12
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
140–145
URL:
https://aclanthology.org/2021.emnlp-main.12
DOI:
10.18653/v1/2021.emnlp-main.12
PDF:
https://aclanthology.org/2021.emnlp-main.12.pdf