Training Dynamics for Text Summarization Models

Tanya Goyal; Jiacheng Xu; Junyi Jessy Li; Greg Durrett

doi:10.18653/v1/2022.findings-acl.163

Abstract

Pre-trained language models (e.g., BART) have shown impressive results when fine-tuned on large summarization datasets. However, little is understood about this fine-tuning process, including what knowledge is retained from pre-training or how content selection and generation strategies are learned across iterations. In this work, we analyze the training dynamics for generation models, focusing on summarization. Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process. We find that a propensity to copy the input is learned early in the training process, consistently across all datasets studied. On the other hand, factual errors, such as hallucination of unsupported facts, are learned in the later stages, though this behavior is more varied across domains. Based on these observations, we explore complementary approaches for modifying training: first, disregarding high-loss tokens that are challenging to learn, and second, disregarding low-loss tokens that are learned very quickly in the later stages of the training process. We show that these simple training modifications allow us to configure our model to achieve different goals, such as improving factuality or improving abstractiveness.
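
The two training modifications named in the abstract both amount to leaving some tokens out of the loss. The following is a minimal sketch of that idea in plain Python, not the authors' implementation: the function name masked_mean_loss, the parameters drop_above and drop_below, and the threshold values are all hypothetical, and the abstract does not say how thresholds are chosen or at which training step masking starts.

```python
from typing import List, Optional


def masked_mean_loss(
    token_losses: List[float],
    drop_above: Optional[float] = None,  # hypothetical cutoff for high-loss tokens
    drop_below: Optional[float] = None,  # hypothetical cutoff for low-loss tokens
) -> float:
    """Average per-token losses, ignoring tokens outside the kept range."""
    kept = [
        loss
        for loss in token_losses
        if (drop_above is None or loss <= drop_above)
        and (drop_below is None or loss >= drop_below)
    ]
    # If every token is masked out, contribute no loss instead of dividing by zero.
    return sum(kept) / len(kept) if kept else 0.0


# Illustrative per-token losses for one reference summary (made-up values).
losses = [0.05, 0.4, 2.5, 6.0]

# Variant 1: disregard high-loss tokens that are hard to learn.
no_high = masked_mean_loss(losses, drop_above=3.0)

# Variant 2: disregard low-loss tokens that are already learned.
no_low = masked_mean_loss(losses, drop_below=0.1)
```

In a real fine-tuning loop the same mask would be applied to the unreduced per-token cross-entropy before averaging, so that masked tokens contribute no gradient.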

Anthology ID: 2022.findings-acl.163
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2061–2073
URL: https://aclanthology.org/2022.findings-acl.163
DOI: 10.18653/v1/2022.findings-acl.163
Cite (ACL): Tanya Goyal, Jiacheng Xu, Junyi Jessy Li, and Greg Durrett. 2022. Training Dynamics for Text Summarization Models. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2061–2073, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Training Dynamics for Text Summarization Models (Goyal et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-acl.163.pdf
Video: https://aclanthology.org/2022.findings-acl.163.mp4
