Uncovering Hidden Consequences of Pre-training Objectives in Sequence-to-Sequence Models

Tannon Kew, Rico Sennrich

Abstract
Some variants of self-supervised denoising objectives for pre-training encoder-decoder language models have been reported to have a negligible impact on downstream performance. Yet the design of these pre-training objectives leads to behavioural differences that can be uncovered with specific manipulations. We reproduce a recently proposed zero-shot control method and find that it is only successful on a subset of models. To understand what causes the difference in its effectiveness, we perform a set of controlled experiments, varying only the pre-training objective, and find unexpected interactions between the pre-training method and downstream controllability of models after fine-tuning. Our results show that different pre-training objectives have consequences that may not be visible in standard downstream evaluation, but which should be taken into account when developing models with controllability in mind.
Anthology ID:
2023.findings-acl.438
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7010–7022
URL:
https://aclanthology.org/2023.findings-acl.438
DOI:
10.18653/v1/2023.findings-acl.438
Cite (ACL):
Tannon Kew and Rico Sennrich. 2023. Uncovering Hidden Consequences of Pre-training Objectives in Sequence-to-Sequence Models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7010–7022, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Uncovering Hidden Consequences of Pre-training Objectives in Sequence-to-Sequence Models (Kew & Sennrich, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.438.pdf