Measuring the Instability of Fine-Tuning

Yupei Du, Dong Nguyen


Abstract
Fine-tuning pre-trained language models on downstream tasks with varying random seeds has been shown to be unstable, especially on small datasets. Many previous studies have investigated this instability and proposed methods to mitigate it. However, most of these studies only used the standard deviation of performance scores (SD) as their measure, which is a narrow characterization of instability. In this paper, we analyze SD and six other measures that quantify instability at different levels of granularity. Moreover, we propose a systematic evaluation framework to assess the validity of these measures. Finally, we analyze the consistency and differences between the measures by reassessing existing instability mitigation methods. We hope our results will inform better measurement of fine-tuning instability.
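For illustration, the snippet below is a minimal sketch of the SD measure mentioned in the abstract, not the authors' code; the seed-to-score mapping is hypothetical. It computes the standard deviation of a model's performance scores across fine-tuning runs that differ only in their random seed:

    import statistics

    # Hypothetical validation accuracies from fine-tuning the same
    # pre-trained model on the same small dataset with five random seeds.
    scores_by_seed = {0: 0.912, 1: 0.874, 2: 0.905, 3: 0.861, 4: 0.898}

    # SD: the sample standard deviation of performance across seeds,
    # the run-level instability measure most prior work relies on.
    sd = statistics.stdev(scores_by_seed.values())
    print(f"performance SD across seeds: {sd:.4f}")

As the abstract suggests, a single scalar like SD captures only run-level variation, which motivates the finer-grained measures the paper analyzes alongside it.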
Anthology ID:
2023.acl-long.342
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6209–6230
URL:
https://aclanthology.org/2023.acl-long.342
DOI:
10.18653/v1/2023.acl-long.342
Cite (ACL):
Yupei Du and Dong Nguyen. 2023. Measuring the Instability of Fine-Tuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6209–6230, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Measuring the Instability of Fine-Tuning (Du & Nguyen, ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.342.pdf
Video:
https://aclanthology.org/2023.acl-long.342.mp4