Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts

Sai Ashish Somayajula, Youwei Liang, Li Zhang, Abhishek Singh, Pengtao Xie


Abstract
Pretrained Language Models (PLMs) have substantially advanced Natural Language Processing (NLP) tasks, but finetuning PLMs on low-resource datasets poses significant challenges such as instability and overfitting. Previous methods tackle these issues by finetuning a strategically chosen subnetwork on a downstream task while keeping the remaining weights fixed to their pretrained values. However, they rely on suboptimal criteria for subnetwork selection, leading to suboptimal solutions. To address these limitations, we propose a regularization method based on attention-guided weight mixup for finetuning PLMs. Our approach represents each network weight as a mixup of a task-specific weight and a pretrained weight, controlled by a learnable attention parameter, providing finer control over subnetwork selection. Furthermore, we employ a bi-level optimization (BLO) based framework on two separate splits of the training dataset, improving generalization and combating overfitting. We validate the efficacy of our proposed method through extensive experiments, demonstrating its superiority over previous methods, particularly in the context of finetuning PLMs on low-resource datasets. Our code is available at https://github.com/Sai-Ashish/Attention_guided_weight_mixup_BLO.
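The weight mixup and the two-split optimization described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that); it is a minimal illustration under several assumptions that do not come from the abstract: a two-weight linear model instead of a PLM, a sigmoid over a per-weight logit as the attention parameter, the mixing direction `a * task + (1 - a) * pretrained`, a first-order approximation of the upper-level gradient, and made-up data. All function names here are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mix_weights(task_w, pretrained_w, attn_logits):
    """Per-weight mixup: a * task + (1 - a) * pretrained, with a = sigmoid(logit).
    The sigmoid parametrization and mixing direction are assumptions."""
    return [sigmoid(z) * t + (1.0 - sigmoid(z)) * p
            for t, p, z in zip(task_w, pretrained_w, attn_logits)]

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def mse_grad(w, data):
    """Gradient of the mean squared error with respect to the mixed weights."""
    grad = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(data)
    return grad

def finetune(pretrained_w, train_split, val_split, steps=300, lr=0.1):
    task_w = list(pretrained_w)          # task weights start at the pretrained values
    logits = [0.0] * len(pretrained_w)   # sigmoid(0) = 0.5: equal mix at the start
    for _ in range(steps):
        # Lower level: update task-specific weights on the first split.
        g = mse_grad(mix_weights(task_w, pretrained_w, logits), train_split)
        task_w = [t - lr * sigmoid(z) * gi
                  for t, z, gi in zip(task_w, logits, g)]
        # Upper level: update attention logits on the second split
        # (first-order approximation; task weights are treated as constants).
        g = mse_grad(mix_weights(task_w, pretrained_w, logits), val_split)
        logits = [z - lr * gi * (t - p) * sigmoid(z) * (1.0 - sigmoid(z))
                  for z, gi, t, p in zip(logits, g, task_w, pretrained_w)]
    return task_w, logits

# Made-up toy data generated by the weights [2.0, -1.0].
pretrained = [1.5, 0.0]
train_split = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
val_split = [([2.0, 1.0], 3.0), ([1.0, 2.0], 0.0)]

task_w, logits = finetune(pretrained, train_split, val_split)
mixed = mix_weights(task_w, pretrained, logits)
print("train loss before:", mse(pretrained, train_split))
print("train loss after: ", mse(mixed, train_split))
```

In this sketch an attention value near 0 keeps a weight close to its pretrained value and a value near 1 lets the task-specific weight take over, which is one way to read the "finer control over subnetwork selection" claim: selection becomes a continuous, per-weight choice rather than a fixed binary mask.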
Anthology ID:
2024.naacl-long.277
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
4936–4953
URL:
https://aclanthology.org/2024.naacl-long.277
Cite (ACL):
Sai Ashish Somayajula, Youwei Liang, Li Zhang, Abhishek Singh, and Pengtao Xie. 2024. Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4936–4953, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts (Somayajula et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.277.pdf
Copyright:
 2024.naacl-long.277.copyright.pdf