Bi-level Finetuning with Task-dependent Similarity Structure for Low-resource Training

Sai Ashish Somayajula, Lifeng Jin, Linfeng Song, Haitao Mi, Dong Yu


Abstract
Training a large language model in low-resource settings is challenging because such models are susceptible to overfitting and generalize poorly. Previous work addresses this issue through approaches such as reducing the number of tunable parameters or data augmentation; however, these approaches either limit the trained model's expressiveness or rely on task-independent knowledge. In this paper, we propose the Bi-level Finetuning with Task-dependent Similarity Structure framework, in which all parameters, including the embeddings of unseen tokens, are finetuned with task-dependent information drawn from the training data alone. In this framework, a task-dependent similarity structure is learned in a data-driven fashion and then used to compose soft embeddings from conventional embeddings; these soft embeddings are used during training to update all parameters. To learn the similarity structure and the model parameters, we propose a bi-level optimization algorithm with two stages, search and finetune, that ensures successful learning. Experiments on several classification datasets in low-resource scenarios demonstrate that models trained with our method outperform strong baselines. Ablation experiments further support the effectiveness of the individual components of the framework. Code is available at https://github.com/Sai-Ashish/BFTSS.
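
The abstract describes two mechanisms that a short sketch can make concrete: soft embeddings composed from conventional embeddings through a learned task-dependent similarity structure, and a two-stage bi-level procedure that alternates between learning that structure (search) and updating all model parameters (finetune). The PyTorch sketch below illustrates this pattern under stated assumptions; it is not the authors' implementation (see the linked repository for that). The dense vocabulary-by-vocabulary similarity matrix, the mean-pooled classifier head, the random stand-in data, and the alternation schedule are all illustrative choices.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftEmbedding(nn.Module):
    """Soft embeddings composed from a conventional embedding table via a
    learnable token-token similarity structure (illustrative only)."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # conventional embeddings
        # Task-dependent similarity logits. A dense V x V matrix is shown
        # only for clarity; at realistic vocabulary sizes this would need a
        # sparse or low-rank parameterization. The identity-heavy init makes
        # each token start out as (mostly) its own conventional embedding.
        self.sim_logits = nn.Parameter(5.0 * torch.eye(vocab_size))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # A row-wise softmax turns similarity logits into mixing weights, so
        # each token's soft embedding is a convex combination of all rows of
        # the conventional embedding table.
        weights = F.softmax(self.sim_logits[token_ids], dim=-1)  # (..., V)
        return weights @ self.embed.weight                       # (..., dim)

# Toy bi-level loop. The "search" stage updates the similarity structure on
# held-out data; the "finetune" stage updates all model parameters on
# training data. The paper's exact schedule and objectives may differ.
V, D, T, B = 100, 16, 5, 8
model, head = SoftEmbedding(V, D), nn.Linear(D, 2)
opt_model = torch.optim.Adam(
    list(model.embed.parameters()) + list(head.parameters()), lr=1e-3)
opt_sim = torch.optim.Adam([model.sim_logits], lr=1e-3)

def loss_on(ids: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    feats = model(ids).mean(dim=1)  # mean-pool token embeddings
    return F.cross_entropy(head(feats), labels)

# Random stand-in data for the sketch.
tr_ids, tr_y = torch.randint(0, V, (B, T)), torch.randint(0, 2, (B,))
va_ids, va_y = torch.randint(0, V, (B, T)), torch.randint(0, 2, (B,))

for step in range(50):
    opt_sim.zero_grad(); loss_on(va_ids, va_y).backward(); opt_sim.step()      # search
    opt_model.zero_grad(); loss_on(tr_ids, tr_y).backward(); opt_model.step()  # finetune

Keeping the similarity logits in their own optimizer is what makes the loop bi-level: the search step only moves the similarity structure, while the finetune step moves every other parameter.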
Anthology ID: 2023.findings-acl.544
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8569–8588
URL: https://aclanthology.org/2023.findings-acl.544
DOI: 10.18653/v1/2023.findings-acl.544
Cite (ACL): Sai Ashish Somayajula, Lifeng Jin, Linfeng Song, Haitao Mi, and Dong Yu. 2023. Bi-level Finetuning with Task-dependent Similarity Structure for Low-resource Training. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8569–8588, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Bi-level Finetuning with Task-dependent Similarity Structure for Low-resource Training (Somayajula et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.544.pdf