Improving generalization in large langue model by learning prefix subspaces

Louis Falissard, Vincent Guigue, Laure Soulier


Abstract
This article focuses on fine-tuning large language models (LLMs) in the scarce-data regime (also known as the “few-shot learning” setting). We propose a method to increase the generalization capabilities of LLMs based on neural network subspaces. This optimization method, recently introduced in computer vision, aims to improve model generalization by identifying wider local optima through the joint optimization of an entire simplex of models in parameter space. Although this property would be highly beneficial in the context of training large language models in the few-shot learning setting, its adaptation to massive, pretrained transformers poses some challenges. First, their considerable number of parameters makes it difficult to train several models jointly, and second, their deterministic parameter initialization schemes make them ill-suited to the subspace method as originally proposed. We show in this paper that its application to “Parameter-Efficient Fine-Tuning” (PEFT) methods, however, is relatively natural, and we propose to apply it to prefix-tuning by learning entire simplexes of continuous prefixes. We test our method on a variant of the GLUE benchmark adapted to the few-shot learning setting, and show that both of our contributions (learning prefix simplexes and non-deterministic validation metric inference) jointly lead to a gain in average performance compared to state-of-the-art methods.
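To make the subspace idea concrete, here is a minimal, hedged sketch of how a simplex of prefixes could be parameterized and sampled during training. It assumes a PyTorch setting and the simplest case of a two-vertex simplex (a line segment between two learnable prefix matrices); the class name PrefixSegment and its dimensions are hypothetical, and this is not the authors' released implementation.

```python
import torch

class PrefixSegment(torch.nn.Module):
    """Two-vertex prefix subspace: the line segment between prefixes p0 and p1."""

    def __init__(self, prefix_len: int, hidden_dim: int):
        super().__init__()
        # Two independently initialized, learnable continuous prefixes (the simplex vertices).
        self.p0 = torch.nn.Parameter(torch.randn(prefix_len, hidden_dim) * 0.02)
        self.p1 = torch.nn.Parameter(torch.randn(prefix_len, hidden_dim) * 0.02)

    def forward(self, alpha: float | None = None) -> torch.Tensor:
        # At each training step, sample a random convex combination of the vertices,
        # so that the whole segment (not a single point) is optimized jointly.
        if alpha is None:
            alpha = torch.rand(()).item()
        return alpha * self.p0 + (1.0 - alpha) * self.p1

# Hypothetical usage: the sampled prefix would be prepended to the frozen LLM's
# attention keys/values (as in prefix-tuning), and gradients flow only into p0 and p1.
prefix_subspace = PrefixSegment(prefix_len=20, hidden_dim=768)
prefix = prefix_subspace()          # random point on the segment (training)
prefix_mid = prefix_subspace(0.5)   # segment midpoint (one possible inference choice)
```

At inference time, one could either pick a fixed point in the simplex (e.g., its center) or sample several points and aggregate their predictions, which is in the spirit of the paper's non-deterministic validation metric inference.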
Anthology ID:
2023.findings-emnlp.768
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11474–11483
URL:
https://aclanthology.org/2023.findings-emnlp.768
DOI:
10.18653/v1/2023.findings-emnlp.768
Cite (ACL):
Louis Falissard, Vincent Guigue, and Laure Soulier. 2023. Improving generalization in large langue model by learning prefix subspaces. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 11474–11483, Singapore. Association for Computational Linguistics.
Cite (Informal):
Improving generalization in large langue model by learning prefix subspaces (Falissard et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.768.pdf