@inproceedings{chekalina-etal-2024-sparsegrad,
title = "{S}parse{G}rad: A Selective Method for Efficient Fine-tuning of {MLP} Layers",
author = "Chekalina, Viktoriia and
Rudenko, Anna and
Mezentsev, Gleb and
Mikhalev, Aleksandr and
Panchenko, Alexander and
Oseledets, Ivan",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.831",
pages = "14929--14939",
abstract = "The performance of Transformer models has been enhanced by increasing the number of parameters and the length of the processed text. Consequently, fine-tuning the entire model becomes a memory-intensive process. High-performance methods for parameter-efficient fine-tuning (PEFT) typically work with Attention blocks and often overlook MLP blocks, which contain about half of the model parameters. We propose a new selective PEFT method, namely SparseGrad, that performs well on MLP blocks. We transfer layer gradients to a space where only about 1{\%} of the layer{'}s elements remain significant. By converting gradients into a sparse structure, we reduce the number of updated parameters. We apply SparseGrad to fine-tune BERT and RoBERTa for the NLU task and LLaMa-2 for the Question-Answering task. In these experiments, with identical memory requirements, our method outperforms LoRA and MeProp, robust popular state-of-the-art PEFT approaches.",
}
Viktoriia Chekalina, Anna Rudenko, Gleb Mezentsev, Aleksandr Mikhalev, Alexander Panchenko, and Ivan Oseledets. 2024. SparseGrad: A Selective Method for Efficient Fine-tuning of MLP Layers. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 14929–14939, Miami, Florida, USA. Association for Computational Linguistics.
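
For intuition on the idea summarized in the abstract, here is a minimal, illustrative PyTorch sketch of magnitude-based gradient sparsification: keep only the largest ~1% of a layer gradient's entries and store the result as a sparse tensor. This is an assumption-laden approximation, not the paper's actual algorithm — SparseGrad first transfers gradients to a space where about 1% of entries remain significant, whereas the helper below (`sparsify_grad`, a hypothetical name) simply thresholds the raw gradient.

```python
import torch

def sparsify_grad(grad: torch.Tensor, keep_ratio: float = 0.01) -> torch.Tensor:
    """Keep only the largest-magnitude entries of a gradient tensor.

    Illustrative only: thresholds the raw gradient by magnitude; the paper's
    method instead works in a transformed space where ~1% of entries matter.
    """
    k = max(1, int(keep_ratio * grad.numel()))
    flat = grad.abs().flatten()
    # Threshold = k-th largest absolute value in the gradient
    threshold = torch.topk(flat, k).values.min()
    mask = grad.abs() >= threshold
    return (grad * mask).to_sparse()

# Usage: sparsify the gradient of an MLP weight before an optimizer step
layer = torch.nn.Linear(768, 3072)
x = torch.randn(4, 768)
loss = layer(x).pow(2).mean()
loss.backward()
sparse_g = sparsify_grad(layer.weight.grad)
print(sparse_g.values().numel(), "of", layer.weight.grad.numel(), "entries kept")
```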