An Empirical Study on the Transferability of Transformer Modules in Parameter-efficient Fine-tuning
Mohammad AkbarTajari | Sara Rajaee | Mohammad Taher Pilehvar
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Parameter-efficient fine-tuning has garnered lots of attention in recent studies. On this subject, we investigate the capability of different transformer modules in transferring knowledge from a pre-trained model to a downstream task. Our empirical results suggest that every transformer module is a winning ticket such that fine-tuning the specific module while the rest of the network is frozen achieves a comparable performance to the full fine-tuning case. Among different modules in LMs, LayerNorms exhibit a significant capacity for transfer learning to the extent that with only 0.003% updateable parameters in the layer-wise analysis, they can show acceptable performance on various target tasks. We argue that the performance of LayerNorms could be attributed to their high-magnitude weights compared to other components in a pre-trained model.