Task-specific Compression for Multi-task Language Models using Attribution-based Pruning
Nakyeong Yang | Yunah Jang | Hwanhee Lee | Seohyeong Jeong | Kyomin Jung
Findings of the Association for Computational Linguistics: EACL 2023
Multi-task language models show outstanding performance for various natural language understanding tasks with only a single model. However, these language models inevitably utilize an unnecessarily large number of model parameters, even when used only for a specific task. In this paper, we propose a novel training-free compression method for multi-task language models using pruning method. Specifically, we use an attribution method to determine which neurons are essential for performing a specific task. We task-specifically prune unimportant neurons and leave only task-specific parameters. Furthermore, we extend our method to be applicable in both low-resource and unsupervised settings. Since our compression method is training-free, it uses little computing resources and does not update the pre-trained parameters of language models, reducing storage space usage. Experimental results on the six widely-used datasets show that our proposed pruning method significantly outperforms baseline pruning methods. In addition, we demonstrate that our method preserves performance even in an unseen domain setting.