Sumyeong Ahn
2023
NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models
Jongwoo Ko | Seungjoon Park | Yujin Kim | Sumyeong Ahn | Du-Seong Chang | Euijai Ahn | Se-Young Yun
Findings of the Association for Computational Linguistics: EMNLP 2023
Structured pruning methods have proven effective at reducing model size and accelerating inference in various network architectures such as Transformers. Despite the versatility of encoder-decoder models across numerous NLP tasks, structured pruning of such models is less explored than that of encoder-only models. In this study, we investigate the behavior of structured pruning in encoder-decoder models from a decoupled perspective, pruning the encoder and decoder components separately. Our findings highlight two insights: (1) the number of decoder layers is the dominant factor in inference speed, and (2) low sparsity in the pruned encoder network enhances generation quality. Motivated by these findings, we propose a simple and effective framework, NASH, that narrows the encoder and shortens the decoder of encoder-decoder models. Extensive experiments on diverse generation and inference tasks validate the effectiveness of our method in both speedup and output quality.
Co-authors
- Jongwoo Ko 1
- Seungjoon Park 1
- Yujin Kim 1
- Du-Seong Chang 1
- Euijai Ahn 1