SparseFlow: Accelerating Transformers by Sparsifying Information Flows

Yeachan Kim, SangKeun Lee

Abstract
Transformers have become the de facto standard for natural language processing. However, the dense information flows within transformers pose significant challenges for real-time and resource-constrained devices, as computational complexity grows quadratically with sequence length. To counteract these dense information flows, we propose SparseFlow, a novel and efficient method designed to sparsify the dense pathways of token representations across all transformer blocks. To this end, SparseFlow parameterizes the information flows that link token representations to transformer blocks. These parameterized flows are optimized to be sparse, allowing only salient information to pass into the blocks. To validate the efficacy of SparseFlow, we conduct comprehensive experiments across diverse benchmarks (understanding and generation), model scales (from millions to billions of parameters), architectures (encoders, decoders, and sequence-to-sequence models), and modalities (language-only and vision-language). The results convincingly demonstrate that sparsifying dense information flows yields substantial speedups without compromising task accuracy; for instance, SparseFlow halves computational costs on average without a significant loss in accuracy.
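To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of one way parameterized, sparsity-optimized token-to-block flows could be realized. The GatedBlock wrapper, its linear scorer, the straight-through estimator, and the mean-gate sparsity penalty are illustrative assumptions, not the authors' implementation; see the paper PDF below for the actual method.

import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Wraps a transformer block with a learnable information-flow gate.

    The gate scores each token representation; tokens whose hard gate is
    closed skip the block and are carried along the residual path. All
    names and the straight-through trick below are assumptions made for
    illustration, not the paper's exact formulation.
    """

    def __init__(self, block: nn.Module, d_model: int):
        super().__init__()
        self.block = block
        self.scorer = nn.Linear(d_model, 1)  # parameterized flow per token

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, d_model)
        logits = self.scorer(x).squeeze(-1)   # (batch, seq_len)
        soft = torch.sigmoid(logits)          # differentiable gate values
        hard = (soft > 0.5).float()           # sparse 0/1 flow decision
        gate = hard + soft - soft.detach()    # straight-through estimator
        gate = gate.unsqueeze(-1)             # (batch, seq_len, 1)
        # Open tokens are transformed by the block; closed tokens pass
        # through unchanged. (A real speedup requires actually skipping
        # computation for closed tokens; the full block is run here only
        # for clarity.)
        out = x + gate * (self.block(x) - x)
        sparsity_loss = soft.mean()           # pushes gates toward zero
        return out, sparsity_loss

In training, the returned sparsity_loss would be added to the task loss with a weighting coefficient, so gates close wherever a token contributes little to a block; at inference, closed tokens could genuinely bypass the block's computation, which is where the speedup would come from.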
Anthology ID:
2024.acl-long.323
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5937–5948
URL:
https://aclanthology.org/2024.acl-long.323
Cite (ACL):
Yeachan Kim and SangKeun Lee. 2024. SparseFlow: Accelerating Transformers by Sparsifying Information Flows. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5937–5948, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
SparseFlow: Accelerating Transformers by Sparsifying Information Flows (Kim & Lee, ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.323.pdf