@inproceedings{yan-etal-2021-fastseq,
title = "{F}ast{S}eq: Make Sequence Generation Faster",
author = "Yan, Yu and
Hu, Fei and
Chen, Jiusheng and
Bhendawade, Nikhil and
Ye, Ting and
Gong, Yeyun and
Duan, Nan and
Cui, Desheng and
Chi, Bingyu and
Zhang, Ruofei",
editor = "Ji, Heng and
Park, Jong C. and
Xia, Rui",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-demo.26",
doi = "10.18653/v1/2021.acl-demo.26",
pages = "218--226",
    abstract = "Transformer-based models have made a tremendous impact on natural language generation. However, the inference speed is a bottleneck due to the large model size and the intensive computation involved in the auto-regressive decoding process. We develop the FastSeq framework to accelerate sequence generation without accuracy loss. The proposed optimization techniques include an attention cache optimization, an efficient algorithm for detecting repeated n-grams, and an asynchronous generation pipeline with parallel I/O. These optimizations are general enough to be applicable to Transformer-based models (e.g., T5, GPT2, and UniLM). Our benchmark results on a set of widely used and diverse models demonstrate a 4-9x inference speed gain. Additionally, FastSeq is easy to use with a simple one-line code change. The source code is available at \url{https://github.com/microsoft/fastseq}.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="yan-etal-2021-fastseq">
    <titleInfo>
      <title>FastSeq: Make Sequence Generation Faster</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Yu</namePart>
      <namePart type="family">Yan</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Fei</namePart>
      <namePart type="family">Hu</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Jiusheng</namePart>
      <namePart type="family">Chen</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Nikhil</namePart>
      <namePart type="family">Bhendawade</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Ting</namePart>
      <namePart type="family">Ye</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Yeyun</namePart>
      <namePart type="family">Gong</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Nan</namePart>
      <namePart type="family">Duan</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Desheng</namePart>
      <namePart type="family">Cui</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Bingyu</namePart>
      <namePart type="family">Chi</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Ruofei</namePart>
      <namePart type="family">Zhang</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2021-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Heng</namePart>
        <namePart type="family">Ji</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Jong</namePart>
        <namePart type="given">C</namePart>
        <namePart type="family">Park</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Rui</namePart>
        <namePart type="family">Xia</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Online</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Transformer-based models have made a tremendous impact on natural language generation. However, the inference speed is a bottleneck due to the large model size and the intensive computation involved in the auto-regressive decoding process. We develop the FastSeq framework to accelerate sequence generation without accuracy loss. The proposed optimization techniques include an attention cache optimization, an efficient algorithm for detecting repeated n-grams, and an asynchronous generation pipeline with parallel I/O. These optimizations are general enough to be applicable to Transformer-based models (e.g., T5, GPT2, and UniLM). Our benchmark results on a set of widely used and diverse models demonstrate a 4-9x inference speed gain. Additionally, FastSeq is easy to use with a simple one-line code change. The source code is available at https://github.com/microsoft/fastseq.</abstract>
    <identifier type="citekey">yan-etal-2021-fastseq</identifier>
    <identifier type="doi">10.18653/v1/2021.acl-demo.26</identifier>
    <location>
      <url>https://aclanthology.org/2021.acl-demo.26</url>
    </location>
    <part>
      <date>2021-08</date>
      <extent unit="page">
        <start>218</start>
        <end>226</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T FastSeq: Make Sequence Generation Faster
%A Yan, Yu
%A Hu, Fei
%A Chen, Jiusheng
%A Bhendawade, Nikhil
%A Ye, Ting
%A Gong, Yeyun
%A Duan, Nan
%A Cui, Desheng
%A Chi, Bingyu
%A Zhang, Ruofei
%Y Ji, Heng
%Y Park, Jong C.
%Y Xia, Rui
%S Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations
%D 2021
%8 August
%I Association for Computational Linguistics
%C Online
%F yan-etal-2021-fastseq
%X Transformer-based models have made a tremendous impact on natural language generation. However, the inference speed is a bottleneck due to the large model size and the intensive computation involved in the auto-regressive decoding process. We develop the FastSeq framework to accelerate sequence generation without accuracy loss. The proposed optimization techniques include an attention cache optimization, an efficient algorithm for detecting repeated n-grams, and an asynchronous generation pipeline with parallel I/O. These optimizations are general enough to be applicable to Transformer-based models (e.g., T5, GPT2, and UniLM). Our benchmark results on a set of widely used and diverse models demonstrate a 4-9x inference speed gain. Additionally, FastSeq is easy to use with a simple one-line code change. The source code is available at https://github.com/microsoft/fastseq.
%R 10.18653/v1/2021.acl-demo.26
%U https://aclanthology.org/2021.acl-demo.26
%U https://doi.org/10.18653/v1/2021.acl-demo.26
%P 218-226
Markdown (Informal)
[FastSeq: Make Sequence Generation Faster](https://aclanthology.org/2021.acl-demo.26) (Yan et al., ACL-IJCNLP 2021)
ACL
Yu Yan, Fei Hu, Jiusheng Chen, Nikhil Bhendawade, Ting Ye, Yeyun Gong, Nan Duan, Desheng Cui, Bingyu Chi, and Ruofei Zhang. 2021. FastSeq: Make Sequence Generation Faster. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, pages 218–226, Online. Association for Computational Linguistics.
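
Usage note: the abstract's "simple one-line code change" refers to importing fastseq before the underlying model library, as described in the linked repository (https://github.com/microsoft/fastseq). Below is a minimal sketch assuming the Hugging Face Transformers integration; the BART checkpoint, input text, and generation settings are illustrative choices, not taken from the paper.

```python
# Hypothetical usage sketch. The one-line change is the `import fastseq`
# below, which must run before the model library is imported so that
# fastseq can patch the generation internals (e.g., attention caching).
import fastseq  # the one-line code change described in the abstract
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Illustrative checkpoint; any supported seq2seq model would do.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
model.eval()

article = "FastSeq accelerates sequence generation for Transformer-based models."
inputs = tokenizer([article], return_tensors="pt", truncation=True)

# Generation goes through the standard API; no other code changes are needed.
with torch.no_grad():
    summary_ids = model.generate(
        inputs["input_ids"],
        num_beams=4,
        max_length=50,
        early_stopping=True,
    )
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```

The repository describes the same import-first pattern for fairseq-based models: import fastseq before fairseq and run generation as usual.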