LongT5: Efficient Text-To-Text Transformer for Long Sequences

Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang


Abstract
Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present LongT5, a new model that explores the effects of scaling both the input length and model size at the same time. Specifically, we integrate attention ideas from long-input transformers (ETC), and adopt pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call Transient Global (TGlobal), which mimics ETC’s local/global attention mechanism, but without requiring additional side-inputs. We are able to achieve state-of-the-art results on several summarization and question answering tasks, as well as outperform the original T5 models on these tasks. We have open-sourced our architecture and training code, as well as our pre-trained model checkpoints.
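The abstract describes TGlobal only at a high level: local attention combined with global attention, with no extra side-inputs. The following is a minimal sketch of that general pattern, not the authors' implementation. It assumes that the global tokens are built on the fly by summing fixed-size blocks of the input and that each token attends to a local window plus all block summaries. The function names, the `block_size` and `radius` values, and the unit-norm scaling (a stand-in for whatever normalization the model uses) are all illustrative assumptions. For clarity it computes a full score matrix and masks it, so it shows the attention pattern rather than the efficiency gain.

```python
import numpy as np

def block_summaries(x, block_size):
    """Sum token vectors in each fixed-size block, then scale to unit norm.

    Hypothetical helper: the summaries are derived from the input itself,
    so no separate global side-input is needed.
    """
    n, d = x.shape
    num_blocks = -(-n // block_size)  # ceiling division
    padded = np.zeros((num_blocks * block_size, d), dtype=x.dtype)
    padded[:n] = x  # zero-pad the last block if it is incomplete
    sums = padded.reshape(num_blocks, block_size, d).sum(axis=1)
    norms = np.linalg.norm(sums, axis=-1, keepdims=True)
    return sums / np.maximum(norms, 1e-6)

def local_global_attention(x, block_size=4, radius=2):
    """Single-head attention over a local window plus all block summaries.

    Assumption: no learned projections; queries, keys and values are the
    raw vectors, to keep the sketch short.
    """
    n, d = x.shape
    g = block_summaries(x, block_size)           # (num_blocks, d)
    keys = np.concatenate([x, g], axis=0)        # (n + num_blocks, d)
    scores = x @ keys.T / np.sqrt(d)

    # Token i may attend to token j only if |i - j| <= radius ...
    idx = np.arange(n)
    local = np.abs(idx[:, None] - idx[None, :]) <= radius
    # ... and to every block summary.
    mask = np.concatenate(
        [local, np.ones((n, g.shape[0]), dtype=bool)], axis=1
    )

    # Masked softmax over the allowed keys.
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ keys, mask
```

With 10 input tokens, `block_size=4` and `radius=2`, each token sees at most 5 neighbours plus 3 block summaries instead of all 10 tokens; the gap widens as the input grows, which is the motivation for this kind of sparse pattern.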
Anthology ID:
2022.findings-naacl.55
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
724–736
URL:
https://aclanthology.org/2022.findings-naacl.55
DOI:
10.18653/v1/2022.findings-naacl.55
Cite (ACL):
Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, and Yinfei Yang. 2022. LongT5: Efficient Text-To-Text Transformer for Long Sequences. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 724–736, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
LongT5: Efficient Text-To-Text Transformer for Long Sequences (Guo et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-naacl.55.pdf
Video:
https://aclanthology.org/2022.findings-naacl.55.mp4
Code:
google-research/longt5
Data:
BigPatent, CNN/Daily Mail, Multi-News, Pubmed, SCROLLS, TriviaQA, arXiv, arXiv Summarization Dataset