Context Length Extension via Generalized Extrapolation Scale

Linhan Li, Zhang Huaping


Abstract
Context length extension of transformer models is a key challenge, especially when handling contexts that exceed the training length at inference time. In this paper, we propose Generalized extrapolatioN scalE (GeNE), a set of parameterized extrapolation functions applied to each layer and each attention head to adaptively adjust their extrapolation scales. Experimental results show that GeNE significantly improves long-context language modeling. By randomly scaling the extrapolation ratio during finetuning, GeNE achieves stable extrapolation to 64k contexts while training on 16k-length text. Furthermore, the instruction-following Llama2 model based on GeNE achieves competitive results compared with other open-source models of the same parameter scale.
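The paper's exact formulation is not reproduced on this page, so the following is a minimal sketch of the idea the abstract describes, assuming GeNE amounts to a learnable, per-layer, per-head scaling of rotary position indices. The class name `ScaledRotaryEmbedding`, the log-scale parameterization, and the uniform sampling range for the extrapolation ratio are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: per-head learnable extrapolation scale applied to
# rotary position embeddings (RoPE). Illustrative only; not the paper's code.
import torch
import torch.nn as nn


class ScaledRotaryEmbedding(nn.Module):
    """RoPE whose position indices are divided by a learnable per-head scale
    (our reading of 'parameterized extrapolation functions' per head)."""

    def __init__(self, num_heads: int, head_dim: int, base: float = 10000.0):
        super().__init__()
        # Standard RoPE inverse frequencies for each pair of dimensions.
        inv_freq = base ** (-torch.arange(0, head_dim, 2).float() / head_dim)
        self.register_buffer("inv_freq", inv_freq)
        # One learnable scale per attention head; init to log(1) = 0, i.e. no scaling.
        self.log_scale = nn.Parameter(torch.zeros(num_heads))

    def forward(self, seq_len: int, ratio: float = 1.0):
        # `ratio` mimics randomly scaling the extrapolation ratio during
        # finetuning, as the abstract describes; each head adapts it further.
        positions = torch.arange(seq_len, dtype=torch.float32)
        scale = torch.exp(self.log_scale) * ratio            # (num_heads,)
        scaled_pos = positions[None, :] / scale[:, None]     # (num_heads, seq_len)
        angles = scaled_pos[..., None] * self.inv_freq       # (num_heads, seq_len, head_dim // 2)
        return angles.cos(), angles.sin()


if __name__ == "__main__":
    rope = ScaledRotaryEmbedding(num_heads=8, head_dim=64)
    # During finetuning one might draw the ratio at random, e.g. uniform in
    # [1, 4] (an assumed range), so the model sees varied effective scales.
    ratio = float(torch.empty(1).uniform_(1.0, 4.0))
    cos, sin = rope(seq_len=16384, ratio=ratio)
    print(cos.shape, sin.shape)  # torch.Size([8, 16384, 32]) each
```

One plausible design motive, under these assumptions: making the scale a learnable per-head parameter lets different heads specialize to different effective context ranges, while the randomized ratio during finetuning exposes the model to many scales so that a single 16k training length can generalize to 64k at inference.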
Anthology ID:
2024.findings-acl.249
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4211–4218
URL:
https://aclanthology.org/2024.findings-acl.249
Cite (ACL):
Linhan Li and Zhang Huaping. 2024. Context Length Extension via Generalized Extrapolation Scale. In Findings of the Association for Computational Linguistics ACL 2024, pages 4211–4218, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Context Length Extension via Generalized Extrapolation Scale (Li & Huaping, Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.249.pdf