Linhan Li


2024

Context Length Extension via Generalized Extrapolation Scale
Linhan Li | Zhang Huaping
Findings of the Association for Computational Linguistics: ACL 2024

Context length extension of transformer models is considered a key challenge, especially when handling contexts beyond the training length at inference time. In this paper, we propose Generalized extrapolatioN scalE (GeNE), a set of parameterized extrapolation functions applied to each layer and attention head to adaptively adjust their extrapolation scales. Experimental results show that GeNE yields a significant improvement in long-context language modeling. By randomly scaling the extrapolation ratio during fine-tuning, GeNE achieves stable extrapolation to 64k contexts while training on 16k-length text. Furthermore, the instruction-following Llama2 model based on GeNE achieves competitive results compared with other open-source models of the same parameter scale.
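The abstract describes GeNE only at a high level. As a rough illustration, the PyTorch sketch below shows one hypothetical way a learnable per-layer, per-head extrapolation scale could modulate an ALiBi-style linear position bias; the class name, the ALiBi-style bias, and the parameter form are all assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn as nn

class PerHeadExtrapolationScale(nn.Module):
    """Hypothetical sketch: a learnable per-head scale on an ALiBi-style
    linear position bias. The real GeNE parameterization is defined in the
    paper; this only illustrates the general idea of letting each layer and
    attention head adapt its own extrapolation scale."""

    def __init__(self, num_heads: int):
        super().__init__()
        # One learnable (log-)scale per attention head -- assumed form.
        self.log_scale = nn.Parameter(torch.zeros(num_heads))

    def forward(self, seq_len: int, ratio: float = 1.0) -> torch.Tensor:
        # Pairwise query-key distances: (seq_len, seq_len).
        pos = torch.arange(seq_len)
        dist = (pos[None, :] - pos[:, None]).abs().float()
        # Randomly varying `ratio` across fine-tuning batches loosely mirrors
        # the abstract's "randomly scaling the extrapolation ratio".
        dist = dist * ratio
        # Per-head scaled bias: (num_heads, seq_len, seq_len).
        return -self.log_scale.exp()[:, None, None] * dist
```

In such a setup, the returned bias would be added to each head's attention logits before the softmax, letting every head learn how quickly attention should decay with distance and thus how it behaves beyond the training length.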