How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities

Jerry Huang


Abstract
Long sequences occur in abundance within real-world scenarios, hence properly modelling them opens numerous downstream use-cases. Deep neural networks, however, have often struggled with these for a variety of reasons. Recent advances, both in systems engineering and in model design, have enabled the scaling up of models that are purported to support extended context lengths. In particular, the state-space and linear recurrent neural network families of models can hypothetically extend to infinite sequence lengths. However, is this too good to be true? We conduct an evaluation to show that while such claims may be sound theoretically, large practical gaps remain when observed empirically. In particular, recurrent models still suffer in the same settings as long-context LLMs with attention. We further show that different inductive biases have inconsistent extrapolation capabilities, highlighting the need to further study such paradigms and investigate why long-context models seemingly fail to behave as one might expect.
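As a rough illustration of the kind of long-context probe the abstract alludes to (not the paper's actual protocol), the sketch below implements a synthetic key-value retrieval test in Python; the Hugging Face-style model.generate/tokenizer interface, the prompt format, and all function names are assumptions made for illustration only.

import random

def make_prompt(num_pairs, rng):
    # Build a long context of "key-i: value" lines plus one retrieval query.
    pairs = {f"key-{i}": str(rng.randint(0, 99999)) for i in range(num_pairs)}
    query = rng.choice(list(pairs))
    context = "\n".join(f"{k}: {v}" for k, v in pairs.items())
    return f"{context}\n\nWhat is the value of {query}? Answer:", pairs[query]

def retrieval_accuracy(model, tokenizer, num_pairs, trials=50, seed=0):
    # Fraction of trials in which the model reproduces the queried value.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        prompt, answer = make_prompt(num_pairs, rng)
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=8)
        completion = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:])
        hits += int(answer in completion)
    return hits / trials

# Sweeping num_pairs well past the training context length, for an
# attention-based model versus a state-space or linear-recurrent model,
# shows where each architecture's retrieval accuracy degrades.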
Anthology ID: 2025.coling-main.3
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 29–39
URL: https://aclanthology.org/2025.coling-main.3/
Cite (ACL): Jerry Huang. 2025. How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities. In Proceedings of the 31st International Conference on Computational Linguistics, pages 29–39, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities (Huang, COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.3.pdf