Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Mosh Levy; Alon Jacoby; Yoav Goldberg

doi:10.18653/v1/2024.acl-long.818

Abstract

This paper explores the impact of extending input lengths on the capabilities of Large Language Models (LLMs). Despite recent advancements in LLMs, their performance consistency across different input lengths is not well understood. We investigate this aspect by introducing a novel QA reasoning framework, specifically designed to assess the impact of input length. We isolate the effect of input length using multiple versions of the same sample, each being extended with padding of different lengths, types and locations. Our findings show a notable degradation in LLMs’ reasoning performance at much shorter input lengths than their technical maximum. We show that the degradation trend appears in every version of our dataset, although at different intensities. Additionally, our study reveals that the traditional metric of next word prediction correlates negatively with the performance of LLMs on our reasoning dataset. We analyse our results and identify failure modes that can serve as useful guides for future research, potentially informing strategies to address the limitations observed in LLMs.
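The padding setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `pad_sample` and `build_variants`, the three location labels, and the use of whitespace-separated words as a stand-in for tokens are all assumptions made for illustration. The padding "type" is represented here simply by whichever filler sentences the caller supplies.

```python
from itertools import cycle, islice

# Hypothetical location labels; the abstract only says padding "locations" vary.
LOCATIONS = ("before", "after", "around")


def pad_sample(key_text, filler_sentences, target_words, location="before"):
    """Extend key_text with filler words until it has about target_words words.

    Length is measured in whitespace-separated words, a rough proxy for tokens.
    """
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location!r}")

    needed = max(0, target_words - len(key_text.split()))
    pool = [word for sentence in filler_sentences for word in sentence.split()]
    if needed and not pool:
        raise ValueError("filler_sentences must contain at least one word")

    # Repeat the filler pool as often as required to reach the target length.
    filler = list(islice(cycle(pool), needed))

    if location == "before":
        before, after = filler, []
    elif location == "after":
        before, after = [], filler
    else:  # "around": split the filler evenly on both sides of the key text
        half = needed // 2
        before, after = filler[:half], filler[half:]

    parts = [" ".join(before), key_text, " ".join(after)]
    return " ".join(part for part in parts if part)


def build_variants(key_text, filler_sentences, target_lengths, locations=LOCATIONS):
    """Return {(length, location): padded_text} for one underlying sample."""
    return {
        (length, location): pad_sample(key_text, filler_sentences, length, location)
        for length in target_lengths
        for location in locations
    }


if __name__ == "__main__":
    key = "Alice is taller than Bob. Bob is taller than Carol."
    filler = ["The weather was mild that day.", "Nobody noticed the old clock."]
    for (length, location), text in build_variants(key, filler, [20, 50]).items():
        print(length, location, len(text.split()))
```

Because every variant contains the same key text, any difference in a model's answers across variants can be attributed to the added padding rather than to the task itself, which is the isolation the abstract describes.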

Anthology ID: 2024.acl-long.818
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 15339–15353
URL: https://aclanthology.org/2024.acl-long.818/
DOI: 10.18653/v1/2024.acl-long.818
Award: Outstanding Paper Award
Cite (ACL): Mosh Levy, Alon Jacoby, and Yoav Goldberg. 2024. Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15339–15353, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models (Levy et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-long.818.pdf
