Found in the middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Cheng-Yu Hsieh; Yung-Sung Chuang; Chun-Liang Li; Zifeng Wang; Long Le; Abhishek Kumar; James Glass; Alexander Ratner; Chen-Yu Lee; Ranjay Krishna; Tomas Pfister

Abstract

Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs’ intrinsic attention bias: LLMs exhibit a U-shaped attention bias in which the tokens at the beginning and at the end of the input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show that found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 10 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences.

Anthology ID: 2024.findings-acl.890
Volume: Findings of the Association for Computational Linguistics ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand and virtual meeting
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14982–14995
URL: https://aclanthology.org/2024.findings-acl.890
Cite (ACL): Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, and Tomas Pfister. 2024. Found in the middle: Calibrating Positional Attention Bias Improves Long Context Utilization. In Findings of the Association for Computational Linguistics ACL 2024, pages 14982–14995, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal): Found in the middle: Calibrating Positional Attention Bias Improves Long Context Utilization (Hsieh et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.890.pdf
