Can’t Remember Details in Long Documents? You Need Some R&R

Devanshu Agrawal, Shang Gao, Martin Gajek


Abstract
Long-context large language models (LLMs) hold promise for tasks such as question-answering (QA) over long documents, but they tend to miss important information in the middle of context documents [(Liu 2023)](https://arxiv.org/abs/2307.03172). Here, we introduce *R&R*—a combination of two novel prompt-based methods called *reprompting* and *in-context retrieval* (ICR)—to alleviate this effect in document-based QA. In reprompting, we repeat the prompt instructions periodically throughout the context document to remind the LLM of its original task. In ICR, rather than instructing the LLM to answer the question directly, we instruct it to retrieve the top k passage numbers most relevant to the given question, which are then used as an abbreviated context in a second QA prompt. We test R&R with GPT-4 Turbo and Claude-2.1 on documents up to 80k tokens in length and observe a 16-point boost in QA accuracy on average. Our further analysis suggests that R&R improves performance on long document-based QA because it reduces the distance between relevant context and the instructions. Finally, we show that compared to short-context chunkwise methods, R&R enables the use of larger chunks that cost fewer LLM calls and output tokens, while minimizing the drop in accuracy.
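The two prompt manipulations described in the abstract can be illustrated with a minimal sketch. It is not the authors' released implementation: the function names, the `[n]` passage-numbering format, the `every` interval, and the choice to place the instructions at the start and end of the document as well as periodically inside it are all assumptions made for illustration. No LLM is called; the model reply is treated as a plain string.

```python
import re
from typing import List


def build_reprompted_prompt(passages: List[str], instructions: str, every: int = 3) -> str:
    """Number the passages and repeat the instructions every `every` passages.

    Hypothetical helper: the interval and the placement of the first and
    last copies of the instructions are assumptions, not the paper's settings.
    """
    parts = [instructions]
    for i, passage in enumerate(passages, start=1):
        parts.append(f"[{i}] {passage}")
        # Reprompting: remind the model of its task inside the document.
        if i % every == 0 and i < len(passages):
            parts.append(instructions)
    parts.append(instructions)  # final copy after the full document
    return "\n\n".join(parts)


def icr_instructions(question: str, k: int) -> str:
    """In-context retrieval: ask for passage numbers instead of an answer."""
    return (
        f"List the numbers of the {k} passages most relevant to this question: "
        f"{question}"
    )


def parse_passage_numbers(reply: str, n_passages: int, k: int) -> List[int]:
    """Extract up to k distinct, in-range passage numbers from a model reply."""
    seen: List[int] = []
    for token in re.findall(r"\d+", reply):
        n = int(token)
        if 1 <= n <= n_passages and n not in seen:
            seen.append(n)
    return seen[:k]


def build_qa_prompt(passages: List[str], numbers: List[int], question: str) -> str:
    """Second prompt: answer the question using only the retrieved passages."""
    context = "\n\n".join(f"[{n}] {passages[n - 1]}" for n in sorted(numbers))
    return f"{context}\n\nQuestion: {question}\nAnswer:"
```

In use, the first prompt would be `build_reprompted_prompt(passages, icr_instructions(question, k))`, the reply would go through `parse_passage_numbers`, and the result would feed `build_qa_prompt` for the second call. Passing the retrieved passages in document order is a design choice of this sketch.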
Anthology ID:
2024.findings-emnlp.742
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12692–12704
URL:
https://aclanthology.org/2024.findings-emnlp.742
Cite (ACL):
Devanshu Agrawal, Shang Gao, and Martin Gajek. 2024. Can’t Remember Details in Long Documents? You Need Some R&R. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 12692–12704, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Can’t Remember Details in Long Documents? You Need Some R&R (Agrawal et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.742.pdf
Software:
 2024.findings-emnlp.742.software.zip