@inproceedings{tao-etal-2025-treerag,
title = "{T}ree{RAG}: Unleashing the Power of Hierarchical Storage for Enhanced Knowledge Retrieval in Long Documents",
author = "Tao, Wenyu and
Xing, Xiaofen and
Chen, Yirong and
Huang, Linyi and
Xu, Xiangmin",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.20/",
doi = "10.18653/v1/2025.findings-acl.20",
pages = "356--371",
ISBN = "979-8-89176-256-5",
abstract = "When confronting long document information retrieval for Query-Focused Summarization(QFS), Traditional Retrieval-Augmented Generation(RAG) frameworks struggle to retrieve all relevant knowledge points, and the chunking and retrieve strategies of existing frameworks may disrupt the connections between knowledge points and the integrity of the information. To address these issues, we propose $\textbf{TreeRAG}$, which employs $\textbf{Tree-Chunking}$ for chunking and embedding in a tree-like structure , coupled with ``$\textbf{root-to-leaves}$'' and ``$\textbf{leaf-to-root}$'' retrieve strategy named $\textbf{Bidirectional Traversal Retrieval}$. This approach effectively preserves the hierarchical structure among knowledge points and significantly enhances the ability to retrieve while minimizing noise inference. Our experimental results on the $\textbf{Finance, Law, and Medical subsets of the Dragonball dataset}$ demonstrate that $\textbf{TreeRAG}$ achieves significant enhancements in both recall quality and precision compared to traditional and popular existing methods and achieves better performance to corresponding question-answering tasks, marking a new breakthrough in long document knowledge retrieval."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="tao-etal-2025-treerag">
<titleInfo>
<title>TreeRAG: Unleashing the Power of Hierarchical Storage for Enhanced Knowledge Retrieval in Long Documents</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wenyu</namePart>
<namePart type="family">Tao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xiaofen</namePart>
<namePart type="family">Xing</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yirong</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Linyi</namePart>
<namePart type="family">Huang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xiangmin</namePart>
<namePart type="family">Xu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-256-5</identifier>
</relatedItem>
<abstract>When confronting long-document information retrieval for Query-Focused Summarization (QFS), traditional Retrieval-Augmented Generation (RAG) frameworks struggle to retrieve all relevant knowledge points, and the chunking and retrieval strategies of existing frameworks may disrupt the connections between knowledge points and the integrity of the information. To address these issues, we propose TreeRAG, which employs Tree-Chunking for chunking and embedding in a tree-like structure, coupled with a “root-to-leaves” and “leaf-to-root” retrieval strategy named Bidirectional Traversal Retrieval. This approach effectively preserves the hierarchical structure among knowledge points and significantly enhances retrieval ability while minimizing noise interference. Our experimental results on the Finance, Law, and Medical subsets of the Dragonball dataset demonstrate that TreeRAG achieves significant improvements in both recall quality and precision compared to traditional and popular existing methods, and achieves better performance on the corresponding question-answering tasks, marking a new breakthrough in long-document knowledge retrieval.</abstract>
<identifier type="citekey">tao-etal-2025-treerag</identifier>
<identifier type="doi">10.18653/v1/2025.findings-acl.20</identifier>
<location>
<url>https://aclanthology.org/2025.findings-acl.20/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>356</start>
<end>371</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T TreeRAG: Unleashing the Power of Hierarchical Storage for Enhanced Knowledge Retrieval in Long Documents
%A Tao, Wenyu
%A Xing, Xiaofen
%A Chen, Yirong
%A Huang, Linyi
%A Xu, Xiangmin
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Findings of the Association for Computational Linguistics: ACL 2025
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-256-5
%F tao-etal-2025-treerag
%X When confronting long-document information retrieval for Query-Focused Summarization (QFS), traditional Retrieval-Augmented Generation (RAG) frameworks struggle to retrieve all relevant knowledge points, and the chunking and retrieval strategies of existing frameworks may disrupt the connections between knowledge points and the integrity of the information. To address these issues, we propose TreeRAG, which employs Tree-Chunking for chunking and embedding in a tree-like structure, coupled with a “root-to-leaves” and “leaf-to-root” retrieval strategy named Bidirectional Traversal Retrieval. This approach effectively preserves the hierarchical structure among knowledge points and significantly enhances retrieval ability while minimizing noise interference. Our experimental results on the Finance, Law, and Medical subsets of the Dragonball dataset demonstrate that TreeRAG achieves significant improvements in both recall quality and precision compared to traditional and popular existing methods, and achieves better performance on the corresponding question-answering tasks, marking a new breakthrough in long-document knowledge retrieval.
%R 10.18653/v1/2025.findings-acl.20
%U https://aclanthology.org/2025.findings-acl.20/
%U https://doi.org/10.18653/v1/2025.findings-acl.20
%P 356-371
Markdown (Informal)
[TreeRAG: Unleashing the Power of Hierarchical Storage for Enhanced Knowledge Retrieval in Long Documents](https://aclanthology.org/2025.findings-acl.20/) (Tao et al., Findings 2025)
ACL
Wenyu Tao, Xiaofen Xing, Yirong Chen, Linyi Huang, and Xiangmin Xu. 2025. [TreeRAG: Unleashing the Power of Hierarchical Storage for Enhanced Knowledge Retrieval in Long Documents](https://aclanthology.org/2025.findings-acl.20/). In *Findings of the Association for Computational Linguistics: ACL 2025*, pages 356–371, Vienna, Austria. Association for Computational Linguistics.
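
For readers of this record, the abstract describes two mechanisms: Tree-Chunking, which chunks and embeds a document along its section hierarchy, and Bidirectional Traversal Retrieval, which searches root-to-leaves and then walks leaf-to-root so ancestor context is preserved. The sketch below is a minimal, self-contained Python illustration of that general idea only, not the authors' implementation; the toy bag-of-words embedding and every name in it (`TreeNode`, `tree_chunk`, `bidirectional_retrieve`, `top_k`) are assumptions of this sketch.

```python
# Illustrative sketch only -- not the paper's code. It mimics the two ideas the
# abstract describes: (1) chunking/embedding a document as a tree that follows
# its section hierarchy, and (2) a bidirectional ("root-to-leaves" then
# "leaf-to-root") traversal at retrieval time. All names are hypothetical.
from __future__ import annotations
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class TreeNode:
    text: str                                  # heading or chunk text at this node
    children: list["TreeNode"] = field(default_factory=list)
    parent: "TreeNode | None" = None


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a sentence encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def tree_chunk(section: dict, parent: TreeNode | None = None) -> TreeNode:
    """Build a chunk tree from a nested {'text': ..., 'subsections': [...]} outline."""
    node = TreeNode(text=section["text"], parent=parent)
    for sub in section.get("subsections", []):
        node.children.append(tree_chunk(sub, parent=node))
    return node


def bidirectional_retrieve(root: TreeNode, query: str, top_k: int = 1) -> list[str]:
    """Root-to-leaves: descend only into the most query-similar branches.
    Leaf-to-root: walk back up from each hit so ancestor headings are kept."""
    q = embed(query)
    results: list[str] = []
    frontier = [root]
    while frontier:
        node = frontier.pop()
        ranked = sorted(node.children, key=lambda c: cosine(q, embed(c.text)), reverse=True)
        for child in ranked[:top_k]:
            if not child.children:               # reached a leaf chunk: collect it
                lineage = []
                cur: TreeNode | None = child      # leaf-to-root walk
                while cur is not None:
                    lineage.append(cur.text)
                    cur = cur.parent
                results.append(" > ".join(reversed(lineage)))
            else:
                frontier.append(child)            # keep descending this branch
    return results


if __name__ == "__main__":
    doc = {
        "text": "Loan Agreement",
        "subsections": [
            {"text": "Interest", "subsections": [{"text": "The annual interest rate is 5%."}]},
            {"text": "Termination", "subsections": [{"text": "Either party may terminate with 30 days notice."}]},
        ],
    }
    root = tree_chunk(doc)
    print(bidirectional_retrieve(root, "what is the interest rate"))
```

In this toy run only the "Interest" branch is followed, and the retrieved leaf is returned together with its chain of ancestor headings; that leaf-to-root step is one plausible reading of how the hierarchical context the abstract emphasizes could be preserved.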