Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling

Zile Qiao; Wei Ye; Yong Jiang; Tong Mo; Pengjun Xie; Weiping Li; Fei Huang; Shikun Zhang

doi:10.18653/v1/2025.findings-naacl.148

Abstract

Retrieval-augmented language models (RALMs) have recently shown great potential in mitigating the limitations of implicit knowledge in LLMs, such as delayed updating of the latest expertise and unreliable retention of long-tail knowledge. However, neither the external knowledge base nor the retriever can guarantee reliability, so the retrieved knowledge may be unhelpful or even misleading for LLM generation. In this paper, we introduce Supportiveness-based Knowledge Rewriting (SKR), a robust and pluggable knowledge rewriter inherently optimized for LLM generation. Specifically, we introduce the novel concept of “supportiveness”, which represents how effectively a knowledge piece facilitates downstream tasks. Based on supportiveness, we first design a training data curation strategy for our rewriter model, effectively identifying and filtering out poor or irrelevant rewrites to improve data efficacy. We then apply the direct preference optimization (DPO) algorithm to align the generated rewrites with optimal supportiveness, guiding the rewriter model to summarize augmented content that better improves the final response. Comprehensive evaluations across six popular knowledge-intensive tasks and four LLMs demonstrate the effectiveness and superiority of SKR. With only 7B parameters, SKR shows better knowledge rewriting capability than GPT-4.
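
As a rough illustration of the preference-alignment step mentioned in the abstract, the sketch below computes the standard per-pair DPO loss in plain Python. The abstract does not say how supportiveness is scored or how preference pairs are formed, so `build_preference_pair` and its inputs are hypothetical: it simply assumes each candidate rewrite already carries a numeric supportiveness score and pairs the highest-scoring rewrite with the lowest-scoring one. It is not the authors' implementation.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) pair.

    Each argument is the summed log-probability of a rewrite under the
    trainable policy or the frozen reference model. beta=0.1 is an
    assumed default, not a value taken from the paper.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z)
    return softplus(-logits)


def build_preference_pair(scored_rewrites):
    """Hypothetical helper: pick the most and least supportive rewrites.

    scored_rewrites is a list of (rewrite_text, supportiveness_score)
    tuples; the scoring itself is assumed to happen elsewhere.
    Returns (chosen, rejected), or None if no strict preference exists.
    """
    if len(scored_rewrites) < 2:
        return None
    best = max(scored_rewrites, key=lambda item: item[1])
    worst = min(scored_rewrites, key=lambda item: item[1])
    if best[1] == worst[1]:
        return None
    return best[0], worst[0]
```

When the policy and reference assign identical log-probabilities, the loss equals log 2; it falls below that as the policy shifts probability toward the chosen rewrite relative to the reference.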

Anthology ID: 2025.findings-naacl.148
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2728–2740
URL: https://aclanthology.org/2025.findings-naacl.148/
DOI: 10.18653/v1/2025.findings-naacl.148
Cite (ACL): Zile Qiao, Wei Ye, Yong Jiang, Tong Mo, Pengjun Xie, Weiping Li, Fei Huang, and Shikun Zhang. 2025. Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 2728–2740, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling (Qiao et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-naacl.148.pdf
