Jinwen Chen
2026
Privacy-Preserving Reasoning with Knowledge-Distilled Parametric Retrieval Augmented Generation
Jinwen Chen | Hainan Zhang | Liang Pang | Yongxin Tong | Haibo Zhou | Wei Lin | Zhiming Zheng
Findings of the Association for Computational Linguistics: ACL 2026
Current RAG systems require uploading plaintext documents to the cloud, risking private data leakage. Parametric RAG (PRAG) encodes documents as LoRA parameters within LLMs, offering a possible way to reduce exposure of raw content. However, it still faces two issues: (1) PRAG demands synthesizing QA pairs and fine-tuning the LLM for each individual document to create its corresponding LoRA, leading to unacceptable inference latency. (2) The performance of PRAG relies solely on synthetic QA data and lacks internal alignment with standard RAG, resulting in poor generalization on out-of-distribution (OOD) inputs. Therefore, achieving high-efficiency parameterization while maintaining RAG-level performance remains a critical challenge for privacy-preserving reasoning. In this paper, we propose DistilledPRAG, a generalizable knowledge-distilled parametric RAG model aligned with standard RAG in document structure and parameter activation. We first synthesize QA pairs from single and multiple documents to enhance cross-document reasoning. Then, we mask the plaintext documents with a special token and translate them into LoRA parameters via a parameter generator, maintaining the standard RAG document structure. Finally, guided by synthetic QA data, we train the parameter generator to match standard RAG’s hidden states and output logits, enabling RAG-style reasoning without the original documents. Experiments on four QA datasets show that DistilledPRAG outperforms baselines in accuracy and generalizes well to OOD data.
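The training objective described in the abstract (matching standard RAG's hidden states and output logits) can be illustrated with a small sketch. This is not the paper's implementation: the function names, the choice of mean squared error for hidden states, the choice of KL divergence for logits, and the weights `alpha` and `beta` are all assumptions made for illustration, and plain Python lists stand in for model tensors.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_logits, student_logits):
    # KL(teacher || student) over the output distributions.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hidden_mse(teacher_hidden, student_hidden):
    # Mean squared error between two hidden-state vectors of equal length.
    diffs = [(t - s) ** 2 for t, s in zip(teacher_hidden, student_hidden)]
    return sum(diffs) / len(diffs)

def distillation_loss(teacher_hidden, student_hidden,
                      teacher_logits, student_logits,
                      alpha=1.0, beta=1.0):
    # Hypothetical combined objective: alpha and beta are assumed weights,
    # not values taken from the paper.
    return (alpha * hidden_mse(teacher_hidden, student_hidden)
            + beta * kl_divergence(teacher_logits, student_logits))

# Teacher: standard RAG with the plaintext document in context (assumed role).
# Student: the model with masked documents plus generated LoRA (assumed role).
teacher_h, teacher_z = [0.2, -0.5, 1.0], [2.0, 0.5, -1.0]
student_h, student_z = [0.1, -0.4, 0.7], [1.5, 0.8, -0.5]

loss_same = distillation_loss(teacher_h, teacher_h, teacher_z, teacher_z)
loss_diff = distillation_loss(teacher_h, student_h, teacher_z, student_z)
```

In this sketch the loss is zero when the student reproduces the teacher exactly and grows as either the hidden states or the output distribution drift apart, which is the alignment property the abstract relies on.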
2025
Detecting Stealthy Backdoor Samples based on Intra-class Distance for Large Language Models
Jinwen Chen | Hainan Zhang | Fei Sun | Qinnan Zhang | Sijia Wen | Ziwei Wang | Zhiming Zheng
Findings of the Association for Computational Linguistics: EMNLP 2025
Stealthy data poisoning during fine-tuning can backdoor large language models (LLMs), threatening downstream safety. Existing detectors either use classifier-style probability signals—ill-suited to generation—or rely on rewriting, which can degrade quality and even introduce new triggers. We address the practical need to efficiently remove poisoned examples before or during fine-tuning. We observe a robust signal in the response space: after applying TF-IDF to model responses, poisoned examples form compact clusters (driven by consistent malicious outputs), while clean examples remain dispersed. We leverage this with RFTC—Reference-Filtration + TF-IDF Clustering. RFTC first compares each example’s response with that of a reference model and flags those with large deviations as suspicious; it then performs TF-IDF clustering on the suspicious set and identifies true poisoned examples using intra-class distance. On two machine translation datasets and one QA dataset, RFTC outperforms prior detectors in both detection accuracy and the downstream performance of the fine-tuned models. Ablations with different reference models further validate the effectiveness and robustness of Reference-Filtration.
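The response-space signal described in the abstract (poisoned responses form compact TF-IDF clusters while clean ones stay dispersed) can be illustrated with a minimal sketch. It is not the RFTC implementation: it omits the reference-model filtration and the clustering step, assumes the two groups are already given, and uses a hand-written smoothed TF-IDF with mean pairwise cosine distance as one plausible reading of "intra-class distance". All function names and example strings are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    # Whitespace tokenization and a smoothed inverse document frequency.
    docs = [t.lower().split() for t in texts]
    n = len(docs)
    df = Counter(tok for d in docs for tok in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({
            tok: (count / len(d)) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        })
    return vectors

def cosine_distance(a, b):
    # 1 - cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def intra_class_distance(vectors):
    # Mean pairwise distance inside one group; small means compact.
    pairs = [(i, j) for i in range(len(vectors))
             for j in range(i + 1, len(vectors))]
    if not pairs:
        return 0.0
    return sum(cosine_distance(vectors[i], vectors[j])
               for i, j in pairs) / len(pairs)

# Hypothetical responses: a backdoor tends to produce near-identical outputs.
poisoned = ["visit evil site now",
            "visit evil site now please",
            "visit evil site now"]
clean = ["the weather is sunny today",
         "stock prices fell sharply",
         "she enjoys playing violin"]

vecs = tfidf_vectors(poisoned + clean)
poisoned_dist = intra_class_distance(vecs[:len(poisoned)])
clean_dist = intra_class_distance(vecs[len(poisoned):])
```

With these toy inputs the poisoned group has a much smaller intra-class distance than the clean group, which is the kind of separation the abstract describes exploiting to pick out poisoned clusters.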