Peng Wang
Other people with similar names: Peng Wang (Chinese Academy of Sciences), Peng Wang (Fudan University), Peng Wang (Macau University, Central South University), Peng Wang (Southeast University Nanjing), Peng Wang (University of Virginia), Peng Wang (Zhejiang University)
Unverified author pages with similar names: Peng Wang
2026
PDR: A Plug-and-Play Positional Decay Framework for LLM Pre-training Data Detection
Jinhan Liu | Yibo Yang | Ruiying Lu | Piotr Piękos | Yimeng Chen | Peng Wang | Dandan Guo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Detecting pre-training data in Large Language Models (LLMs) is crucial for auditing data privacy and copyright compliance, yet it remains challenging in black-box, zero-shot settings where computational resources and training data are scarce. While existing likelihood-based methods have shown promise, they typically aggregate token-level scores using uniform weights, thereby neglecting the inherent information-theoretic dynamics of autoregressive generation. In this paper, we hypothesize and empirically validate that memorization signals are heavily skewed towards the high-entropy initial tokens, where model uncertainty is highest, and decay as context accumulates. To leverage this linguistic property, we introduce Positional Decay Reweighting (PDR), a training-free and plug-and-play framework. PDR explicitly reweights token-level scores to amplify distinct signals from early positions while suppressing noise from later ones. Extensive experiments show that PDR acts as a robust prior and can usually enhance a wide range of advanced methods across multiple benchmarks.
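The reweighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the decay function PDR uses, so the exponential form, the `alpha` parameter, and the function names below are all assumptions made for illustration, not the authors' implementation. The sketch only contrasts uniform averaging of token-level scores with a position-weighted average that emphasizes early tokens.

```python
import math

def positional_weights(n, alpha=0.05):
    """Hypothetical decay weights w_i proportional to exp(-alpha * i), normalized to sum to 1.

    The exponential form and `alpha` are illustrative assumptions;
    the abstract does not specify the decay function.
    """
    if n <= 0:
        raise ValueError("need at least one token")
    raw = [math.exp(-alpha * i) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]

def uniform_score(token_scores):
    """Baseline aggregation: plain mean of token-level scores."""
    if not token_scores:
        raise ValueError("need at least one token")
    return sum(token_scores) / len(token_scores)

def decayed_score(token_scores, alpha=0.05):
    """Position-weighted aggregation: earlier tokens contribute more."""
    weights = positional_weights(len(token_scores), alpha)
    return sum(w * s for w, s in zip(weights, token_scores))

# Toy token log-probabilities (made-up values): the first two tokens
# are predicted confidently, the later ones are not.
scores = [-0.1, -0.2, -3.0, -3.5]
print(uniform_score(scores))             # plain average over all positions
print(decayed_score(scores, alpha=1.0))  # average dominated by early positions
```

Setting `alpha=0` recovers the uniform average, so in this sketch the weighting acts as a drop-in replacement for the aggregation step of a likelihood-based detector, which is consistent with the plug-and-play, training-free framing in the abstract.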