DPDLLM: A Black-box Framework for Detecting Pre-training Data from Large Language Models

Baohang Zhou, Zezhong Wang, Lingzhi Wang, Hongru Wang, Ying Zhang, Kehui Song, Xuhui Sui, Kam-Fai Wong


Abstract
The success of large language models (LLMs) benefits from large-scale model parameters and large amounts of pre-training data. However, the textual data used to train LLMs cannot be confirmed to be legal because it is crawled from many different web sites. For example, the pre-training data of LLMs may contain copyrighted articles, personal reviews, and personal information whose use is illegal. To address this issue and develop legally compliant LLMs, we propose to detect the pre-training data of LLMs in a purely black-box way, because existing LLM services only return generated text. The most closely related prior work is the membership inference attack (MIA) on machine learning models, which detects whether given data were used for training. However, existing MIA methods rely on analyzing the output probabilities of models, which is unrealistic for LLM services. To tackle this problem, we first construct benchmark datasets by collecting textual data from different domains as the seen and unseen pre-training data of LLMs. Then, we investigate a black-box framework named DPDLLM which, with access only to the text generated by an LLM, detects whether a given text was used to train it. In the proposed framework, we exploit GPT-2 as a reference model, fit it to the candidate textual data, and feed the text generated by the LLM into it to acquire sequence probabilities as the key feature for detection. Experimental results on the benchmark datasets demonstrate that DPDLLM is effective on different popular LLMs and outperforms existing methods.
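A minimal sketch of the reference-model scoring step described in the abstract, assuming a Hugging Face GPT-2 checkpoint and a length-normalised log-probability as the detection feature. The fine-tuning ("fitting") of the reference model on the candidate text and the downstream seen/unseen classifier are omitted, and names such as `score_generated_text` are illustrative assumptions rather than the paper's actual API.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Reference model; in DPDLLM this GPT-2 would first be fitted (fine-tuned)
# on the candidate textual data before scoring.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
reference = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def score_generated_text(text: str) -> float:
    """Average token log-probability of `text` under the reference model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    # Passing labels=ids makes the model return the shifted cross-entropy
    # loss, i.e. the negative mean log-probability of the sequence.
    loss = reference(ids, labels=ids).loss
    return -loss.item()

# Hypothetical usage: score the continuation returned by the black-box LLM
# for a candidate prefix; a threshold or small classifier over such scores
# (e.g. contrasted with scores on held-out text) yields the seen/unseen
# membership decision.
suspect_generation = "text returned by the black-box LLM service"
print(score_generated_text(suspect_generation))
```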
Anthology ID:
2024.findings-acl.35
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
644–653
URL:
https://aclanthology.org/2024.findings-acl.35
Cite (ACL):
Baohang Zhou, Zezhong Wang, Lingzhi Wang, Hongru Wang, Ying Zhang, Kehui Song, Xuhui Sui, and Kam-Fai Wong. 2024. DPDLLM: A Black-box Framework for Detecting Pre-training Data from Large Language Models. In Findings of the Association for Computational Linguistics ACL 2024, pages 644–653, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
DPDLLM: A Black-box Framework for Detecting Pre-training Data from Large Language Models (Zhou et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.35.pdf