Zhenhua Xu

2025

pdf bib abs
MEraser: An Effective Fingerprint Erasure Approach for Large Language Models
Jingxuan Zhang | Zhenhua Xu | Rui Hu | Wenpeng Xing | Xuhong Zhang | Meng Han
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) have become increasingly prevalent across various sectors, raising critical concerns about model ownership and intellectual property protection. Although backdoor-based fingerprinting has emerged as a promising solution for model authentication, effective attacks for removing these fingerprints remain largely unexplored. Therefore, We present Mismatched Eraser (MEraser), a novel method for effectively removing backdoor-based fingerprints from LLMs while maintaining model performance. Our approach leverages a two-phase fine-tuning strategy utilizing carefully constructed mismatched and clean datasets. Through extensive evaluation across multiple LLM architectures and fingerprinting methods, we demonstrate that MEraser achieves complete fingerprinting removal while maintaining model performance with minimal training data of fewer than 1,000 samples. Furthermore, we introduce a transferable erasure mechanism that enables effective fingerprinting removal across different models without repeated training. In conclusion, our approach provides a practical solution for fingerprinting removal in LLMs, reveals critical vulnerabilities in current fingerprinting techniques, and establishes comprehensive evaluation benchmarks for developing more resilient model protection methods in the future.

pdf bib abs
CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor
Zhenhua Xu | Xixiang Zhao | Xubin Yue | Shengwei Tian | Changting Lin | Meng Han
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The widespread deployment of large language models (LLMs) has intensified concerns around intellectual property (IP) protection, as model theft and unauthorized redistribution become increasingly feasible. To address this, model fingerprinting aims to embed verifiable ownership traces into LLMs. However, existing methods face inherent trade-offs between stealthness, robustness, and generalizability—being either detectable via distributional shifts, vulnerable to adversarial modifications, or easily invalidated once the fingerprint is revealed. In this work, we introduce CTCC, a novel rule-driven fingerprinting framework that encodes contextual correlations across multiple dialogue turns—such as counterfactual—rather than relying on token-level or single-turn triggers. CTCC enables fingerprint verification under black-box access while mitigating false positives and fingerprint leakage, supporting continuous construction under a shared semantic rule even if partial triggers are exposed. Extensive experiments across multiple LLM architectures demonstrate that CTCC consistently achieves stronger stealth and robustness than prior work. Our findings position CTCC as a reliable and practical solution for ownership verification in real-world LLM deployment scenarios.

pdf bib abs
EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint
Zhenhua Xu | Meng Han | Wenpeng Xing
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The proliferation of large language models (LLMs) has intensified concerns over model theft and license violations, necessitating robust and stealthy ownership verification. Existing fingerprinting methods either require impractical white-box access or introduce detectable statistical anomalies. We propose EverTracer, a novel gray-box fingerprinting framework that ensures stealthy and robust model provenance tracing. EverTracer is the first to repurpose Membership Inference Attacks (MIAs) for defensive use, embedding ownership signals via memorization instead of artificial trigger-output overfitting. It consists of Fingerprint Injection, which fine-tunes the model on any natural language data without detectable artifacts, and Verification, which leverages calibrated probability variation signal to distinguish fingerprinted models. This approach remains robust against adaptive adversaries, including input level modification, and model-level modifications. Extensive experiments across architectures demonstrate EverTracer’s state-of-the-art effectiveness, stealthness, and resilience, establishing it as a practical solution for securing LLM intellectual property.

pdf bib abs
PREE: Towards Harmless and Adaptive Fingerprint Editing in Large Language Models via Knowledge Prefix Enhancement
Xubin Yue | Zhenhua Xu | Wenpeng Xing | Jiahui Yu | Mohan Li | Meng Han
Findings of the Association for Computational Linguistics: EMNLP 2025

Addressing the intellectual property protection challenges in commercial deployment of large language models (LLMs), existing black-box fingerprinting techniques face dual challenges from incremental fine-tuning erasure and feature-space defense due to their reliance on overfitting high-perplexity trigger patterns. We firstly reveal that, model editing in the fingerprint domain exhibits unique advantages including significantly lower false positive rates, enhanced harmlessness, and superior robustness. Building on this foundation, this paper innovatively proposes a Prefix-enhanced Fingerprint Editing Framework (PREE), which encodes copyright information into parameter offsets through dual-channel knowledge edit to achieve covert embedding of fingerprint features. Experimental results demonstrate that the proposed solution achieves the 90% trigger precision in mainstream architectures including LLaMA-3 and Qwen-2.5. The minimal parameter offset (change rate < 0.03) effectively preserves original knowledge representation while demonstrating strong robustness against incremental fine-tuning and multi-dimensional defense strategies, maintaining zero false positive rate throughout evaluations.

With the rapid development of large language models (LLMs), protecting intellectual property (IP) has become increasingly crucial. To tackle high costs and potential contamination in fingerprint integration, we propose LoRA-FP, a lightweight plug-and-play framework that encodes backdoor fingerprints into LoRA adapters via constrained fine-tuning. This enables seamless fingerprint transplantation through parameter fusion, eliminating full-parameter updates while maintaining integrity. Experiments demonstrate that LoRA-FP achieves superior robustness against various scenarios like incremental training and model fusion, while significantly reducing computational overhead compared to traditional approaches.

Co-authors

Venues

Fix author