Cut the Deadwood Out: Backdoor Purification via Guided Module Substitution

Yao Tong; Weijun Li; Xuanli He; Haolan Zhan; Qiongkai Xu

doi:10.18653/v1/2025.findings-emnlp.1293

Abstract

NLP models are commonly trained (or fine-tuned) on datasets from untrusted platforms like HuggingFace, posing significant risks of data poisoning attacks. A practical yet underexplored challenge arises when such backdoors are discovered after model deployment, making retraining-required defenses less desirable due to computational costs and data constraints. In this work, we propose Guided Module Substitution (GMS), an effective retraining-free method based on guided merging of the victim model with a single proxy model. Specifically, GMS selectively replaces modules in the victim model based on a trade-off signal between utility and backdoor. GMS offers four desirable properties: (1) robustness to the choice and trustworthiness of the proxy model, (2) applicability under relaxed data assumptions, (3) stability across hyperparameters, and (4) transferability across different attacks. Extensive experiments on encoder models and decoder LLMs demonstrate the strong effectiveness of GMS. GMS significantly outperforms even the strongest defense baseline, particularly against challenging attacks like LWS.
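
The general idea of replacing selected victim modules with proxy modules under a utility/backdoor trade-off can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not specify the search procedure or the scoring signal, so the greedy loop, the names `substitute_modules`, `greedy_select`, and `toy_score`, and the representation of a model as a dictionary of parameter lists are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Set

# Toy stand-in for a model: module name -> flattened parameters (assumption).
Weights = Dict[str, List[float]]


def substitute_modules(victim: Weights, proxy: Weights, chosen: Set[str]) -> Weights:
    """Return a copy of `victim` with the `chosen` modules taken from `proxy`."""
    return {
        name: list(proxy[name] if name in chosen else params)
        for name, params in victim.items()
    }


def greedy_select(
    victim: Weights, proxy: Weights, score: Callable[[Weights], float]
) -> Set[str]:
    """Hypothetical greedy search: repeatedly add the single module whose
    substitution most improves `score`, and stop when nothing improves it."""
    chosen: Set[str] = set()
    best = score(victim)
    improved = True
    while improved:
        improved = False
        for name in sorted(set(victim) - chosen):
            candidate = score(substitute_modules(victim, proxy, chosen | {name}))
            if candidate > best:
                best, pick, improved = candidate, name, True
        if improved:
            chosen.add(pick)
    return chosen


def toy_score(model: Weights) -> float:
    """Made-up trade-off signal: a utility proxy minus a backdoor proxy."""
    utility = -abs(model["attn"][0] - 1.0)   # closer to 1.0 counts as more useful
    backdoor = abs(model["ffn"][0]) / 9.0    # larger value counts as more backdoored
    return utility - backdoor


victim = {"attn": [1.0], "ffn": [9.0]}
proxy = {"attn": [0.9], "ffn": [0.0]}
selected = greedy_select(victim, proxy, toy_score)
purified = substitute_modules(victim, proxy, selected)
```

In this toy setup only the `ffn` module is swapped, because replacing `attn` would lower the utility term without reducing the backdoor term. With real models, the score would have to be estimated from evaluation data rather than read off the parameters.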

Anthology ID: 2025.findings-emnlp.1293
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 23760–23783
URL: https://aclanthology.org/2025.findings-emnlp.1293/
DOI: 10.18653/v1/2025.findings-emnlp.1293
Cite (ACL): Yao Tong, Weijun Li, Xuanli He, Haolan Zhan, and Qiongkai Xu. 2025. Cut the Deadwood Out: Backdoor Purification via Guided Module Substitution. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 23760–23783, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Cut the Deadwood Out: Backdoor Purification via Guided Module Substitution (Tong et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.1293.pdf
Checklist: 2025.findings-emnlp.1293.checklist.pdf
