Distillation-Resistant Watermarking for Model Protection in NLP

Xuandong Zhao, Lei Li, Yu-Xiang Wang


Abstract
How can we protect the intellectual property of trained NLP models? Modern NLP models are vulnerable to theft: an adversary can replicate them by querying their publicly exposed APIs and distilling from the outputs. Existing protection methods such as watermarking work for images but are not applicable to text. We propose Distillation-Resistant Watermarking (DRW), a novel technique to protect NLP models from being stolen via distillation. DRW protects a model by injecting watermarks, tied to a secret key, into the victim model's prediction probabilities, and detects that key by probing a suspect model. We prove that a protected model retains its original accuracy within a certain bound. We evaluate DRW on a diverse set of NLP tasks including text classification, part-of-speech tagging, and named entity recognition. Experiments show that DRW protects the original model and detects stolen models at 100% mean average precision on all four tasks, while the prior method fails on two.
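The core idea, perturbing the victim model's output probabilities with a signal tied to a secret key, can be illustrated with a minimal sketch. This is not the paper's implementation: the sinusoidal perturbation shape, the hashing scheme, and all names (`watermark_probs`, `epsilon`, `period`, `target_class`) are illustrative assumptions.

```python
import hashlib
import numpy as np

def watermark_probs(probs, text, secret_key, epsilon=0.05,
                    period=10, target_class=0):
    """Return a watermarked copy of a probability vector.

    A key-dependent hash of the input determines a phase, and a small
    sinusoidal perturbation (a hypothetical choice of watermark signal)
    is added to one class probability. An owner who knows the key can
    later probe a suspect model and test for this periodic pattern.
    """
    # Derive a deterministic phase in [0, 1) from the secret key and input.
    digest = hashlib.sha256((secret_key + text).encode()).hexdigest()
    phase = (int(digest, 16) % period) / period

    # Perturb the target class probability with a key-dependent signal.
    out = np.asarray(probs, dtype=float).copy()
    out[target_class] += epsilon * np.sin(2 * np.pi * phase)

    # Renormalize so the output remains a valid probability distribution.
    out = np.clip(out, 1e-8, None)
    return out / out.sum()
```

Because the perturbation is bounded by `epsilon`, the watermarked predictions stay close to the originals, which is the intuition behind the accuracy-preservation guarantee stated in the abstract.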
Anthology ID:
2022.findings-emnlp.370
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5044–5055
URL:
https://aclanthology.org/2022.findings-emnlp.370
DOI:
10.18653/v1/2022.findings-emnlp.370
Cite (ACL):
Xuandong Zhao, Lei Li, and Yu-Xiang Wang. 2022. Distillation-Resistant Watermarking for Model Protection in NLP. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5044–5055, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Distillation-Resistant Watermarking for Model Protection in NLP (Zhao et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-emnlp.370.pdf
Video:
https://aclanthology.org/2022.findings-emnlp.370.mp4