Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

Xinhang Ma; William Yeoh; Ning Zhang; Yevgeniy Vorobeychik

Abstract

Knowledge distillation is a widely adopted technique for transferring capabilities from LLMs to smaller, more efficient student models. However, unauthorized use of knowledge distillation takes unfair advantage of the considerable effort and cost put into developing frontier models. We investigate methods for modifying teacher-generated reasoning traces to achieve two objectives that deter unauthorized distillation: (1) anti-distillation, or degrading the training usefulness of query responses, and (2) API watermarking, which embeds verifiable signatures in student models. We introduce several approaches for dynamically rewriting a teacher’s reasoning outputs while preserving answer correctness and semantic coherence. Two of these leverage the rewriting capabilities of LLMs, while others use gradient-based techniques. Our experiments show that a simple instruction-based rewriting approach achieves a strong anti-distillation effect while maintaining or even improving teacher performance. Furthermore, we show that our rewriting approach also enables embedding watermarks that can be reliably detected with essentially no false alarms. Our code is available at https://github.com/xhOwenMa/trace-rewriting.

Anthology ID: 2026.acl-long.519
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 11307–11324
URL: https://aclanthology.org/2026.acl-long.519/
Cite (ACL): Xinhang Ma, William Yeoh, Ning Zhang, and Yevgeniy Vorobeychik. 2026. Protecting Language Models Against Unauthorized Distillation through Trace Rewriting. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11307–11324, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Protecting Language Models Against Unauthorized Distillation through Trace Rewriting (Ma et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.519.pdf
Checklist: 2026.acl-long.519.checklist.pdf