Yufan Zhong
2026
RShield: A User-level Traceable Backdoor Watermark for LLMs in Embedding-as-a-Service
Lingyun Xiang | Yufan Zhong | Chengfu Ou | Zhihua Xia | Chunfang Yang | Daojian Zeng | Zhangjie Fu
Findings of the Association for Computational Linguistics: ACL 2026
Embedding-as-a-Service (EaaS) has emerged as a critical paradigm for commercializing large language models (LLMs). However, existing backdoor watermarking techniques are fundamentally limited to "zero-bit" detection, which prevents user-level traceability in multi-user EaaS scenarios. To address this limitation, we propose RShield, a multi-bit backdoor watermarking scheme that enables reliable user-level attribution of LLMs for EaaS under model extraction attacks. RShield integrates Reed-Solomon error-correcting codes with orthogonal feature mapping to introduce highly structured redundancy, constructing fault-tolerant symbol sequences for the multi-bit watermark space so that the watermark remains recoverable even under aggressive extraction noise. To mitigate semantic distortion caused by noisy-channel interference, RShield employs a lightweight Adapter to adaptively inject multi-bit watermarks in the feature space, preserving the quality of EaaS while achieving user-level traceability. Extensive experiments on four NLP benchmarks demonstrate that, compared to existing methods, RShield efficiently achieves 100% multi-bit watermark recovery and high semantic fidelity under model extraction attacks, while significantly reducing the degradation that watermarking causes in downstream task performance.
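The abstract does not specify RShield's code parameters or decoder, so the following is only a minimal sketch of the general idea behind Reed-Solomon redundancy: a user identifier is expanded into a longer symbol sequence that can be recovered even when some symbols are lost. Everything here is an assumption made for illustration: the prime field size `P = 257`, the function names `rs_encode` and `rs_decode_erasures`, the example identifier, and the simplification to erasure-only decoding (symbols are either known or missing, never silently corrupted). It does not model the orthogonal feature mapping or the Adapter.

```python
P = 257  # assumed prime field size; every symbol is an integer in [0, P)


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def rs_encode(message, n):
    """Systematic Reed-Solomon encoding (evaluation view).

    The k message symbols are the polynomial's values at x = 0..k-1;
    the n - k parity symbols are its values at x = k..n-1.
    """
    k = len(message)
    if not (k <= n <= P):
        raise ValueError("need k <= n <= P")
    points = list(enumerate(message))
    parity = [_lagrange_eval(points, x) for x in range(k, n)]
    return list(message) + parity


def rs_decode_erasures(received, k):
    """Recover the k message symbols from a codeword in which lost
    symbols are marked as None. Any k surviving symbols suffice."""
    known = [(x, y) for x, y in enumerate(received) if y is not None]
    if len(known) < k:
        raise ValueError("too many erasures to recover the message")
    points = known[:k]
    return [_lagrange_eval(points, x) for x in range(k)]


# Hypothetical 4-symbol user identifier, expanded to 8 symbols.
user_id = [0x12, 0x34, 0x56, 0x78]
codeword = rs_encode(user_id, n=8)

# Simulate losing half of the symbols, including three message symbols.
damaged = list(codeword)
for lost in (0, 1, 3, 6):
    damaged[lost] = None

recovered = rs_decode_erasures(damaged, k=len(user_id))
```

In this sketch, a codeword of length n built from k message symbols tolerates up to n - k erasures. Correcting symbols that are wrong rather than missing, which is the situation the abstract describes as extraction noise, needs a full error-correcting decoder (for example Berlekamp-Welch) and tolerates roughly half as many faults.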