Subtle Signatures, Strong Shields: Advancing Robust and Imperceptible Watermarking in Large Language Models

Yubing Ren, Ping Guo, Yanan Cao, Wei Ma


Abstract
The widespread adoption of Large Language Models (LLMs) has led to an increase in AI-generated text on the Internet, making it increasingly difficult to differentiate AI-created content from human-written text. Meeting this challenge matters for authenticity, trust, and the prevention of copyright violations. Current research focuses on watermarking LLM-generated text, but traditional techniques struggle to balance robustness with text quality. We introduce a novel watermarking approach, Robust and Imperceptible Watermarking (RIW) for LLMs, which leverages token prior probabilities to improve detectability while keeping the watermark imperceptible. RIW embeds watermarks by partitioning selected tokens into two distinct groups according to their prior probabilities and applying a tailored embedding strategy to each group. In the detection stage, RIW employs a ‘voted z-test’, providing a statistically robust framework for accurately identifying the presence of a watermark. The effectiveness of RIW is evaluated along three key dimensions: success rate, text quality, and robustness against removal attacks. Our experimental results on various LLMs, including GPT2-XL, OPT-1.3B, and LLaMA2-7B, indicate that RIW surpasses existing methods while exhibiting greater robustness to removal attacks and better imperceptibility, thus promoting the responsible use of LLMs.
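To make the detection idea concrete: watermark detectors in this line of work typically test whether the fraction of tokens drawn from a favored ("green") list significantly exceeds its expected proportion under unwatermarked text. The sketch below shows a generic one-proportion z-test of this kind; it is an illustrative assumption, not the paper's specific ‘voted z-test’, and the function name and the parameter `gamma` (the expected green-list fraction) are hypothetical.

```python
import math

def watermark_z_score(num_green: int, total: int, gamma: float = 0.5) -> float:
    """One-proportion z-test: is the observed fraction of green-list
    tokens significantly above the expected fraction gamma?"""
    if total <= 0:
        raise ValueError("need at least one scored token")
    expected = gamma * total                      # mean under the null hypothesis
    std = math.sqrt(total * gamma * (1.0 - gamma))  # binomial standard deviation
    return (num_green - expected) / std

# Example: 140 of 200 scored tokens fall in the green list with gamma = 0.5.
z = watermark_z_score(140, 200, gamma=0.5)
# z = (140 - 100) / sqrt(50) ≈ 5.66: far above a typical threshold (e.g. 4),
# so the text would be flagged as watermarked.
```

A "voted" variant would aggregate several such tests (e.g. over multiple token partitions) and combine their verdicts, trading a single statistic for a majority decision.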
Anthology ID:
2024.findings-acl.327
Volume:
Findings of the Association for Computational Linguistics: ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5508–5519
URL:
https://aclanthology.org/2024.findings-acl.327
Cite (ACL):
Yubing Ren, Ping Guo, Yanan Cao, and Wei Ma. 2024. Subtle Signatures, Strong Shields: Advancing Robust and Imperceptible Watermarking in Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 5508–5519, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Subtle Signatures, Strong Shields: Advancing Robust and Imperceptible Watermarking in Large Language Models (Ren et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.327.pdf