k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text

Abe Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, Tianxing He


Abstract
Recent watermarked generation algorithms inject detectable signatures during language generation to facilitate post-hoc detection. While token-level watermarks are vulnerable to paraphrase attacks, SemStamp (Hou et al., 2023) applies the watermark to the semantic representation of sentences and demonstrates promising robustness. SemStamp employs locality-sensitive hashing (LSH) to partition the semantic space with arbitrary hyperplanes, which results in a suboptimal tradeoff between robustness and speed. We propose k-SemStamp, a simple yet effective enhancement of SemStamp that uses k-means clustering as an alternative to LSH, partitioning the embedding space with awareness of its inherent semantic structure. Experimental results indicate that k-SemStamp notably improves robustness and sampling efficiency while preserving generation quality, advancing a more effective tool for machine-generated text detection.
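
The contrast between the two partitioning schemes named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the 2-D "embeddings", the single hyperplane, and the hand-picked centroids are all hypothetical, and a real system would presumably obtain sentence embeddings from an encoder and fit centroids by running k-means on them. The sketch only shows why an arbitrary hyperplane can separate two nearby points that a nearest-centroid rule keeps together.

```python
def lsh_region(embedding, hyperplanes):
    """Region id built from the sign of the dot product with each hyperplane normal."""
    bits = 0
    for normal in hyperplanes:
        dot = sum(e * n for e, n in zip(embedding, normal))
        bits = (bits << 1) | (1 if dot > 0 else 0)
    return bits


def kmeans_region(embedding, centroids):
    """Region id is the index of the nearest centroid (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(embedding, centroids[i]))


# Hypothetical toy data: two nearby points standing in for a sentence and its paraphrase.
original = [0.05, 1.0]
paraphrase = [-0.05, 1.0]

# One arbitrary hyperplane (normal along the x-axis) happens to pass between them.
hyperplanes = [[1.0, 0.0]]

# Hand-picked centroids standing in for cluster centers learned by k-means.
centroids = [[0.0, 1.0], [0.0, -1.0]]

lsh_ids = (lsh_region(original, hyperplanes), lsh_region(paraphrase, hyperplanes))
kmeans_ids = (kmeans_region(original, centroids), kmeans_region(paraphrase, centroids))
```

In this toy setup the two points receive different LSH region ids but the same cluster id, which is the kind of boundary effect a structure-aware partition is meant to reduce.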
Anthology ID:
2024.findings-acl.98
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1706–1715
URL:
https://aclanthology.org/2024.findings-acl.98
Cite (ACL):
Abe Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. 2024. k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text. In Findings of the Association for Computational Linguistics ACL 2024, pages 1706–1715, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text (Hou et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.98.pdf