An Entropy-based Text Watermarking Detection Method

Yijian Lu, Aiwei Liu, Dianzhi Yu, Jingjing Li, Irwin King


Abstract
Text watermarking algorithms for large language models (LLMs) can effectively identify machine-generated texts by embedding and detecting hidden features in the text. Although current text watermarking algorithms perform well in most high-entropy scenarios, their performance in low-entropy scenarios still needs improvement. In this work, we argue that the influence of token entropy should be fully considered in the watermark detection process, i.e., the weight of each token during watermark detection should be customized according to its entropy, rather than setting the weights of all tokens to the same value as in previous methods. Specifically, we propose Entropy-based Text Watermarking Detection (EWD), which gives higher-entropy tokens higher influence weights during watermark detection, so as to better reflect the degree of watermarking. Furthermore, the proposed detection process is training-free and fully automated. Experiments demonstrate that EWD achieves better detection performance in low-entropy scenarios, and that the method is general and can be applied to texts with different entropy distributions. Our code and data are available. Additionally, our algorithm can be accessed through MarkLLM (CITATION).
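The idea of weighting each token by its entropy during detection can be illustrated with a minimal sketch. The abstract does not give the weight function or the test statistic, so everything below is an assumption made for illustration: the function names are hypothetical, Shannon entropy stands in for whatever entropy measure the method uses, and the statistic is a weighted z-score over a green-list style watermark in which an unwatermarked token is assumed to be "green" independently with probability gamma.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution.

    Assumption: used here only as a stand-in entropy measure.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


def weighted_z_score(
    green_flags: Sequence[bool],
    weights: Sequence[float],
    gamma: float = 0.5,
) -> float:
    """Weighted z-score for a green-list watermark (illustrative only).

    green_flags[i] says whether token i falls in its green list.
    weights[i] is the influence weight of token i, e.g. a non-decreasing
    function of its entropy (hypothetical choice).
    Under the null hypothesis each token is green with probability gamma,
    so the weighted green count has mean gamma * sum(w) and variance
    gamma * (1 - gamma) * sum(w^2).
    """
    if len(green_flags) != len(weights):
        raise ValueError("green_flags and weights must have the same length")
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        return 0.0  # no token carries any weight, so there is no evidence
    observed = sum(w for g, w in zip(green_flags, weights) if g)
    expected = gamma * sum(weights)
    std = math.sqrt(gamma * (1 - gamma) * sum_sq)
    return (observed - expected) / std


# Two high-entropy tokens are green; two low-entropy tokens are not.
flags = [True, True, False, False]
uniform = weighted_z_score(flags, [1.0, 1.0, 1.0, 1.0])
entropy_weighted = weighted_z_score(flags, [1.0, 1.0, 0.0, 0.0])
print(uniform, entropy_weighted)
```

With equal weights the statistic reduces to the usual unweighted z-score, which is why low-entropy tokens (where the generator had little freedom to pick a green token) can dilute the evidence. Down-weighting them in this toy example raises the score for the same text.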
Anthology ID:
2024.acl-long.630
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
11724–11735
URL:
https://aclanthology.org/2024.acl-long.630
Cite (ACL):
Yijian Lu, Aiwei Liu, Dianzhi Yu, Jingjing Li, and Irwin King. 2024. An Entropy-based Text Watermarking Detection Method. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11724–11735, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
An Entropy-based Text Watermarking Detection Method (Lu et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.630.pdf