Yingqian Cui
2024
A Robust Semantics-based Watermark for Large Language Model against Paraphrasing
Jie Ren
|
Han Xu
|
Yiding Liu
|
Yingqian Cui
|
Shuaiqiang Wang
|
Dawei Yin
|
Jiliang Tang
Findings of the Association for Computational Linguistics: NAACL 2024
Large language models (LLMs) have show their remarkable ability in various natural language tasks. However, there are concerns that LLMs are possible to be used improperly or even illegally. To prevent the malicious usage of LLMs, detecting LLM-generated text becomes crucial in the deployment of LLM applications. Watermarking is an effective strategy to detect the LLM-generated content by encoding a pre-defined secret watermark to facilitate the detection process. However, the majority of existing watermark methods leverage the simple hashes of precedent tokens to partition vocabulary. Such watermarks can be easily eliminated by paraphrase and, correspondingly, the detection effectiveness will be greatly compromised. Thus, to enhance the robustness against paraphrase, we propose a semantics-based watermark framework, SemaMark. It leverages the semantics as an alternative to simple hashes of tokens since the semantic meaning of the sentences will be likely preserved under paraphrase and the watermark can remain robust. Comprehensive experiments are conducted to demonstrate the effectiveness and robustness of SemaMark under different paraphrases.
On the Generalization of Training-based ChatGPT Detection Methods
Han Xu
|
Jie Ren
|
Pengfei He
|
Shenglai Zeng
|
Yingqian Cui
|
Amy Liu
|
Hui Liu
|
Jiliang Tang
Findings of the Association for Computational Linguistics: EMNLP 2024
Large language models, such as ChatGPT, achieve amazing performance on various language processing tasks. However, they can also be exploited for improper purposes such as plagiarism or misinformation dissemination. Thus, there is an urgent need to detect the texts generated by LLMs. One type of most studied methods trains classification models to distinguish LLM texts from human texts. However, existing studies demonstrate the trained models may suffer from distribution shifts (during test), i.e., they are ineffective to predict the generated texts from unseen language tasks or topics which are not collected during training. In this work, we focus on ChatGPT as a representative model, and we conduct a comprehensive investigation on these methods’ generalization behaviors under distribution shift caused by a wide range of factors, including prompts, text lengths, topics, and language tasks. To achieve this goal, we first collect a new dataset with human and ChatGPT texts, and then we conduct extensive studies on the collected dataset. Our studies unveil insightful findings that provide guidance for future methodologies and data collection strategies for LLM detection.
Search
Co-authors
- Jie Ren 2
- Han Xu 2
- Jiliang Tang 2
- Yiding Liu 1
- Shuaiqiang Wang 1
- show all...