Xiaoze Liu
2024
SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation
Xiaoze Liu
|
Ting Sun
|
Tianyang Xu
|
Feijie Wu
|
Cunxiang Wang
|
Xiaoqian Wang
|
Jing Gao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) have transformed machine learning but raised significant legal concerns due to their potential to produce text that infringes on copyrights, resulting in several high-profile lawsuits. The legal landscape is struggling to keep pace with these rapid advancements, with ongoing debates about whether generated text might plagiarize copyrighted materials. Current LLMs may infringe on copyrights or overly restrict non-copyrighted texts, leading to these challenges: (i) the need for a comprehensive evaluation benchmark to assess copyright compliance from multiple aspects; (ii) evaluating robustness against safeguard bypassing attacks; and (iii) developing effective defenses targeted against the generation of copyrighted text.To tackle these challenges, we introduce a curated dataset to evaluate methods, test attack strategies, and propose a lightweight, real-time defense mechanism to prevent the generation of copyrighted text, ensuring the safe and lawful use of LLMs. Our experiments demonstrate that current LLMs frequently output copyrighted text, and that jailbreaking attacks can significantly increase the volume of copyrighted output. Our proposed defense mechanism substantially reduces the volume of copyrighted text generated by LLMs by effectively refusing malicious requests.
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
Tianyang Xu
|
Shujin Wu
|
Shizhe Diao
|
Xiaoze Liu
|
Xingyao Wang
|
Yangyi Chen
|
Jing Gao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader applications. Previous work has elicited confidence from LLMs by direct or self-consistency prompting, or constructing specific datasets for supervised finetuning. The prompting-based approaches have inferior performance, and the training-based approaches are limited to binary or inaccurate group-level confidence estimates. In this work, we present SaySelf, a novel training framework that teaches LLMs to express more fine-grained confidence estimates. In addition, beyond the confidence scores, SaySelf initiates the process of directing LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge and explain their uncertainty. This is achieved by using an LLM to automatically summarize the uncertainties in specific knowledge via natural language. The summarization is based on the analysis of the inconsistency in multiple sampled reasoning chains, and the resulting data is utilized for supervised fine-tuning. Moreover, we utilize reinforcement learning with a meticulously crafted reward function to calibrate the confidence estimates, motivating LLMs to deliver accurate, high-confidence predictions and to penalize overconfidence in erroneous outputs. Experimental results demonstrate the effectiveness of SaySelf in reducing the confidence calibration error and maintaining the task performance. The generated self-reflective rationales are also reasonable and can further contribute to the calibration. The code is made public at https://github.com/xu1868/SaySelf.
Search
Co-authors
- Tianyang Xu 2
- Jing Gao 2
- Ting Sun 1
- Feijie Wu 1
- Cunxiang Wang 1
- show all...