Zeguan Xiao
2024
Distract Large Language Models for Automatic Jailbreak Attack
Zeguan Xiao | Yan Yang | Guanhua Chen | Yun Chen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Extensive efforts have been made before the public release of large language models (LLMs) to align their behaviors with human values. However, even meticulously aligned LLMs remain vulnerable to malicious manipulations such as jailbreaking, leading to unintended behaviors. In this work, we propose a novel black-box jailbreak framework for automated red teaming of LLMs. Motivated by research on the distractibility and over-confidence phenomena of LLMs, we design malicious content concealing and memory reframing with an iterative optimization algorithm to jailbreak LLMs. Extensive experiments on jailbreaking both open-source and proprietary LLMs demonstrate the superiority of our framework in terms of effectiveness, scalability, and transferability. We also evaluate the effectiveness of existing jailbreak defense methods against our attack and highlight the crucial need to develop more effective and practical defense strategies.
2022
Pruning Adatperfusion with Lottery Ticket Hypothesis
Jiarun Wu | Qingliang Chen | Zeguan Xiao | Yuliang Gu | Mengsi Sun
Findings of the Association for Computational Linguistics: NAACL 2022
Pre-trained language models have shown great success in multiple downstream tasks, but they are computationally expensive to fine-tune. Transfer learning with adapter modules has therefore been introduced to alleviate this problem, helping to extract knowledge for the downstream tasks. Adapterfusion models are one such transformer-with-adapters architecture, merging multiple adapters to incorporate knowledge from different tasks. However, merging multiple adapters inevitably introduces redundancy, substantially increasing training and inference time. In this paper, we therefore propose an approach to identify the influence of each adapter module and a novel way to prune adapters based on the well-known Lottery Ticket Hypothesis. Experiments on the GLUE datasets show that the Adapterfusion model pruned with our scheme achieves state-of-the-art results, reducing model size significantly while keeping performance intact.
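The abstract describes the approach only at a high level. As an illustration of the general idea, here is a minimal, hypothetical sketch of Lottery-Ticket-style pruning over a set of fused adapter modules; the magnitude-based scoring, the `keep_ratio` parameter, and the rewinding to stored initial weights are assumptions made for this sketch, not the paper's actual influence measure or pruning procedure.

```python
import torch
import torch.nn as nn

def lottery_ticket_prune_adapters(adapters: nn.ModuleDict,
                                  initial_states: dict,
                                  keep_ratio: float = 0.5) -> nn.ModuleDict:
    """Hypothetical sketch: score each adapter by the L1 magnitude of its
    weights, keep the top `keep_ratio` fraction, and rewind the survivors
    to their stored initial weights, in the spirit of the Lottery Ticket
    Hypothesis."""
    # Score every adapter module by total parameter magnitude (assumed proxy
    # for its influence; the paper defines its own influence measure).
    scores = {
        name: sum(p.abs().sum().item() for p in module.parameters())
        for name, module in adapters.items()
    }
    # Keep only the highest-scoring adapters.
    n_keep = max(1, int(len(scores) * keep_ratio))
    kept = sorted(scores, key=scores.get, reverse=True)[:n_keep]

    pruned = nn.ModuleDict()
    for name in kept:
        module = adapters[name]
        # Rewind the surviving adapter to its initial (pre-fine-tuning) weights.
        module.load_state_dict(initial_states[name])
        pruned[name] = module
    return pruned
```

In practice, the influence score of each adapter would come from the paper's own importance measure rather than raw weight magnitude; the sketch only shows where such a score would plug into a prune-and-rewind loop.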
2021
BERT4GCN: Using BERT Intermediate Layers to Augment GCN for Aspect-based Sentiment Classification
Zeguan Xiao | Jiarun Wu | Qingliang Chen | Congjian Deng
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Graph-based Aspect-based Sentiment Classification (ABSC) approaches have yielded state-of-the-art results, especially when equipped with contextual word embeddings from pre-trained language models (PLMs). However, they ignore sequential features of the context and have not yet made the best use of PLMs. In this paper, we propose a novel model, BERT4GCN, which integrates the grammatical sequential features from the PLM BERT with the syntactic knowledge from dependency graphs. BERT4GCN utilizes outputs from the intermediate layers of BERT and positional information between words to augment a GCN (Graph Convolutional Network) so that it better encodes the dependency graphs for the downstream classification. Experimental results demonstrate that the proposed BERT4GCN outperforms all state-of-the-art baselines, showing that augmenting a GCN with grammatical features from the intermediate layers of BERT can significantly empower ABSC models.
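For readers who want a concrete picture, the following is a rough, hypothetical sketch of the core idea of tapping intermediate BERT layers as node features for a GCN over dependency-graph adjacency matrices. The chosen layers, the fusion by addition, and the mean-pooling classifier are assumptions for illustration only and not the paper's exact BERT4GCN architecture (which also exploits positional information between words).

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertGCNSketch(nn.Module):
    """Minimal sketch: augment a GCN over dependency graphs with
    intermediate-layer BERT representations (illustrative only)."""
    def __init__(self, bert_name="bert-base-uncased",
                 layers=(4, 8, 12), hidden=768, num_classes=3):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name,
                                              output_hidden_states=True)
        self.layers = layers                      # which BERT layers to tap
        self.gcn_weights = nn.ModuleList(
            nn.Linear(hidden, hidden) for _ in layers)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask, adj):
        # adj: (batch, seq, seq) float adjacency matrix from the dependency parse.
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        hidden_states = outputs.hidden_states     # tuple of (batch, seq, hidden)
        h = hidden_states[0]                      # start from embedding output
        for layer_idx, gcn in zip(self.layers, self.gcn_weights):
            # Mix the GCN state with the tapped BERT layer, then propagate
            # over the dependency graph with one graph convolution.
            h = h + hidden_states[layer_idx]
            h = torch.relu(torch.bmm(adj, gcn(h)))
        # Mean-pool over tokens and classify the aspect sentiment.
        pooled = h.mean(dim=1)
        return self.classifier(pooled)
```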
Co-authors
- Jiarun Wu 2
- Qingliang Chen 2
- Yan Yang 1
- Guanhua Chen 1
- Yun Chen 1