Qin Liu


pdf bib
Characterizing the Impacts of Instances on Robustness
Rui Zheng | Zhiheng Xi | Qin Liu | Wenbin Lai | Tao Gui | Qi Zhang | Xuanjing Huang | Jin Ma | Ying Shan | Weifeng Ge
Findings of the Association for Computational Linguistics: ACL 2023

Building robust deep neural networks (DNNs) against adversarial attacks is an important but challenging task. Previous defense approaches mainly focus on developing new model structures or training algorithms, but they do little to tap the potential of training instances, especially instances with robust patterns carring innate robustness. In this paper, we show that robust and non-robust instances in the training dataset, though are both important for test performance, have contrary impacts on robustness, which makes it possible to build a highly robust model by leveraging the training dataset in a more effective way. We propose a new method that can distinguish between robust instances from non-robust ones according to the model’s sensitivity to perturbations on individual instances during training. Surprisingly, we find that the model under standard training easily overfits the robust instances by relying on their simple patterns before the model completely learns their robust features. Finally, we propose a new mitigation algorithm to further release the potential of robust instances. Experimental results show that proper use of robust instances in the original dataset is a new line to achieve highly robust models.

pdf bib
Detecting Adversarial Samples through Sharpness of Loss Landscape
Rui Zheng | Shihan Dou | Yuhao Zhou | Qin Liu | Tao Gui | Qi Zhang | Zhongyu Wei | Xuanjing Huang | Menghan Zhang
Findings of the Association for Computational Linguistics: ACL 2023

Deep neural networks (DNNs) have been proven to be sensitive towards perturbations on input samples, and previous works highlight that adversarial samples are even more vulnerable than normal ones. In this work, this phenomenon is illustrated frWe first show that adversarial samples locate in steep and narrow local minima of the loss landscape (high sharpness) while normal samples, which differs distinctly from adversarial ones, reside in the loss surface that is more flatter (low sharpness).om the perspective of sharpness via visualizing the input loss landscape of models. Based on this, we propose a simple and effective sharpness-based detector to distinct adversarial samples by maximizing the loss increment within the region where the inference sample is located. Considering that the notion of sharpness of a loss landscape is relative, we further propose an adaptive optimization strategy in an attempt to fairly compare the relative sharpness among different samples. Experimental results show that our approach can outperform previous detection methods by large margins (average +6.6 F1 score) for four advanced attack strategies considered in this paper across three text classification tasks.


pdf bib
Flooding-X: Improving BERT’s Resistance to Adversarial Attacks via Loss-Restricted Fine-Tuning
Qin Liu | Rui Zheng | Bao Rong | Jingyi Liu | ZhiHua Liu | Zhanzhan Cheng | Liang Qiao | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Adversarial robustness has attracted much attention recently, and the mainstream solution is adversarial training. However, the tradition of generating adversarial perturbations for each input embedding (in the settings of NLP) scales up the training computational complexity by the number of gradient steps it takes to obtain the adversarial samples. To address this problem, we leverage Flooding method which primarily aims at better generalization and we find promising in defending adversarial attacks. We further propose an effective criterion to bring hyper-parameter-dependent flooding into effect with a narrowed-down search space by measuring how the gradient steps taken within one epoch affect the loss of each batch. Our approach requires zero adversarial sample for training, and its time consumption is equivalent to fine-tuning, which can be 2-15 times faster than standard adversarial training. We experimentally show that our method improves BERT’s resistance to textual adversarial attacks by a large margin, and achieves state-of-the-art robust accuracy on various text classification and GLUE tasks.

pdf bib
PlugAT: A Plug and Play Module to Defend against Textual Adversarial Attack
Rui Zheng | Rong Bao | Qin Liu | Tao Gui | Qi Zhang | Xuanjing Huang | Rui Xie | Wei Wu
Proceedings of the 29th International Conference on Computational Linguistics

Adversarial training, which minimizes the loss of adversarially perturbed examples, has received considerable attention. However, these methods require modifying all model parameters and optimizing the model from scratch, which is parameter inefficient and unfriendly to the already deployed models. As an alternative, we propose a pluggable defense module PlugAT, to provide robust predictions by adding a few trainable parameters to the model inputs while keeping the original model frozen. To reduce the potential side effects of using defense modules, we further propose a novel forgetting restricted adversarial training, which filters out bad adversarial examples that impair the performance of original ones. The PlugAT-equipped BERT model substantially improves robustness over several strong baselines on various text classification tasks, whilst training only 9.1% parameters. We observe that defense modules trained under the same model architecture have domain adaptation ability between similar text classification datasets.


pdf bib
TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
Xiao Wang | Qin Liu | Tao Gui | Qi Zhang | Yicheng Zou | Xin Zhou | Jiacheng Ye | Yongxin Zhang | Rui Zheng | Zexiong Pang | Qinzhuo Wu | Zhengyan Li | Chong Zhang | Ruotian Ma | Zichu Fei | Ruijian Cai | Jun Zhao | Xingwu Hu | Zhiheng Yan | Yiding Tan | Yuan Hu | Qiyuan Bian | Zhihua Liu | Shan Qin | Bolin Zhu | Xiaoyu Xing | Jinlan Fu | Yue Zhang | Minlong Peng | Xiaoqing Zheng | Yaqian Zhou | Zhongyu Wei | Xipeng Qiu | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

TextFlint is a multilingual robustness evaluation toolkit for NLP tasks that incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analyses. This enables practitioners to automatically evaluate their models from various aspects or to customize their evaluations as desired with just a few lines of code. TextFlint also generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model in terms of its robustness. To guarantee acceptability, all the text transformations are linguistically based and all the transformed data selected (up to 100,000 texts) scored highly under human evaluation. To validate the utility, we performed large-scale empirical evaluations (over 67,000) on state-of-the-art deep learning models, classic supervised methods, and real-world systems. The toolkit is already available at https://github.com/textflint with all the evaluation results demonstrated at textflint.io.