Contextualized Perturbation for Textual Adversarial Attack

Dianqi Li, Yizhe Zhang, Hao Peng, Liqun Chen, Chris Brockett, Ming-Ting Sun, Bill Dolan


Abstract
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models, and can be used to evaluate and improve their robustness. Existing techniques of generating such examples are typically driven by local heuristic rules that are agnostic to the context, often resulting in unnatural and ungrammatical outputs. This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs through a mask-then-infill procedure. CLARE builds on a pre-trained masked language model and modifies the inputs in a context-aware manner. We propose three contextualized perturbations, Replace, Insert and Merge, that allow for generating outputs of varied lengths. CLARE can flexibly combine these perturbations and apply them at any position in the inputs, and is thus able to attack the victim model more effectively with fewer edits. Extensive experiments and human evaluation demonstrate that CLARE outperforms the baselines in terms of attack success rate, textual similarity, fluency and grammaticality.
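The abstract's mask-then-infill procedure can be illustrated with a minimal sketch (not the authors' implementation; their official code is linked under "Code" below as cookielee77/CLARE). The sketch below shows the three perturbation types, Replace, Insert and Merge, driven by a generic pretrained masked language model via the HuggingFace transformers fill-mask pipeline. The model choice (distilroberta-base), the helper function names, and the example sentence are illustrative assumptions; CLARE additionally scores candidate infills against the victim model and a similarity constraint, which is omitted here.

# Illustrative sketch of CLARE-style mask-then-infill perturbations.
# Not the official CLARE code (see cookielee77/CLARE); model and helpers are assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="distilroberta-base")
MASK = fill_mask.tokenizer.mask_token  # "<mask>" for RoBERTa-style models

def replace(tokens, i):
    # Replace: mask the token at position i and let the MLM infill it in context.
    masked = tokens[:i] + [MASK] + tokens[i + 1:]
    return fill_mask(" ".join(masked))[0]["sequence"]

def insert(tokens, i):
    # Insert: add a masked slot after position i, lengthening the sentence by one token.
    masked = tokens[:i + 1] + [MASK] + tokens[i + 1:]
    return fill_mask(" ".join(masked))[0]["sequence"]

def merge(tokens, i):
    # Merge: collapse the bigram at positions i and i+1 into a single masked slot.
    masked = tokens[:i] + [MASK] + tokens[i + 2:]
    return fill_mask(" ".join(masked))[0]["sequence"]

tokens = "the film is a delightful surprise".split()
print(replace(tokens, 4))  # perturbs "delightful" in context
print(insert(tokens, 3))   # adds a context-appropriate token after "a"
print(merge(tokens, 4))    # merges "delightful surprise" into one token

In the full attack, each candidate produced this way would be kept only if it moves the victim model's prediction while preserving fluency and semantic similarity; the pipeline call here merely demonstrates the contextualized infilling step.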
Anthology ID:
2021.naacl-main.400
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5053–5069
URL:
https://aclanthology.org/2021.naacl-main.400
DOI:
10.18653/v1/2021.naacl-main.400
Cite (ACL):
Dianqi Li, Yizhe Zhang, Hao Peng, Liqun Chen, Chris Brockett, Ming-Ting Sun, and Bill Dolan. 2021. Contextualized Perturbation for Textual Adversarial Attack. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5053–5069, Online. Association for Computational Linguistics.
Cite (Informal):
Contextualized Perturbation for Textual Adversarial Attack (Li et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.400.pdf
Video:
https://aclanthology.org/2021.naacl-main.400.mp4
Code:
cookielee77/CLARE
Data:
AG News, GLUE, MultiNLI, QNLI