Generating Fluent Adversarial Examples for Natural Languages

Huangzhao Zhang, Hao Zhou, Ning Miao, Lei Li


Abstract
Efficiently building an adversarial attacker for natural language processing (NLP) tasks is a real challenge. Firstly, as the sentence space is discrete, it is difficult to make small perturbations along the direction of gradients. Secondly, the fluency of the generated examples cannot be guaranteed. In this paper, we propose MHA, which addresses both problems by performing Metropolis-Hastings sampling, whose proposal is designed with the guidance of gradients. Experiments on IMDB and SNLI show that our proposed MHA outperforms the baseline model on attacking capability. Adversarial training with MHA also leads to better robustness and performance.
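The abstract names Metropolis-Hastings sampling over sentences as the core of MHA. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes a target density that multiplies a fluency score by the victim classifier's probability of the wrong label, and it swaps in a uniform, symmetric word-replacement proposal in place of the gradient-guided proposal the paper describes. The vocabulary, `fluency`, `prob_target`, and all other names are hypothetical toy stand-ins.

```python
import math
import random

# Hypothetical toy stand-ins; a real attack would use a trained language
# model and the actual victim classifier here.
VOCAB = ["good", "great", "fine", "bad", "dull", "movie", "film", "a", "the"]
POSITIVE = {"good", "great", "fine"}
NEGATIVE = {"bad", "dull"}


def fluency(words):
    """Toy fluency score in (0, 1]: penalise adjacent repeated words."""
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)
    return math.exp(-2.0 * repeats)


def prob_target(words):
    """Toy victim classifier: probability of the wrong target label 'negative'."""
    score = sum(w in NEGATIVE for w in words) - sum(w in POSITIVE for w in words)
    return 1.0 / (1.0 + math.exp(-score))


def target_density(words):
    """Assumed unnormalised stationary density: fluent and misclassified."""
    return fluency(words) * prob_target(words)


def propose(words, rng):
    """Symmetric proposal (an assumption, not gradient-guided):
    replace one uniformly chosen position with a uniformly chosen vocab word."""
    new = list(words)
    new[rng.randrange(len(new))] = rng.choice(VOCAB)
    return new


def acceptance(current, candidate):
    """Metropolis acceptance probability; valid because the proposal is symmetric."""
    return min(1.0, target_density(candidate) / target_density(current))


def mh_attack(words, steps=200, seed=0):
    """Run the chain for a fixed number of steps and return the final sentence."""
    rng = random.Random(seed)
    current = list(words)
    for _ in range(steps):
        candidate = propose(current, rng)
        if rng.random() < acceptance(current, candidate):
            current = candidate
    return current


if __name__ == "__main__":
    print(" ".join(mh_attack(["a", "good", "movie"])))
```

With an asymmetric proposal, such as one biased by gradients, the acceptance ratio would also need the proposal-probability correction term from the full Metropolis-Hastings rule; the symmetric choice here keeps the sketch short.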
Anthology ID:
P19-1559
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5564–5569
URL:
https://aclanthology.org/P19-1559
DOI:
10.18653/v1/P19-1559
Cite (ACL):
Huangzhao Zhang, Hao Zhou, Ning Miao, and Lei Li. 2019. Generating Fluent Adversarial Examples for Natural Languages. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5564–5569, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Generating Fluent Adversarial Examples for Natural Languages (Zhang et al., ACL 2019)
PDF:
https://aclanthology.org/P19-1559.pdf
Supplementary:
 P19-1559.Supplementary.pdf
Data
IMDb Movie Reviews, SNLI