On the Robustness of Self-Attentive Models

Yu-Lun Hsieh, Minhao Cheng, Da-Cheng Juan, Wei Wei, Wen-Lian Hsu, Cho-Jui Hsieh


Abstract
This work examines the robustness of self-attentive neural networks against adversarial input perturbations. Specifically, we investigate the attention and feature extraction mechanisms of state-of-the-art recurrent neural networks and self-attentive architectures for sentiment analysis, entailment and machine translation under adversarial attacks. We also propose a novel attack algorithm for generating more natural adversarial examples that could mislead neural models but not humans. Experimental results show that, compared to recurrent neural models, self-attentive models are more robust against adversarial perturbation. In addition, we provide theoretical explanations for their superior robustness to support our claims.
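The attacks studied in the paper perturb input words so that a model's prediction changes while a human reader would still assign the original label. As a rough illustration of that general idea only (this is not the attack algorithm proposed in the paper), below is a minimal greedy word-substitution sketch; `predict_proba` and `get_synonyms` are hypothetical placeholders for a trained classifier and a synonym source such as a thesaurus or embedding-space neighbours.

```python
# Minimal sketch of a greedy word-substitution adversarial attack on a text
# classifier. This is NOT the algorithm proposed in the paper; it only
# illustrates adversarial input perturbation. `predict_proba` and
# `get_synonyms` are hypothetical stand-ins for a real model and a synonym
# source.

from typing import Callable, List


def greedy_word_substitution_attack(
    tokens: List[str],
    true_label: int,
    predict_proba: Callable[[List[str]], List[float]],
    get_synonyms: Callable[[str], List[str]],
    max_changes: int = 3,
) -> List[str]:
    """Greedily replace words with synonyms to lower the true-label score."""
    adv = list(tokens)
    for _ in range(max_changes):
        best_score = predict_proba(adv)[true_label]
        best_swap = None
        # Try every (position, synonym) pair and keep the single swap that
        # reduces the model's confidence in the true label the most.
        for i, word in enumerate(adv):
            for candidate in get_synonyms(word):
                trial = adv[:i] + [candidate] + adv[i + 1:]
                score = predict_proba(trial)[true_label]
                if score < best_score:
                    best_score, best_swap = score, (i, candidate)
        if best_swap is None:  # no swap helps the attack; stop early
            break
        i, candidate = best_swap
        adv[i] = candidate
        # Stop as soon as the prediction flips away from the true label.
        scores = predict_proba(adv)
        if max(range(len(scores)), key=scores.__getitem__) != true_label:
            break
    return adv
```

An exhaustive position-by-synonym search like this needs many model queries, which is one reason practical attacks typically rank word positions by importance first and constrain substitutions to keep the perturbed sentence natural.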
Anthology ID: P19-1147
Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2019
Address: Florence, Italy
Editors: Anna Korhonen, David Traum, Lluís Màrquez
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1520–1529
URL: https://aclanthology.org/P19-1147
DOI: 10.18653/v1/P19-1147
Cite (ACL): Yu-Lun Hsieh, Minhao Cheng, Da-Cheng Juan, Wei Wei, Wen-Lian Hsu, and Cho-Jui Hsieh. 2019. On the Robustness of Self-Attentive Models. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1520–1529, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): On the Robustness of Self-Attentive Models (Hsieh et al., ACL 2019)
PDF: https://aclanthology.org/P19-1147.pdf
Data: MultiNLI