Generalization to Mitigate Synonym Substitution Attacks

Basemah Alshemali, Jugal Kalita


Abstract
Studies have shown that deep neural networks (DNNs) are vulnerable to adversarial examples: perturbed inputs that cause DNN-based models to produce incorrect results. One robust adversarial attack in the NLP domain is synonym substitution, in which the adversary substitutes words with their synonyms. Since synonym substitution perturbations aim to satisfy all lexical, grammatical, and semantic constraints, they are difficult to detect both with automatic syntax checks and by humans. In this paper, we propose a structure-free defensive method that is capable of improving the performance of DNN-based models on both clean and adversarial data. Our findings show that replacing the embeddings of the important words in the input samples with the average of their synonyms’ embeddings can significantly improve the generalization of DNN-based classifiers. By doing so, we reduce model sensitivity to particular words in the input samples. Our results indicate that the proposed defense is not only capable of defending against adversarial attacks, but is also capable of improving the performance of DNN-based models when tested on benign data. On average, the proposed defense improved the classification accuracy of the CNN and Bi-LSTM models by 41.30% and 55.66%, respectively, when tested under adversarial attacks. Extended investigation shows that our defensive method can improve the robustness of non-neural models, achieving average classification accuracy increases of 17.62% and 22.93% on the SVM and XGBoost models, respectively. The proposed defensive method has also shown an average classification accuracy improvement of 26.60% when tested with the well-known BERT model. Our algorithm is generic enough to be applied in any NLP domain and to any model trained on any natural language.
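The core idea stated in the abstract, replacing the embedding of each important word with the average of its synonyms' embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the toy embedding table, the synonym dictionary, and the set of "important" words are all hypothetical, and the abstract does not specify how important words are selected or where synonyms come from.

```python
import numpy as np

def average_synonym_embeddings(tokens, important, embeddings, synonyms):
    """Return one vector per token.

    Tokens listed in `important` are represented by the mean of their
    synonyms' vectors; all other tokens keep their own vector.
    All arguments are hypothetical stand-ins, not the paper's data structures.
    """
    vectors = []
    for tok in tokens:
        vec = embeddings[tok]
        if tok in important:
            # Collect vectors of the synonyms that exist in the embedding table.
            syn_vecs = [embeddings[s] for s in synonyms.get(tok, []) if s in embeddings]
            if syn_vecs:  # fall back to the word's own vector if no synonym is known
                vec = np.mean(syn_vecs, axis=0)
        vectors.append(vec)
    return np.stack(vectors)

# Toy 2-dimensional embeddings (made up for illustration only).
embeddings = {
    "the": np.array([0.0, 0.0]),
    "movie": np.array([1.0, 0.0]),
    "was": np.array([0.0, 1.0]),
    "great": np.array([2.0, 2.0]),
    "good": np.array([1.0, 3.0]),
    "excellent": np.array([3.0, 3.0]),
}
synonyms = {"great": ["good", "excellent"]}

tokens = ["the", "movie", "was", "great"]
result = average_synonym_embeddings(tokens, {"great"}, embeddings, synonyms)
```

In this toy setup, "great" is represented by the mean of the vectors for "good" and "excellent", so swapping "great" for either synonym at attack time would move the input less than it would with the raw embeddings. Whether the original word's own vector is included in the average is a design choice the abstract leaves open; this sketch excludes it.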
Anthology ID:
2020.deelio-1.3
Volume:
Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures
Month:
November
Year:
2020
Address:
Online
Editors:
Eneko Agirre, Marianna Apidianaki, Ivan Vulić
Venue:
DeeLIO
Publisher:
Association for Computational Linguistics
Pages:
20–28
URL:
https://aclanthology.org/2020.deelio-1.3
DOI:
10.18653/v1/2020.deelio-1.3
Cite (ACL):
Basemah Alshemali and Jugal Kalita. 2020. Generalization to Mitigate Synonym Substitution Attacks. In Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pages 20–28, Online. Association for Computational Linguistics.
Cite (Informal):
Generalization to Mitigate Synonym Substitution Attacks (Alshemali & Kalita, DeeLIO 2020)
PDF:
https://aclanthology.org/2020.deelio-1.3.pdf
Video:
https://slideslive.com/38939726
Data
IMDb Movie Reviews, Yahoo! Answers