Detecting Word-Level Adversarial Text Attacks via SHapley Additive exPlanations

Lukas Huber, Marc Alexander Kühn, Edoardo Mosca, Georg Groh


Abstract
State-of-the-art machine learning models are prone to adversarial attacks: maliciously crafted inputs that fool the model into making a wrong prediction, often with high confidence. While defense strategies have been extensively explored in the computer vision domain, research in natural language processing still lacks techniques to make models resilient to adversarial text inputs. We adapt a technique from computer vision to detect word-level attacks targeting text classifiers. This method relies on training an adversarial detector leveraging Shapley additive explanations and outperforms the current state-of-the-art on two benchmarks. Furthermore, we show that the detector requires only a small number of training samples and, in some cases, generalizes to different datasets without retraining.
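As a rough illustration of the kind of word-level attribution the abstract refers to, the sketch below computes exact Shapley values for the tokens of a short input. It is not the authors' implementation: the scoring function `toy_score`, the lexicon `WEIGHTS`, and the choice to model "removing" a word by masking it are all assumptions made for this example, and the real method presumably uses an approximate SHAP explainer on a trained text classifier.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy lexicon (an assumption for illustration, not from the paper).
WEIGHTS = {"great": 2.0, "boring": -1.5}


def toy_score(tokens, keep):
    """Toy classifier score over the token positions in `keep`.

    A kept word contributes its lexicon weight; the weight is negated
    when the kept word is directly preceded by a kept "not".
    """
    score = 0.0
    for i, tok in enumerate(tokens):
        if i not in keep:
            continue  # masked-out position contributes nothing
        w = WEIGHTS.get(tok, 0.0)
        if i > 0 and (i - 1) in keep and tokens[i - 1] == "not":
            w = -w
        score += w
    return score


def shapley_values(tokens, value_fn):
    """Exact Shapley value of each token position (exponential in len(tokens))."""
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of position i to coalition s.
                phi[i] += weight * (value_fn(tokens, s | {i}) - value_fn(tokens, s))
    return phi


tokens = ["not", "great", "movie"]
phi = shapley_values(tokens, toy_score)
for tok, val in zip(tokens, phi):
    print(f"{tok}: {val:+.3f}")
```

The attributions sum to the difference between the score of the full input and the score of the fully masked input, which is the efficiency property of Shapley values. A detector in the spirit of the abstract would be a separate classifier trained on such attribution vectors from normal and adversarial inputs; that step is omitted here.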
Anthology ID:
2022.repl4nlp-1.16
Volume:
Proceedings of the 7th Workshop on Representation Learning for NLP
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Spandana Gella, He He, Bodhisattwa Prasad Majumder, Burcu Can, Eleonora Giunchiglia, Samuel Cahyawijaya, Sewon Min, Maximilian Mozes, Xiang Lorraine Li, Isabelle Augenstein, Anna Rogers, Kyunghyun Cho, Edward Grefenstette, Laura Rimell, Chris Dyer
Venue:
RepL4NLP
Publisher:
Association for Computational Linguistics
Pages:
156–166
URL:
https://aclanthology.org/2022.repl4nlp-1.16
DOI:
10.18653/v1/2022.repl4nlp-1.16
Cite (ACL):
Lukas Huber, Marc Alexander Kühn, Edoardo Mosca, and Georg Groh. 2022. Detecting Word-Level Adversarial Text Attacks via SHapley Additive exPlanations. In Proceedings of the 7th Workshop on Representation Learning for NLP, pages 156–166, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Detecting Word-Level Adversarial Text Attacks via SHapley Additive exPlanations (Huber et al., RepL4NLP 2022)
PDF:
https://aclanthology.org/2022.repl4nlp-1.16.pdf
Video:
https://aclanthology.org/2022.repl4nlp-1.16.mp4
Data
AG News, IMDb Movie Reviews, SST, SST-2