TextVerifier: Robustness Verification for Textual Classifiers with Certifiable Guarantees

Siqi Sun; Wenjie Ruan

doi:10.18653/v1/2023.findings-acl.267

Abstract

When textual classifiers are deployed in safety-critical workflows, they must withstand adversarial examples, inputs with minor alterations crafted to confuse the model. This paper presents TextVerifier, a formal verification framework with certifiable guarantees for deep neural networks in natural language processing against word-level alteration attacks. We aim to approximate the maximal safe radius by deriving provable bounds both mathematically and automatically, where a minimum word-level L_0 distance is quantified as a guarantee for the classification invariance of victim models. Our strategy has three strengths: i) certifiable guarantee: the verification converges, so the bounds on the maximal safe radius ultimately become tight; ii) high efficiency: a novel parallelization strategy processes a set of candidate texts simultaneously on GPUs, yielding a speed advantage; and iii) reliable anytime estimation: the verification can return intermediate bounds and robustness estimates that improve gradually but strictly as the computation proceeds. Furthermore, experiments on text classification with four datasets and three victim models demonstrate the validity of the tightening bounds. Our tool TextVerifier is available at https://github.com/TrustAI/TextVerifer.
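
To make the idea of anytime bounds on a word-level L_0 radius concrete, here is a minimal sketch. It is not the TextVerifier algorithm: it is a plain exhaustive search, it runs on the CPU without any parallelization, and every name in it (`perturbations`, `anytime_l0_bounds`, the toy `classify` function and the `candidates` substitution table) is a hypothetical stand-in introduced only for illustration. It bounds the minimal number of word substitutions that changes the prediction, and yields a tighter pair of bounds after each search level.

```python
from itertools import combinations, product


def perturbations(words, candidates, k):
    """Yield every text obtained by substituting exactly k words."""
    for positions in combinations(range(len(words)), k):
        # Allowed replacements for each chosen position (hypothetical table).
        options = [
            [c for c in candidates.get(words[p], []) if c != words[p]]
            for p in positions
        ]
        for subs in product(*options):
            perturbed = list(words)
            for p, s in zip(positions, subs):
                perturbed[p] = s
            yield perturbed


def anytime_l0_bounds(words, classify, candidates):
    """Yield (lower, upper) bounds on the minimal number of substitutions
    that changes classify(words). upper == len(words) + 1 means that no
    adversarial example has been found so far."""
    original = classify(words)
    n = len(words)
    lower, upper = 1, n + 1
    for k in range(1, n + 1):
        found = any(
            classify(t) != original for t in perturbations(words, candidates, k)
        )
        if found:
            upper = k  # an adversarial example with k substitutions exists
            yield lower, upper
            return
        lower = k + 1  # every text with exactly k substitutions is safe
        yield lower, upper


# Toy sentiment classifier (an assumption, standing in for a victim model).
POSITIVE = {"good", "fun", "great"}
NEGATIVE = {"bad", "dull"}


def classify(words):
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "pos" if score >= 0 else "neg"


words = ["the", "movie", "was", "good", "and", "fun"]
candidates = {"good": ["bad", "great"], "fun": ["dull"], "movie": ["film"]}
bounds = list(anytime_l0_bounds(words, classify, candidates))
print(bounds)
```

Stopping the generator early still leaves a valid, if looser, pair of bounds, which is the sense of "anytime" illustrated here. The exhaustive enumeration grows combinatorially with text length, so this sketch is only practical for very short inputs.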

Anthology ID: 2023.findings-acl.267
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4362–4380
URL: https://aclanthology.org/2023.findings-acl.267/
DOI: 10.18653/v1/2023.findings-acl.267
Cite (ACL): Siqi Sun and Wenjie Ruan. 2023. TextVerifier: Robustness Verification for Textual Classifiers with Certifiable Guarantees. In Findings of the Association for Computational Linguistics: ACL 2023, pages 4362–4380, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): TextVerifier: Robustness Verification for Textual Classifiers with Certifiable Guarantees (Sun & Ruan, Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.267.pdf
