Random Smooth-based Certified Defense against Text Adversarial Attack

Zeliang Zhang, Wei Yao, Susan Liang, Chenliang Xu


Abstract
Certified defense methods, which train models on the worst-case text generated by substituting words in the original text with synonyms, have demonstrated their effectiveness against textual adversarial examples. However, because the word embedding representations are discrete, the large search space hinders efficient robust training and results in significant time consumption. To overcome this challenge, motivated by the observation that the embeddings of synonyms lie a small distance apart, we propose to treat word substitution as a continuous perturbation on the word embedding representation. The proposed method, Text-RS, applies random smooth techniques to approximate the word substitution operation, offering a computationally efficient solution that outperforms conventional discrete methods and improves robustness in training. The evaluation results demonstrate its effectiveness in defending against multiple textual adversarial attacks.
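To make the idea in the abstract concrete, the following is a minimal sketch of smoothing over continuous embedding perturbations. It is not the authors' Text-RS implementation: the Gaussian noise model, the function `smoothed_predict`, the parameters `sigma` and `n_samples`, the toy classifier, and the two-dimensional embeddings are all assumptions made for illustration. It only shows how a synonym substitution can be treated as a small shift in embedding space and how a majority vote over randomly perturbed embeddings gives a prediction that tolerates such a shift.

```python
import random
from collections import Counter


def smoothed_predict(classify, embeddings, sigma=0.1, n_samples=200, seed=0):
    """Majority vote of `classify` over Gaussian-perturbed copies of `embeddings`.

    `classify` is any function mapping a list of embedding vectors to a label.
    Returns the winning label and the fraction of samples that voted for it.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        # Assumption: isotropic Gaussian noise stands in for word substitution.
        noisy = [[x + rng.gauss(0.0, sigma) for x in vec] for vec in embeddings]
        votes[classify(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


def toy_classifier(embeddings):
    """Hypothetical base classifier: sign of the mean first coordinate."""
    score = sum(vec[0] for vec in embeddings) / len(embeddings)
    return "positive" if score > 0 else "negative"


# Hypothetical embeddings for a three-word sentence.
sentence = [[0.9, 0.1], [0.8, -0.2], [1.1, 0.3]]
# The same sentence with the second word replaced by a nearby "synonym" vector.
synonym_swap = [[0.9, 0.1], [0.75, -0.15], [1.1, 0.3]]

label_a, vote_a = smoothed_predict(toy_classifier, sentence)
label_b, vote_b = smoothed_predict(toy_classifier, synonym_swap)
print(label_a, label_b)
```

Because the substituted vector stays close to the original one, both inputs fall well inside the same decision region and the smoothed prediction agrees for the two sentences. Sampling noise in the continuous space avoids enumerating every discrete synonym combination, which is the efficiency argument the abstract makes.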
Anthology ID:
2024.findings-eacl.83
Volume:
Findings of the Association for Computational Linguistics: EACL 2024
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1251–1265
URL:
https://aclanthology.org/2024.findings-eacl.83
Cite (ACL):
Zeliang Zhang, Wei Yao, Susan Liang, and Chenliang Xu. 2024. Random Smooth-based Certified Defense against Text Adversarial Attack. In Findings of the Association for Computational Linguistics: EACL 2024, pages 1251–1265, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
Random Smooth-based Certified Defense against Text Adversarial Attack (Zhang et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-eacl.83.pdf