The Best Defense is Attack: Repairing Semantics in Textual Adversarial Examples

Heng Yang, Ke Li


Abstract
Recent studies have revealed the vulnerability of pre-trained language models to adversarial attacks. Adversarial defense techniques have been proposed to reconstruct adversarial examples within feature or text spaces. However, these methods struggle to effectively repair the semantics of adversarial examples, resulting in unsatisfactory defense performance. To repair those semantics, we introduce a novel approach named Reactive Perturbation Defocusing (Rapid), which employs an adversarial detector to identify the fake labels of adversarial examples and then leverages adversarial attackers to repair their semantics. Our extensive experimental results, conducted on four public datasets, demonstrate the consistent effectiveness of Rapid across various adversarial attack scenarios. For easy evaluation, we provide a click-to-run demo of Rapid at https://tinyurl.com/22ercuf8.
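The abstract describes a two-stage defense: a detector first flags a suspected adversarial example and its induced fake label, then an attacker re-perturbs the example against that fake label to restore the original semantics. A minimal sketch of that control flow, with all function names and the toy detector/attacker/victim stand-ins purely illustrative (not the authors' actual implementation):

```python
# Hypothetical sketch of the two-stage Rapid pipeline from the abstract.
# All names are illustrative assumptions, not the paper's real API.

def rapid_defense(text, detector, attacker, victim):
    """Defend a victim classifier: detect, repair, then classify."""
    fake_label, is_adversarial = detector(text)  # stage 1: adversarial detection
    if not is_adversarial:
        return victim(text)                      # clean input: classify as usual
    # Stage 2: re-attack the example against its fake label, steering the
    # perturbation back toward the original semantics.
    repaired = attacker(text, target_label=fake_label)
    return victim(repaired)                      # classify the repaired example


# Toy stand-ins so the sketch runs end to end: the "attack" here is a
# single character-level typo, and repair simply undoes it.
detector = lambda t: ("positive", "terrlble" in t)
attacker = lambda t, target_label: t.replace("terrlble", "terrible")
victim = lambda t: "negative" if "terrible" in t else "positive"

print(rapid_defense("the film was terrlble", detector, attacker, victim))  # → negative
print(rapid_defense("a lovely film", detector, attacker, victim))          # → positive
```

The key design point the abstract highlights is that the attacker, normally an adversary's tool, is reused defensively: targeting the fake label pushes the perturbed text back across the decision boundary.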
Anthology ID:
2024.emnlp-main.481
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8439–8457
URL:
https://aclanthology.org/2024.emnlp-main.481
Cite (ACL):
Heng Yang and Ke Li. 2024. The Best Defense is Attack: Repairing Semantics in Textual Adversarial Examples. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 8439–8457, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
The Best Defense is Attack: Repairing Semantics in Textual Adversarial Examples (Yang & Li, EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.481.pdf
Software:
 2024.emnlp-main.481.software.zip
Data:
 2024.emnlp-main.481.data.zip