Susmit Jha


2024

Task-Agnostic Detector for Insertion-Based Backdoor Attacks
Weimin Lyu | Xiao Lin | Songzhu Zheng | Lu Pang | Haibin Ling | Susmit Jha | Chao Chen
Findings of the Association for Computational Linguistics: NAACL 2024

Textual backdoor attacks pose significant security threats. Current detection approaches, which typically rely on intermediate feature representations or on reconstructing potential triggers, are task-specific and less effective beyond sentence classification, struggling with tasks such as question answering and named entity recognition. We introduce TABDet (Task-Agnostic Backdoor Detector), a pioneering task-agnostic method for backdoor detection. TABDet leverages final-layer logits combined with an efficient pooling technique, enabling a unified logit representation across three prominent NLP tasks: sentence classification, question answering, and named entity recognition. TABDet can jointly learn from diverse task-specific models and demonstrates superior detection efficacy compared with traditional task-specific methods.
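The abstract does not spell out the pooling step. To illustrate why pooling final-layer logits can give a task-agnostic representation, the sketch below maps logit matrices of different shapes to vectors of the same length. The shapes are one row for sentence classification, per-token start/end scores for question answering, and per-token tag scores for NER. Everything in it is assumed for illustration and is not TABDet's actual method: the top-k pooling, the averaging over probe inputs, and the hypothetical functions `pooled_logit_features` and `model_signature`.

```python
from typing import List

# Hypothetical layout: rows = output positions, columns = label scores.
Matrix = List[List[float]]


def pooled_logit_features(logits: Matrix, k: int = 8) -> List[float]:
    """Map a logit matrix of any shape to a fixed-length vector.

    Illustrative top-k pooling (an assumption, not the paper's technique):
    flatten all final-layer logits, sort them in descending order, and keep
    the k largest. If fewer than k values exist, repeat the smallest kept one.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    flat = sorted((v for row in logits for v in row), reverse=True)
    if not flat:
        raise ValueError("logits must contain at least one value")
    top = flat[:k]
    # Pad short outputs (e.g. a 2-class sentence classifier) up to length k.
    return top + [top[-1]] * (k - len(top))


def model_signature(batch: List[Matrix], k: int = 8) -> List[float]:
    """Average the pooled vectors over several probe inputs for one model."""
    pooled = [pooled_logit_features(m, k) for m in batch]
    return [sum(col) / len(pooled) for col in zip(*pooled)]


if __name__ == "__main__":
    cls_out = [[2.0, -1.0]]                          # sentence classification
    qa_out = [[0.5, 0.1], [3.0, -2.0], [1.0, 2.5]]   # QA start/end logits
    ner_out = [[0.2, 1.5, -0.3], [4.0, 0.0, 0.1]]    # NER tag logits
    for out in (cls_out, qa_out, ner_out):
        print(pooled_logit_features(out, k=4))
```

Because every task ends up as a vector of length `k`, a single binary classifier could in principle be trained on signatures from clean and backdoored models of different tasks. That training step is omitted here.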