Chao Chen


2023

pdf bib
Enhancing Neural Topic Model with Multi-Level Supervisions from Seed Words
Yang Lin | Xin Gao | Xu Chu | Yasha Wang | Junfeng Zhao | Chao Chen
Findings of the Association for Computational Linguistics: ACL 2023

Efforts have been made to apply topic seed words to improve the topic interpretability of topic models. However, due to the semantic diversity of natural language, supervisions from seed words could be ambiguous, making it hard to be incorporated into the current neural topic models. In this paper, we propose SeededNTM, a neural topic model enhanced with supervisions from seed words on both word and document levels. We introduce a context-dependency assumption to alleviate the ambiguities with context document information, and an auto-adaptation mechanism to automatically balance between multi-level information. Moreover, an intra-sample consistency regularizer is proposed to deal with noisy supervisions via encouraging perturbation and semantic consistency. Extensive experiments on multiple datasets show that SeededNTM can derive semantically meaningful topics and outperforms the state-of-the-art seeded topic models in terms of topic quality and classification accuracy.

pdf bib
Attention-Enhancing Backdoor Attacks Against BERT-based Models
Weimin Lyu | Songzhu Zheng | Lu Pang | Haibin Ling | Chao Chen
Findings of the Association for Computational Linguistics: EMNLP 2023

Recent studies have revealed that Backdoor Attacks can threaten the safety of natural language processing (NLP) models. Investigating the strategies of backdoor attacks will help to understand the model’s vulnerability. Most existing textual backdoor attacks focus on generating stealthy triggers or modifying model weights. In this paper, we directly target the interior structure of neural networks and the backdoor mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention patterns. Our loss can be applied to different attacking methods to boost their attack efficacy in terms of attack successful rates and poisoning rates. It applies to not only traditional dirty-label attacks, but also the more challenging clean-label attacks. We validate our method on different backbone models (BERT, RoBERTa, and DistilBERT) and various tasks (Sentiment Analysis, Toxic Detection, and Topic Classification).

2022

pdf bib
A Study of the Attention Abnormality in Trojaned BERTs
Weimin Lyu | Songzhu Zheng | Tengfei Ma | Chao Chen
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Trojan attacks raise serious security concerns. In this paper, we investigate the underlying mechanism of Trojaned BERT models. We observe the attention focus drifting behavior of Trojaned models, i.e., when encountering an poisoned input, the trigger token hijacks the attention focus regardless of the context. We provide a thorough qualitative and quantitative analysis of this phenomenon, revealing insights into the Trojan mechanism. Based on the observation, we propose an attention-based Trojan detector to distinguish Trojaned models from clean ones. To the best of our knowledge, we are the first to analyze the Trojan mechanism and develop a Trojan detector based on the transformer’s attention.