Zhaoyang Li
Papers on this page may belong to the following people: Zhaoyang Li, Zhaoyang Li
2026
Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning
Zhiyuan Chang | Mingyang Li | Yuekai Huang | Ziyou Jiang | Xiaojun Jia | Qian Xiong | Junjie Wang | Zhaoyang Li | Qing Wang
Findings of the Association for Computational Linguistics: ACL 2026
Large language model (LLM)-integrated applications have become increasingly prevalent, yet face critical security vulnerabilities from prompt injection (PI) attacks. Defending against PI attacks involves two major issues: malicious instructions can be injected through diverse vectors, and injected instructions often lack clear semantic boundaries from the surrounding context, making them difficult to identify. To address these issues, we propose InstruCoT, a model enhancement method for PI defense that synthesizes diverse training data and employs instruction-level chain-of-thought fine-tuning, enabling LLMs to effectively identify and reject malicious instructions regardless of their source or position in the context. We evaluate InstruCoT across three critical dimensions: Behavior Deviation, Privacy Leakage, and Harmful Output. Experimental results across four LLMs demonstrate that InstruCoT significantly outperforms baselines in all dimensions without degrading utility performance.
All Changes May Have Invariant Principles: Improving Ever-Shifting Harmful Meme Detection via Design Concept Reproduction
Ziyou Jiang | Mingyang Li | Junjie Wang | Yuekai Huang | Jie Huang | Zhiyuan Chang | Zhaoyang Li | Qing Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Harmful memes in Internet communities are ever-shifting, and their type-shifting and temporal-evolving nature makes them difficult to analyze. Although these memes keep shifting, we find that different memes may share invariant principles, i.e., the underlying design concepts of malicious users, which can help us analyze why these memes are harmful. In this paper, we propose RepMD, an ever-shifting harmful meme detection method based on design concept reproduction. We first draw on the attack tree to define the Design Concept Graph (DCG), which describes the steps that people may take to design a harmful meme. Then, we derive the DCG from historical memes through design step reproduction and graph pruning. Finally, we use the DCG to guide a Multimodal Large Language Model (MLLM) in detecting harmful memes. The evaluation results show that RepMD achieves the highest accuracy, 81.1%, and shows only slight accuracy decreases when generalized to type-shifting and temporal-evolving memes. Human evaluation shows that RepMD can improve the efficiency of human discovery of harmful memes, at 15∼30 seconds per meme.