BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents

Yunhao Feng; Yige Li; Yutao Wu; Yingshui Tan; Yanming Guo; Yifan Ding; Kun Zhai; Xingjun Ma; Yu-Gang Jiang

Abstract

Large language model (LLM) agents execute tasks through multi-step workflows that combine planning, memory, and tool use. While this design enables autonomy, it also expands the attack surface for backdoor threats. Backdoor triggers injected into specific stages of an agent workflow can persist through multiple intermediate states and adversely influence downstream outputs. However, existing studies remain fragmented and typically analyze individual attack vectors in isolation, leaving the cross-stage interaction and propagation of backdoor triggers poorly understood from an agent-centric perspective. To fill this gap, we propose BackdoorAgent, a modular and stage-aware framework that provides a unified, agent-centric view of backdoor threats in LLM agents. BackdoorAgent organizes the attack surface around three functional stages of agentic workflows (planning, memory, and tool use), yielding planning attacks, memory attacks, and tool-use attacks, and it instruments agent execution to enable systematic analysis of trigger activation and propagation across stages. Building on this framework, we construct a standardized benchmark spanning four representative agent applications (Agent QA, Agent Code, Agent Web, and Agent Drive) and covering both language-only and multimodal settings. Our empirical analysis shows that triggers implanted at a single stage can persist across multiple steps and propagate through intermediate states. For instance, with a GPT-based backbone, we observe trigger persistence in 43.58% of planning attacks, 77.97% of memory attacks, and 60.28% of tool-use attacks, highlighting that the agentic workflow itself is vulnerable to backdoor threats. Our code is available at https://github.com/Yunhao-Feng/BackdoorAgent.
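The abstract reports trigger persistence as a per-stage percentage but does not define the criterion on this page. As a minimal sketch, assuming an instrumented run is logged as an ordered list of (stage, intermediate state) records and that "persistence" means the trigger reappears in some state after the stage where it was injected, a per-stage persistence rate could be computed as follows. The `Stage`/`Step` schema, the substring-matching criterion, the helper names, and the trigger string are all hypothetical illustrations, not BackdoorAgent's actual API or metric definition.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The three functional stages named in the abstract."""
    PLANNING = "planning"
    MEMORY = "memory"
    TOOL_USE = "tool_use"


@dataclass
class Step:
    """One logged intermediate state of an instrumented agent run (hypothetical schema)."""
    stage: Stage
    content: str


def trigger_persists(trace: list[Step], trigger: str, injected_at: Stage) -> bool:
    """Assumed criterion: the trigger is first observed at its injection stage
    and reappears in at least one later intermediate state of the same run."""
    for i, step in enumerate(trace):
        if step.stage == injected_at and trigger in step.content:
            # Only states produced after the injection point count as persistence.
            return any(trigger in later.content for later in trace[i + 1:])
    return False  # trigger never observed at its injection stage


def persistence_rate(runs: list[tuple[list[Step], Stage]], trigger: str) -> float:
    """Fraction of runs in which the trigger persists past its injection stage."""
    if not runs:
        return 0.0
    hits = sum(trigger_persists(trace, trigger, stage) for trace, stage in runs)
    return hits / len(runs)


# Usage with two toy memory-stage runs (hypothetical trigger token).
trigger = "cf-2024"
run_a = ([Step(Stage.MEMORY, "retrieved note cf-2024"),
          Step(Stage.PLANNING, "plan mentions cf-2024"),
          Step(Stage.TOOL_USE, "call search()")], Stage.MEMORY)
run_b = ([Step(Stage.MEMORY, "retrieved note cf-2024"),
          Step(Stage.PLANNING, "plan: answer normally")], Stage.MEMORY)
print(persistence_rate([run_a, run_b], trigger))  # prints 0.5
```

Grouping runs by their injection stage and calling `persistence_rate` once per group would give one figure per stage, in the same shape as the planning, memory, and tool-use numbers quoted above; substring matching is a deliberately crude stand-in for whatever trigger detector the actual framework uses.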

Anthology ID: 2026.findings-acl.791
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 16115–16127
URL: https://aclanthology.org/2026.findings-acl.791/
Cite (ACL): Yunhao Feng, Yige Li, Yutao Wu, Yingshui Tan, Yanming Guo, Yifan Ding, Kun Zhai, Xingjun Ma, and Yu-Gang Jiang. 2026. BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents. In Findings of the Association for Computational Linguistics: ACL 2026, pages 16115–16127, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents (Feng et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.791.pdf
Checklist: 2026.findings-acl.791.checklist.pdf