Beyond Blind Following: Evaluating Robustness of LLM Agents under Imperfect Guidance

Yao Fu; Ran Qiu; Xinhe Wang; Jacob Sansom; Sathvika Ayyappa Prabhu; Huijie Tang; Jaekyeom Kim; Sungryull Sohn; Honglak Lee

Abstract

Large language models (LLMs) have shown strong capabilities as task-solving agents across interactive domains. However, in complex environments, these agents may need to rely on auxiliary guidance to reduce the search space or make up for limited domain-specific knowledge. Such guidance includes human-provided manuals and demonstrations, retrieved examples from memory or external tools, high-level heuristics, and agent-acquired knowledge from prior interactions. Yet this guidance may be imperfect: changes in the environment, ambiguous or simplified language, or retrieval errors from external sources can leave it incomplete, outdated, or contextually mismatched, potentially causing errors or failures during task execution. To address this, we introduce MIRAGE, a benchmark for MeasurIng Robustness of LLM Agents under Imperfect GuidancE. MIRAGE includes procedurally generated environments in navigation, cooking, and gaming, where both the environment and the auxiliary guidance vary in fidelity and relevance. We further extend MIRAGE to realistic web tasks via WebArena, using noisy or underspecified instructions extracted from demonstrations. Our findings reveal critical failure modes in current LLM agents and motivate future work on improving their robustness under imperfect guidance.

Anthology ID: 2026.eacl-long.310
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 6591–6618
URL: https://aclanthology.org/2026.eacl-long.310/
Cite (ACL): Yao Fu, Ran Qiu, Xinhe Wang, Jacob Sansom, Sathvika Ayyappa Prabhu, Huijie Tang, Jaekyeom Kim, Sungryull Sohn, and Honglak Lee. 2026. Beyond Blind Following: Evaluating Robustness of LLM Agents under Imperfect Guidance. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6591–6618, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Beyond Blind Following: Evaluating Robustness of LLM Agents under Imperfect Guidance (Fu et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-long.310.pdf
Checklist: 2026.eacl-long.310.checklist.pdf
