HiSA: Hierarchical State Abstraction for Scalable GUI Agents

Weiming Li; Hye-young Paik; Yulei Sui

Abstract

Multimodal GUI agents generally operate on raw visual and textual observations, which creates a fundamental scalability challenge. Current state-of-the-art frameworks predominantly rely on inference-intensive test-time scaling or on accumulating unbounded raw logs to maintain task coherence; we attribute the underlying bottleneck to insufficient state abstraction. To address this, we propose HiSA, a hierarchical state abstraction approach that actively restructures knowledge rather than passively retaining historical information. HiSA organizes raw histories into a three-level hierarchy of abstracted steps, refined contexts, and induced patterns. By synthesizing high-dimensional observations into compact semantic states, it decouples reasoning efficacy from context length, enabling precise and scalable decision-making as interaction histories grow. Evaluated on Spider2-V, our approach establishes a new state of the art, achieving a 40.58% success rate while reducing token consumption by 69.85% and monetary cost by 55.10% compared to the best-performing baseline.

Anthology ID: 2026.findings-acl.581
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 11965–11985
URL: https://aclanthology.org/2026.findings-acl.581/
Cite (ACL): Weiming Li, Hye-young Paik, and Yulei Sui. 2026. HiSA: Hierarchical State Abstraction for Scalable GUI Agents. In Findings of the Association for Computational Linguistics: ACL 2026, pages 11965–11985, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): HiSA: Hierarchical State Abstraction for Scalable GUI Agents (Li et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.581.pdf
Checklist: 2026.findings-acl.581.checklist.pdf
