@inproceedings{cheng-etal-2025-os,
title = "{OS}-Kairos: Adaptive Interaction for {MLLM}-Powered {GUI} Agents",
author = "Cheng, Pengzhou and
Wu, Zheng and
Wu, Zongru and
Ju, Tianjie and
Zhang, Aston and
Zhang, Zhuosheng and
Liu, Gongshen",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.348/",
doi = "10.18653/v1/2025.findings-acl.348",
pages = "6701--6725",
ISBN = "979-8-89176-256-5",
abstract = "Autonomous graphical user interface (GUI) agents powered by multimodal large language models have shown great promise. However, a critical yet underexplored issue persists: \textbf{over-execution}, where the agent executes tasks in a fully autonomous way, without adequate assessment of its action confidence to compromise an adaptive human-agent collaboration. This poses substantial risks in complex scenarios, such as those involving ambiguous user instructions, unexpected interruptions, and environmental hijacks. To address the issue, we introduce \textit{OS-Kairos}, an adaptive GUI agent capable of predicting confidence levels at each interaction step and efficiently deciding whether to act autonomously or seek human intervention. \textit{OS-Kairos} is developed through two key mechanisms: (i) collaborative probing that annotates confidence scores at each interaction step; (ii) confidence-driven interaction that leverages these confidence scores to elicit the ability of adaptive interaction. Experimental results show that \textit{OS-Kairos} substantially outperforms existing models on our curated dataset featuring complex scenarios, as well as on established benchmarks such as AITZ and Meta-GUI, with 24.59{\%}{\textasciitilde}87.29{\%} improvements in task success rate. \textit{OS-Kairos} facilitates an adaptive human-agent collaboration, prioritizing effectiveness, generality, scalability, and efficiency for real-world GUI interaction. The dataset and codes are available at Anonymous."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="cheng-etal-2025-os">
<titleInfo>
<title>OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents</title>
</titleInfo>
<name type="personal">
<namePart type="given">Pengzhou</namePart>
<namePart type="family">Cheng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zheng</namePart>
<namePart type="family">Wu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zongru</namePart>
<namePart type="family">Wu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tianjie</namePart>
<namePart type="family">Ju</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aston</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhuosheng</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Gongshen</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-256-5</identifier>
</relatedItem>
<abstract>Autonomous graphical user interface (GUI) agents powered by multimodal large language models have shown great promise. However, a critical yet underexplored issue persists: over-execution, where the agent executes tasks in a fully autonomous way, without adequate assessment of its action confidence to compromise an adaptive human-agent collaboration. This poses substantial risks in complex scenarios, such as those involving ambiguous user instructions, unexpected interruptions, and environmental hijacks. To address the issue, we introduce OS-Kairos, an adaptive GUI agent capable of predicting confidence levels at each interaction step and efficiently deciding whether to act autonomously or seek human intervention. OS-Kairos is developed through two key mechanisms: (i) collaborative probing that annotates confidence scores at each interaction step; (ii) confidence-driven interaction that leverages these confidence scores to elicit the ability of adaptive interaction. Experimental results show that OS-Kairos substantially outperforms existing models on our curated dataset featuring complex scenarios, as well as on established benchmarks such as AITZ and Meta-GUI, with 24.59%~87.29% improvements in task success rate. OS-Kairos facilitates an adaptive human-agent collaboration, prioritizing effectiveness, generality, scalability, and efficiency for real-world GUI interaction. The dataset and codes are available at Anonymous.</abstract>
<identifier type="citekey">cheng-etal-2025-os</identifier>
<identifier type="doi">10.18653/v1/2025.findings-acl.348</identifier>
<location>
<url>https://aclanthology.org/2025.findings-acl.348/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>6701</start>
<end>6725</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents
%A Cheng, Pengzhou
%A Wu, Zheng
%A Wu, Zongru
%A Ju, Tianjie
%A Zhang, Aston
%A Zhang, Zhuosheng
%A Liu, Gongshen
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Findings of the Association for Computational Linguistics: ACL 2025
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-256-5
%F cheng-etal-2025-os
%X Autonomous graphical user interface (GUI) agents powered by multimodal large language models have shown great promise. However, a critical yet underexplored issue persists: over-execution, where the agent executes tasks in a fully autonomous way, without adequate assessment of its action confidence to compromise an adaptive human-agent collaboration. This poses substantial risks in complex scenarios, such as those involving ambiguous user instructions, unexpected interruptions, and environmental hijacks. To address the issue, we introduce OS-Kairos, an adaptive GUI agent capable of predicting confidence levels at each interaction step and efficiently deciding whether to act autonomously or seek human intervention. OS-Kairos is developed through two key mechanisms: (i) collaborative probing that annotates confidence scores at each interaction step; (ii) confidence-driven interaction that leverages these confidence scores to elicit the ability of adaptive interaction. Experimental results show that OS-Kairos substantially outperforms existing models on our curated dataset featuring complex scenarios, as well as on established benchmarks such as AITZ and Meta-GUI, with 24.59%~87.29% improvements in task success rate. OS-Kairos facilitates an adaptive human-agent collaboration, prioritizing effectiveness, generality, scalability, and efficiency for real-world GUI interaction. The dataset and codes are available at Anonymous.
%R 10.18653/v1/2025.findings-acl.348
%U https://aclanthology.org/2025.findings-acl.348/
%U https://doi.org/10.18653/v1/2025.findings-acl.348
%P 6701-6725
Markdown (Informal)
[OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents](https://aclanthology.org/2025.findings-acl.348/) (Cheng et al., Findings 2025)
ACL
- Pengzhou Cheng, Zheng Wu, Zongru Wu, Tianjie Ju, Aston Zhang, Zhuosheng Zhang, and Gongshen Liu. 2025. OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents. In Findings of the Association for Computational Linguistics: ACL 2025, pages 6701–6725, Vienna, Austria. Association for Computational Linguistics.
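The abstract above describes a confidence-driven interaction mechanism: at each GUI step the agent predicts an action together with a confidence score and either acts autonomously or requests human intervention. The following is a minimal sketch of that idea only, not the OS-Kairos implementation; the function names, the `Step` type, and the threshold value are all hypothetical illustrations.

```python
# Minimal sketch of a confidence-gated interaction loop, as described in the
# abstract above. All names (predict_action, ask_human, CONFIDENCE_THRESHOLD)
# are assumptions for illustration, not the paper's actual API or values.
from dataclasses import dataclass
from typing import Callable

CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off; the paper scores every step


@dataclass
class Step:
    action: str        # e.g. "tap(x, y)" or "type('hello')"
    confidence: float  # model-predicted confidence in [0, 1]


def run_episode(
    predict_action: Callable[[str], Step],  # MLLM policy: screenshot -> step
    execute: Callable[[str], str],          # executes action, returns next screenshot
    ask_human: Callable[[str], str],        # human supplies or overrides an action
    screenshot: str,
    max_steps: int = 20,
) -> None:
    for _ in range(max_steps):
        step = predict_action(screenshot)
        if step.confidence >= CONFIDENCE_THRESHOLD:
            # High confidence: act autonomously.
            screenshot = execute(step.action)
        else:
            # Low confidence: defer to the human instead of over-executing.
            screenshot = execute(ask_human(step.action))
```

The single threshold here stands in for whatever decision rule the paper's confidence-driven interaction actually uses; it is only meant to make the autonomous-versus-intervention branching concrete.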