Retrieval-augmented GUI Agents with Generative Guidelines

Ran Xu; Kaixin Ma; Wenhao Yu; Hongming Zhang; Joyce C. Ho; Carl Yang; Dong Yu (于东)

doi:10.18653/v1/2025.emnlp-main.902

Abstract

GUI agents powered by vision-language models (VLMs) show promise in automating complex digital tasks. However, their effectiveness in real-world applications is often limited by scarce training data and the inherent complexity of these tasks, which frequently require long-tailed knowledge covering rare, unseen scenarios. We propose RAG-GUI, a lightweight VLM that leverages web tutorials at inference time. RAG-GUI is first warm-started via supervised fine-tuning (SFT) and further refined through self-guided rejection sampling fine-tuning (RSF). Designed to be model-agnostic, RAG-GUI functions as a generic plug-in that enhances any VLM-based agent. Evaluated across three distinct tasks, it consistently outperforms baseline agents and surpasses other inference baselines by 2.6% to 13.3% across two model sizes, demonstrating strong generalization and practical plug-and-play capabilities in real-world scenarios.
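The abstract names rejection sampling fine-tuning but does not spell out the procedure. As a rough illustration only, the generic pattern behind rejection sampling for fine-tuning data can be sketched as follows: draw several candidates per input, keep those that pass an acceptance check, and reuse the kept pairs as training examples. Everything in this sketch is an assumption made for illustration and is not taken from the paper: the function names (`rejection_sample`, `toy_generate`, `toy_accept`), the acceptance rule, and the toy data are all hypothetical.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str, random.Random], str],
    accept: Callable[[str, str], bool],
    num_samples: int = 4,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Draw `num_samples` candidates per prompt and keep only accepted pairs.

    Hypothetical helper: `generate` stands in for a model sampling call and
    `accept` for whatever check decides that a candidate is worth keeping.
    """
    rng = random.Random(seed)
    kept: List[Tuple[str, str]] = []
    for prompt in prompts:
        for _ in range(num_samples):
            candidate = generate(prompt, rng)
            if accept(prompt, candidate):
                kept.append((prompt, candidate))
    return kept


# Toy stand-ins (assumptions, not the paper's model or reward signal).
def toy_generate(prompt: str, rng: random.Random) -> str:
    return rng.choice(["click the Settings icon", "scroll down", ""])


def toy_accept(prompt: str, candidate: str) -> bool:
    return "Settings" in candidate


pairs = rejection_sample(["open settings"], toy_generate, toy_accept, num_samples=8)
```

In a real pipeline the kept pairs would feed a further fine-tuning round; what the acceptance check looks like in RAG-GUI is described in the paper itself, not here.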

Anthology ID: 2025.emnlp-main.902
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 17866–17875
URL: https://aclanthology.org/2025.emnlp-main.902/
DOI: 10.18653/v1/2025.emnlp-main.902
Cite (ACL): Ran Xu, Kaixin Ma, Wenhao Yu, Hongming Zhang, Joyce C. Ho, Carl Yang, and Dong Yu. 2025. Retrieval-augmented GUI Agents with Generative Guidelines. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 17866–17875, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Retrieval-augmented GUI Agents with Generative Guidelines (Xu et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.902.pdf
Checklist: 2025.emnlp-main.902.checklist.pdf
