ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation

Qinzhuo Wu; Wei Liu; Jian Luan; Bin Wang

doi:10.18653/v1/2025.naacl-long.244

Abstract

Mobile AI agents have recently attracted growing attention. Given a task, a mobile AI agent interacts with a mobile device over multiple steps and ultimately produces a GUI flow that solves the task. However, existing agents tend to focus on the most task-relevant elements at each step, which leads to locally optimal solutions that ignore the overall GUI flow. To address this issue, we construct a training dataset called MobileReach, which breaks each task into page reaching and page operation subtasks. Furthermore, we propose ReachAgent, a two-stage framework that focuses on improving the agent's task-completion abilities. It uses the page reaching and page operation subtasks, along with reward-based preference GUI flows, to further enhance the agent. Experimental results show that, compared to the SOTA agent, ReachAgent significantly improves Intersection over Union (IoU) Accuracy and Text Accuracy by 7.12% and 7.69% at the step level and by 4.72% and 4.63% at the task level. Our data and code will be released upon acceptance.
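The abstract reports IoU Accuracy without defining it. As a rough illustration only, the sketch below assumes the common bounding-box reading: a predicted box counts as correct when its Intersection over Union with the reference box reaches a threshold. The `(x_min, y_min, x_max, y_max)` box layout, the function names, and the 0.5 threshold are all assumptions made for this example and may differ from the paper's actual metric.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_accuracy(preds: Sequence[Box], targets: Sequence[Box],
                 threshold: float = 0.5) -> float:
    """Fraction of predictions whose IoU with the reference box meets the
    threshold. The 0.5 default is a hypothetical choice, not from the paper."""
    if not targets:
        return 0.0
    hits = sum(box_iou(p, t) >= threshold for p, t in zip(preds, targets))
    return hits / len(targets)
```

Under these assumptions, a step-level score would average over individual actions, while a task-level score would require every step in a GUI flow to be correct.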

Anthology ID: 2025.naacl-long.244
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 4760–4775
URL: https://aclanthology.org/2025.naacl-long.244/
DOI: 10.18653/v1/2025.naacl-long.244
Cite (ACL): Qinzhuo Wu, Wei Liu, Jian Luan, and Bin Wang. 2025. ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4760–4775, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation (Wu et al., NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-long.244.pdf
