@inproceedings{wang-etal-2025-mobilea3gent,
title = "{M}obile{A}3gent: Training Mobile {GUI} Agents Using Decentralized Self-Sourced Data from Diverse Users",
author = "Wang, WenHao and
Yuan, Mengying and
Yu, Zijie and
Liu, Guangyi and
Ye, Rui and
Jin, Tian and
Chen, Siheng and
Wang, Yanfeng",
editor = "Blodgett, Su Lin and
Curry, Amanda Cercas and
Dev, Sunipa and
Li, Siyan and
Madaio, Michael and
Wang, Jack and
Wu, Sherry Tongshuang and
Xiao, Ziang and
Yang, Diyi",
booktitle = "Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP)",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.hcinlp-1.8/",
pages = "79--112",
ISBN = "979-8-89176-353-1",
abstract = "The advancement of mobile GUI agents has opened new opportunities for automating tasks on mobile devices. Training these agents requires large-scale high-quality data, which is prohibitively expensive when relying on human labor. Given the vast population of global mobile phone users, if automated data collection from them becomes feasible, the resulting data volume and the subsequently trained mobile agents could reach unprecedented levels. Nevertheless, two major challenges arise: (1) extracting user instructions without human intervention and (2) utilizing distributed user data while preserving privacy.To tackle these challenges, we propose MobileA3gent, a collaborative framework that trains mobile GUI Agents using decentralized self-sourced data from diverse users. The framework comprises two components, each targeting a specific challenge: (1) Auto-Annotation, which enables the automatic collection of high-quality datasets during users' routine phone usage with minimal cost. (2) FedVLM-A, which enhances federated VLM training under non-IID distributions by incorporating adapted global aggregation based on both episode-level and step-level variability. Extensive experiments prove that MobileA3gent achieves superior performance over traditional approaches at only 1{\%} of the cost, highlighting its potential for real-world applications. Our code is publicly available at: https://anonymous.4open.science/r/MobileA3gent-Anonymous."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="wang-etal-2025-mobilea3gent">
<titleInfo>
<title>MobileA3gent: Training Mobile GUI Agents Using Decentralized Self-Sourced Data from Diverse Users</title>
</titleInfo>
<name type="personal">
<namePart type="given">WenHao</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mengying</namePart>
<namePart type="family">Yuan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zijie</namePart>
<namePart type="family">Yu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Guangyi</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rui</namePart>
<namePart type="family">Ye</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tian</namePart>
<namePart type="family">Jin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Siheng</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yanfeng</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Su</namePart>
<namePart type="given">Lin</namePart>
<namePart type="family">Blodgett</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Amanda</namePart>
<namePart type="given">Cercas</namePart>
<namePart type="family">Curry</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sunipa</namePart>
<namePart type="family">Dev</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Siyan</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Michael</namePart>
<namePart type="family">Madaio</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jack</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sherry</namePart>
<namePart type="given">Tongshuang</namePart>
<namePart type="family">Wu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ziang</namePart>
<namePart type="family">Xiao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Diyi</namePart>
<namePart type="family">Yang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-353-1</identifier>
</relatedItem>
<abstract>The advancement of mobile GUI agents has opened new opportunities for automating tasks on mobile devices. Training these agents requires large-scale high-quality data, which is prohibitively expensive when relying on human labor. Given the vast population of global mobile phone users, if automated data collection from them becomes feasible, the resulting data volume and the subsequently trained mobile agents could reach unprecedented levels. Nevertheless, two major challenges arise: (1) extracting user instructions without human intervention and (2) utilizing distributed user data while preserving privacy. To tackle these challenges, we propose MobileA3gent, a collaborative framework that trains mobile GUI Agents using decentralized self-sourced data from diverse users. The framework comprises two components, each targeting a specific challenge: (1) Auto-Annotation, which enables the automatic collection of high-quality datasets during users’ routine phone usage with minimal cost. (2) FedVLM-A, which enhances federated VLM training under non-IID distributions by incorporating adapted global aggregation based on both episode-level and step-level variability. Extensive experiments prove that MobileA3gent achieves superior performance over traditional approaches at only 1% of the cost, highlighting its potential for real-world applications. Our code is publicly available at: https://anonymous.4open.science/r/MobileA3gent-Anonymous.</abstract>
<identifier type="citekey">wang-etal-2025-mobilea3gent</identifier>
<location>
<url>https://aclanthology.org/2025.hcinlp-1.8/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>79</start>
<end>112</end>
</extent>
</part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T MobileA3gent: Training Mobile GUI Agents Using Decentralized Self-Sourced Data from Diverse Users
%A Wang, WenHao
%A Yuan, Mengying
%A Yu, Zijie
%A Liu, Guangyi
%A Ye, Rui
%A Jin, Tian
%A Chen, Siheng
%A Wang, Yanfeng
%Y Blodgett, Su Lin
%Y Curry, Amanda Cercas
%Y Dev, Sunipa
%Y Li, Siyan
%Y Madaio, Michael
%Y Wang, Jack
%Y Wu, Sherry Tongshuang
%Y Xiao, Ziang
%Y Yang, Diyi
%S Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP)
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-353-1
%F wang-etal-2025-mobilea3gent
%X The advancement of mobile GUI agents has opened new opportunities for automating tasks on mobile devices. Training these agents requires large-scale high-quality data, which is prohibitively expensive when relying on human labor. Given the vast population of global mobile phone users, if automated data collection from them becomes feasible, the resulting data volume and the subsequently trained mobile agents could reach unprecedented levels. Nevertheless, two major challenges arise: (1) extracting user instructions without human intervention and (2) utilizing distributed user data while preserving privacy. To tackle these challenges, we propose MobileA3gent, a collaborative framework that trains mobile GUI Agents using decentralized self-sourced data from diverse users. The framework comprises two components, each targeting a specific challenge: (1) Auto-Annotation, which enables the automatic collection of high-quality datasets during users’ routine phone usage with minimal cost. (2) FedVLM-A, which enhances federated VLM training under non-IID distributions by incorporating adapted global aggregation based on both episode-level and step-level variability. Extensive experiments prove that MobileA3gent achieves superior performance over traditional approaches at only 1% of the cost, highlighting its potential for real-world applications. Our code is publicly available at: https://anonymous.4open.science/r/MobileA3gent-Anonymous.
%U https://aclanthology.org/2025.hcinlp-1.8/
%P 79-112

Markdown (Informal)
[MobileA3gent: Training Mobile GUI Agents Using Decentralized Self-Sourced Data from Diverse Users](https://aclanthology.org/2025.hcinlp-1.8/) (Wang et al., HCINLP 2025)

ACL
WenHao Wang, Mengying Yuan, Zijie Yu, Guangyi Liu, Rui Ye, Tian Jin, Siheng Chen, and Yanfeng Wang. 2025. MobileA3gent: Training Mobile GUI Agents Using Decentralized Self-Sourced Data from Diverse Users. In Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP), pages 79–112, Suzhou, China. Association for Computational Linguistics.