Pengxiang Zhao
Other people with similar names: Pengxiang Zhao
2026
A3: Android Agent Arena for Mobile GUI Agents with Essential-State Procedural Evaluation
Yuxiang Chai | Shunye Tang | Han Xiao | Weifeng Lin | Hanhao Li | Jiayu Zhang | Liang Liu | Pengxiang Zhao | Guangyi Liu | Guozhi Wang | Shuai Ren | Rongduo Han | Haining Zhang | Siyuan Huang | Hongsheng Li
Findings of the Association for Computational Linguistics: ACL 2026
The advancement of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has catalyzed the development of mobile graphical user interface (GUI) AI agents, which are designed to autonomously perform tasks on mobile devices. However, a significant gap persists in mobile GUI agent evaluation: existing benchmarks predominantly rely on either static frame assessments such as AndroidControl or offline static apps such as AndroidWorld, and thus fail to capture agent performance in dynamic, real-world online mobile apps. To address this gap, we present Android Agent Arena (A3), a novel "essential-state" based procedural evaluation system for mobile GUI agents. A3 introduces a benchmark of 100 tasks derived from 20 widely used, dynamic online apps across 20 categories from the Google Play Store, ensuring comprehensive evaluation. A3 also presents a novel "essential-state" based procedural evaluation method that leverages MLLMs as reward models to progressively verify task completion and process achievement. This evaluation approach addresses the limitations of traditional function-based evaluation methods on online dynamic apps. Furthermore, A3 includes a toolkit to streamline Android device interaction, reset online environments and apps, and facilitate data collection from both human and agent demonstrations. The complete A3 system, including the benchmark and tools, will be publicly released to provide a robust foundation for future research and development in mobile GUI agents.
LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark
Guangyi Liu | Pengxiang Zhao | Liang Liu | Zhiming Chen | Yuxiang Chai | Yaozhen Liang | WenHao Wang | Siheng Chen | Zhengxi Lu | Shuai Ren | Hao Wang | Shibo He | Yong Liu | Wenchao Meng
Findings of the Association for Computational Linguistics: ACL 2026
Mobile GUI agents show promise in automating tasks but face significant generalization challenges in long-tail scenarios. While learning from few-shot demonstrations is an emerging solution, its progress is hindered by two critical gaps: the lack of a comprehensive benchmark for systematic evaluation on mobile devices, and the absence of a systematic framework designed to learn from demonstrations in this domain. To address these gaps, we introduce LearnGUI, the first comprehensive benchmark designed for studying demonstration-based learning in mobile agents, comprising 2,252 offline and 101 online tasks. We further develop LearnAct, a modular agent framework engineered to systematically extract, retrieve, and leverage knowledge from visual demonstrations. Extensive evaluations across six backbone models validate our approach: LearnAct achieves dramatic improvements for general-purpose models (e.g., Gemini-2.5-Pro: 38.5%→58.9%) and specialized models alike (e.g., UI-TARS-7B-SFT’s online success rate: 18.1%→32.8%), demonstrating consistent gains across model architectures. Our work provides a robust benchmark and a systematic framework, paving the way for more adaptable and practical mobile agents. Our code and data are publicly available at https://lgy0404.github.io/LearnAct/.
FedGUI: Benchmarking Federated GUI Agents across Heterogeneous Platforms, Devices, and Operating Systems
WenHao Wang | Haoting Shi | Mengying Yuan | Yiquan Lin | Panrong Tong | Hanzhang Zhou | Guangyi Liu | Pengxiang Zhao | Yue Wang | Siheng Chen
Findings of the Association for Computational Linguistics: ACL 2026
Training GUI agents with traditional centralized methods faces significant cost and scalability challenges. Federated learning (FL) offers a promising solution, yet its potential is hindered by the lack of benchmarks that capture real-world, cross-platform heterogeneity. To bridge this gap, we introduce FedGUI, the first comprehensive benchmark for developing and evaluating federated GUI agents across mobile, web, and desktop platforms. FedGUI provides a suite of six curated datasets to systematically study four crucial types of heterogeneity: cross-platform, cross-device, cross-OS, and cross-source. Extensive experiments reveal several key insights: first, we show that cross-platform collaboration improves performance, extending prior mobile-only federated learning to diverse GUI environments; second, we demonstrate the presence of distinct heterogeneity dimensions and identify platform and OS as the most influential factors. FedGUI provides a vital foundation for the community to build more scalable and privacy-preserving GUI agents for real-world deployment. Our code and data are publicly available at https://github.com/wwh0411/FedGUI.
MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents
Pengxiang Zhao | Guangyi Liu | Yaozhen Liang | Weiqing He | Zhengxi Lu | WenHao Wang | Yuehao Huang | Yuxiang Chai | Zhaolu Kang | Yaxuan Guo | Hao Wang | Kexin Zhang | Liang Liu | Yong Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Shortcuts such as APIs and deep-links have emerged as efficient complements to flexible GUI operations, fostering a promising hybrid paradigm for MLLM-based mobile automation. However, systematic evaluation of GUI-shortcut hybrid agents remains largely underexplored. To bridge this gap, we introduce **MAS-Bench**, a benchmark that pioneers the evaluation of GUI-shortcut hybrid agents with a specific focus on the mobile domain. Beyond merely using predefined shortcuts, MAS-Bench assesses an agent’s capability to *autonomously generate* shortcuts by discovering and creating reusable, low-cost workflows. It features 139 complex tasks across 11 real-world applications, a knowledge base of 88 predefined shortcuts (APIs, deep-links, RPA scripts), and 9 evaluation metrics. Experiments demonstrate that hybrid agents achieve up to 68.3% success rate and 39% greater execution efficiency than GUI-only counterparts. Furthermore, our evaluation framework effectively reveals the quality gap between predefined and agent-generated shortcuts, validating its capability to assess shortcut generation methods. MAS-Bench addresses the lack of systematic benchmarks for GUI-shortcut hybrid mobile agents, providing a foundational platform for future advancements in creating more efficient and robust intelligent agents.
Co-authors
- Guangyi Liu 4
- Yuxiang Chai 3
- Liang Liu (陆亮) 3
- Wenhao Wang 3
- Siheng Chen 2
- Yaozhen Liang 2
- Yong Liu 2
- Zhengxi Lu 2
- Shuai Ren 2
- Hao Wang 2
- Zhiming Chen 1
- Yaxuan Guo 1
- Rongduo Han 1
- Shibo He 1
- Weiqing He 1
- Siyuan Huang 1
- Yuehao Huang 1
- Zhaolu Kang 1
- Hanhao Li 1
- Hongsheng Li 1
- Weifeng Lin 1
- Yiquan Lin 1
- Wenchao Meng 1
- Haoting Shi 1
- Shunye Tang 1
- Panrong Tong 1
- Guozhi Wang 1
- Yue Wang 1
- Han Xiao 1
- Mengying Yuan 1
- Haining Zhang 1
- Jiayu Zhang 1
- Kexin Zhang 1
- Hanzhang Zhou 1