Li Zhang
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
Other people with similar names: Li Zhang, Li Zhang, Li Zhang, Li Zhang (AWS), Li Zhang (Birmingham), Li Zhang (Google), Li Zhang (Google), Li Zhang (IBM-china), Li Zhang (Nankai), Li Zhang (Newcastle, UK), Li Zhang (Teesside University), Li Zhang (China Telecom Research Institute), Li Zhang (UC San Diego), Li Zhang (UK), Li Zhang (University of Pennsylvania), Li Zhang (Wuhan)
Unverified author pages with similar names: Li Zhang
2026
Does Chain-of-Thought Reasoning Help Mobile GUI Agents? An Empirical Study
Li Zhang | Longxi Gao | Mengwei Xu
Findings of the Association for Computational Linguistics: ACL 2026
Reasoning capabilities have significantly improved the performance of vision-language models (VLMs) in domains such as mathematical problem-solving, coding, and visual question-answering. However, their impact on real-world applications remains unclear. This paper presents a large-scale empirical study on the effectiveness of reasoning-enabled VLMs in mobile GUI agents. We evaluate six pairs of VLMs, including both commercial and open-source lightweight models, by comparing their base and reasoning-enhanced versions across static and interactive benchmarks. Our findings show that reasoning-enabled VLMs generally provide only marginal improvements over their non-reasoning counterparts and can even degrade performance in certain agent configurations. Notably, reasoning and non-reasoning VLMs fail on different sets of tasks, suggesting that reasoning does have an impact, but its benefits and drawbacks counterbalance each other. We attribute these inconsistencies to the limitations of benchmarks and VLMs. Based on the findings, we provide insights for further enhancing mobile GUI agents in terms of benchmarks, VLMs, and their adaptability in dynamically invoking reasoning VLMs.
2025
DroidCall: A Dataset for LLM-powered Android Intent Invocation
Weikai Xie | Li Zhang | Shihe Wang | Rongjie Yi | Mengwei Xu
Findings of the Association for Computational Linguistics: EMNLP 2025
The growing capabilities of large language models in natural language understanding significantly strengthen existing agentic systems. To power performant on-device mobile agents for better data privacy, we introduce DroidCall, the first training and testing dataset for accurate Android Intent invocation. With a highly flexible and reusable data generation pipeline, we constructed 10k samples in DroidCall. Given a task instruction in natural language, small language models such as Qwen2.5-3B and Gemma2-2B fine-tuned with DroidCall can approach or even surpass the capabilities of GPT-4o for accurate Android Intent invocation. We also provide an end-to-end Android app equipped with these fine-tuned models to demonstrate the Android Intent invocation process. The code and dataset are available at https://github.com/UbiquitousLearning/DroidCall.